Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)
On Tue, Nov 18, 2014 at 04:34:12PM -0800, Mike Stump wrote: > On Nov 18, 2014, at 3:42 PM, Jakub Jelinek wrote: > > No, I'm not touching tmp array at all in that case, only look at vp > > individual bytes. Either they are all 0, or all 0xff, or I return NULL. > > Oh, sorry, I misread where the break; in your code was going. I might have > been misled by: > > > - gcc_assert (GET_MODE_PRECISION (outer_submode) > > - <= MAX_BITSIZE_MODE_ANY_INT); > > in your patch. You don’t need that anymore, do you? If not, can you remove > this part. I thought the assert is unnecessary given the condition just a few lines above it. But can keep it, perhaps gcc_checking_assert would be enough, and hopefully compiler optimizes it away completely. > > The rest looks like normal rtl/vector code, I don’t see anything wrong with > it. Jakub
Re: [PATCH, testsuite] Add bind_pic_locally to certain tests
* gcc.dg/pure-2.c: Update line numbers. diff --git a/gcc/testsuite/gcc.dg/pure-2.c b/gcc/testsuite/gcc.dg/pure-2.c index 638bd7c..fe6e2bc 100644 --- a/gcc/testsuite/gcc.dg/pure-2.c +++ b/gcc/testsuite/gcc.dg/pure-2.c @@ -8,14 +8,14 @@ extern int v; /* Trivial. */ int foo1(int a) /* { dg-bogus "normally" "detect pure candidate" } */ -{ /* { dg-warning "pure" "detect pure candidate" { target *-*-* } "9" } */ +{ /* { dg-warning "pure" "detect pure candidate" { target *-*-* } "10" } */ return v; } /* Loops known to be normally and extern const calls should be safe. */ int __attribute__ ((noinline)) foo2(int n) /* { dg-bogus "normally" "detect pure candidate" } */ -{ /* { dg-warning "pure" "detect pure candidate" { target *-*-* } "16" } */ +{ /* { dg-warning "pure" "detect pure candidate" { target *-*-* } "17" } */ int ret = 0; int i; for (i=0; i
Re: [PATCH, PR62167] Fix tail-merge pass for dead type-unsafe code
On Tue, 18 Nov 2014, Tom de Vries wrote: > Richard, > > this (trunk) patch fixes PR62167. > > The patch fixes a problem that triggers with the test-case on the 4.8 branch, > when tail-merge makes a dead type-unsafe load alive. > > I'm not able to reproduce this bug on 4.9 and trunk with the same test-case. > On those branches, the tail-merge already does not happen. > > The reason for the difference is as follows: With 4.8 the two phi arguments of > the phi in the tail block are value-numbered identically: > ... > SCC consists of: p_14 > Value numbering p_14 stmt = p_14 = MEM[(struct head *)_13].first; > Setting value number of p_14 to p_14 (changed) > > SCC consists of: p_15 > Value numbering p_15 stmt = p_15 = _13->next; > Setting value number of p_15 to p_14 (changed) > ... > > With 4.9 (and trunk), that's not the case: > ... > SCC consists of: p_14 > Value numbering p_14 stmt = p_14 = MEM[(struct head *)&heads][k.1_9].first; > Setting value number of p_14 to p_14 (changed) > > SCC consists of: p_15 > Value numbering p_15 stmt = p_15 = _13->next; > Setting value number of p_15 to p_15 (changed) > ... > > I'm not sure the bug triggers on trunk and 4.9, but I see no reason why it > could not trigger, so I'd prefer to apply the patch to 4.9 and trunk as well. > > The patch introduces an xfail for pr51879-12.c. I can follow up with a patch > to improve upon that, but I think that's better limited to trunk only. Yeah. > Bootstrapped and reg-tested on x86_64/trunk. > > OK for trunk/stage3, 4.8, 4.9? Ok. Note that this goes well with disabling the (re-)use of value-numbering in tail-merging (which causes me quite some headaches...). On trunk it shouldn't be necessary as much as it was earlier as we now fully propagate constants and copies during FRE/PRE elimination. Thanks, Richard.
Re: [PATCH] Fix -fsanitize=bool -fnon-call-exceptions (PR sanitizer/63913)
On Tue, 18 Nov 2014, Jakub Jelinek wrote: > Hi! > > This patch fixes instrumentation of bool/enum loads if they could > throw. We want to keep the EH on the load, and push > further statements on the fallthru edge. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2014-11-18 Jakub Jelinek > > PR sanitizer/63913 > * ubsan.c: Include tree-eh.h. > (instrument_bool_enum_load): Handle loads that can throw. > > * g++.dg/ubsan/pr63913.C: New test. > > --- gcc/ubsan.c.jj2014-11-14 15:11:49.0 +0100 > +++ gcc/ubsan.c 2014-11-18 09:29:20.409209556 +0100 > @@ -67,6 +67,7 @@ along with GCC; see the file COPYING3. > #include "dfp.h" > #include "builtins.h" > #include "tree-object-size.h" > +#include "tree-eh.h" > > /* Map from a tree to a VAR_DECL tree. */ > > @@ -1135,7 +1136,9 @@ instrument_bool_enum_load (gimple_stmt_i >|| TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME) > return; > > + bool can_throw = stmt_could_throw_p (stmt); >location_t loc = gimple_location (stmt); > + tree lhs = gimple_assign_lhs (stmt); >tree ptype = build_pointer_type (TREE_TYPE (rhs)); >tree atype = reference_alias_ptr_type (rhs); >gimple g = gimple_build_assign (make_ssa_name (ptype, NULL), > @@ -1145,9 +1148,24 @@ instrument_bool_enum_load (gimple_stmt_i >tree mem = build2 (MEM_REF, utype, gimple_assign_lhs (g), >build_int_cst (atype, 0)); >tree urhs = make_ssa_name (utype, NULL); > - g = gimple_build_assign (urhs, mem); > - gimple_set_location (g, loc); > - gsi_insert_before (gsi, g, GSI_SAME_STMT); > + if (can_throw) > +{ > + gimple_assign_set_lhs (stmt, urhs); > + g = gimple_build_assign_with_ops (NOP_EXPR, lhs, urhs, NULL_TREE); > + gimple_set_location (g, loc); > + edge e = find_fallthru_edge (gimple_bb (stmt)->succs); > + gsi_insert_on_edge_immediate (e, g); > + gimple_assign_set_rhs_with_ops (gsi, MEM_REF, mem, NULL_TREE); Please use gimple_assign_set_rhs_from_tree for single-rhs trees. Ok with that change. Thanks, RIchard. > + update_stmt (stmt); > + *gsi = gsi_for_stmt (g); > + g = stmt; > +} > + else > +{ > + g = gimple_build_assign (urhs, mem); > + gimple_set_location (g, loc); > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > +} >minv = fold_convert (utype, minv); >maxv = fold_convert (utype, maxv); >if (!integer_zerop (minv)) > @@ -1169,8 +1187,11 @@ instrument_bool_enum_load (gimple_stmt_i >gimple_set_location (g, loc); >gsi_insert_after (gsi, g, GSI_NEW_STMT); > > - gimple_assign_set_rhs_with_ops (&gsi2, NOP_EXPR, urhs, NULL_TREE); > - update_stmt (stmt); > + if (!can_throw) > +{ > + gimple_assign_set_rhs_with_ops (&gsi2, NOP_EXPR, urhs, NULL_TREE); > + update_stmt (stmt); > +} > >gsi2 = gsi_after_labels (then_bb); >if (flag_sanitize_undefined_trap_on_error) > --- gcc/testsuite/g++.dg/ubsan/pr63913.C.jj 2014-11-18 09:34:10.603981487 > +0100 > +++ gcc/testsuite/g++.dg/ubsan/pr63913.C 2014-11-18 09:33:47.0 > +0100 > @@ -0,0 +1,12 @@ > +// PR sanitizer/63913 > +// { dg-do compile } > +// { dg-options "-fsanitize=bool -fnon-call-exceptions" } > + > +struct B { B (); ~B (); }; > + > +double > +foo (bool *x) > +{ > + B b; > + return *x; > +} > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284 (AG Nuernberg) Maxfeldstrasse 5, 90409 Nuernberg, Germany
Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)
On Tue, 18 Nov 2014, Jakub Jelinek wrote: > Hi! > > OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because > they are never used for integer arithmetics (and there is no way > to represent all their values in RTL if not using CONST_WIDE_INT). > As the following testcase shows, simplify_immed_subreg can be called > with such modes though, e.g. trying to forward propagate a CONST_VECTOR > (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear > in the IL directly) into a SUBREG_REG. So we have (subreg:OI (reg:V4DF ...) ...) for example? What do we end doing with that OI mode subreg? (why can't we simply use the V4DF one?) > The following patch instead of ICE handles the most common cases (all 0 > and all 1 CONST_VECTORs) and returns NULL otherwise. > > Before wide-int got merged, the testcase worked, though the code didn't > bother checking anything, just created 0 or constm1_rtx for the two cases > that could happen and if something else appeared, could just return what > matched low TImode (or DImode for -m32). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Looks ok to me. Not sure if the zero/all-ones case really happens that much to be worth special-casing - I think you could use fixed_wide_int and simply see if the result is representable in a CONST_INT/CONST_DOUBLE? Can you try if that works? It looks like the patch may be smaller for that. Thanks, Richard. > 2014-11-18 Jakub Jelinek > > PR target/63910 > * simplify-rtx.c (simplify_immed_subreg): For integer modes > wider than MAX_BITSIZE_MODE_ANY_INT, handle all zeros and all ones > and for other values return NULL_RTX. > > * gcc.target/i386/pr63910.c: New test. > > --- gcc/simplify-rtx.c.jj 2014-11-11 00:06:19.0 +0100 > +++ gcc/simplify-rtx.c2014-11-18 10:17:01.198668555 +0100 > @@ -5505,6 +5505,21 @@ simplify_immed_subreg (machine_mode oute > HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / > HOST_BITS_PER_WIDE_INT]; > wide_int r; > > + if (GET_MODE_PRECISION (outer_submode) > MAX_BITSIZE_MODE_ANY_INT) > + { > + /* Handle just all zeros and all ones CONST_VECTORs in > +this case. */ > + if ((vp[0] & value_mask) == 0) > + elems[elem] = const0_rtx; > + else if ((vp[0] & value_mask) == value_mask) > + elems[elem] = constm1_rtx; > + else > + return NULL_RTX; > + for (i = value_bit; i < elem_bitsize; i += value_bit) > + if ((vp[i / value_bit] & value_mask) != (vp[0] & value_mask)) > + return NULL_RTX; > + break; > + } > for (u = 0; u < units; u++) > { > unsigned HOST_WIDE_INT buf = 0; > @@ -5516,8 +5531,6 @@ simplify_immed_subreg (machine_mode oute > tmp[u] = buf; > base += HOST_BITS_PER_WIDE_INT; > } > - gcc_assert (GET_MODE_PRECISION (outer_submode) > - <= MAX_BITSIZE_MODE_ANY_INT); > r = wide_int::from_array (tmp, units, > GET_MODE_PRECISION (outer_submode)); > elems[elem] = immed_wide_int_const (r, outer_submode); > --- gcc/testsuite/gcc.target/i386/pr63910.c.jj2014-11-18 > 10:12:24.282659318 +0100 > +++ gcc/testsuite/gcc.target/i386/pr63910.c 2014-11-18 10:12:00.0 > +0100 > @@ -0,0 +1,12 @@ > +/* PR target/63910 */ > +/* { dg-do compile } */ > +/* { dg-options "-O -mstringop-strategy=vector_loop -mavx512f" } */ > + > +extern void bar (float *c); > + > +void > +foo (void) > +{ > + float c[1024] = { }; > + bar (c); > +} > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284 (AG Nuernberg) Maxfeldstrasse 5, 90409 Nuernberg, Germany
Re: [PATCH] Fix ubsan -fsanitize=signed-integer-overflow expansion (PR sanitizer/63520)
On Tue, 18 Nov 2014, Jakub Jelinek wrote: > Hi! > > Apparently, expand_expr with EXPR_WRITE can return > a SUBREG with SUBREG_PROMOTED_VAR_P set on it. For Huh, that looks bogus to me. But of course I know nothing (read: not enough) to really tell. Eric? Richard. > UBSAN_CHECK_{ADD,SUB,MUL} expansion, I've been doing just > emit_move_insn into it, which apparently is wrong in that case, > store_expr instead uses convert_move for it. The > {ADD,SUB,MUL}_OVERFLOW (i.e. __builtin_*_overflow) expansion > shouldn't need it, as the result is complex and complex integers > aren't promoted that way. As store_expr* uses a tree expression > to store, while I have rtx, I just wrote a short helper function > for this. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2014-11-18 Jakub Jelinek > > PR sanitizer/63520 > * internal-fn.c (expand_ubsan_result_store): New function. > (expand_addsub_overflow, expand_neg_overflow, expand_mul_overflow): > Use it instead of just emit_move_insn. > > * c-c++-common/ubsan/pr63520.c: New test. > > --- gcc/internal-fn.c.jj 2014-11-12 13:28:47.0 +0100 > +++ gcc/internal-fn.c 2014-11-18 15:35:46.395916823 +0100 > @@ -395,6 +395,21 @@ expand_arith_overflow_result_store (tree >write_complex_part (target, lres, false); > } > > +/* Helper for expand_*_overflow. Store RES into TARGET. */ > + > +static void > +expand_ubsan_result_store (rtx target, rtx res) > +{ > + if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target)) > +/* If this is a scalar in a register that is stored in a wider mode > + than the declared mode, compute the result into its declared mode > + and then convert to the wider mode. Our value is the computed > + expression. */ > +convert_move (SUBREG_REG (target), res, SUBREG_PROMOTED_SIGN (target)); > + else > +emit_move_insn (target, res); > +} > + > /* Add sub/add overflow checking to the statement STMT. > CODE says whether the operation is +, or -. */ > > @@ -809,7 +824,7 @@ expand_addsub_overflow (location_t loc, >if (lhs) > { >if (is_ubsan) > - emit_move_insn (target, res); > + expand_ubsan_result_store (target, res); >else > { > if (do_xor) > @@ -904,7 +919,7 @@ expand_neg_overflow (location_t loc, tre >if (lhs) > { >if (is_ubsan) > - emit_move_insn (target, res); > + expand_ubsan_result_store (target, res); >else > expand_arith_overflow_result_store (lhs, target, mode, res); > } > @@ -1590,7 +1605,7 @@ expand_mul_overflow (location_t loc, tre >if (lhs) > { >if (is_ubsan) > - emit_move_insn (target, res); > + expand_ubsan_result_store (target, res); >else > expand_arith_overflow_result_store (lhs, target, mode, res); > } > --- gcc/testsuite/c-c++-common/ubsan/pr63520.c.jj 2014-11-18 > 15:40:07.271273710 +0100 > +++ gcc/testsuite/c-c++-common/ubsan/pr63520.c2014-11-18 > 15:40:40.971673904 +0100 > @@ -0,0 +1,16 @@ > +/* PR sanitizer/63520 */ > +/* { dg-do compile } */ > +/* { dg-options "-fsanitize=undefined" } */ > + > +int a; > + > +void > +foo (void) > +{ > + while (1) > +{ > + if (a == 1) > + break; > + a -= 1; > +} > +} > > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284 (AG Nuernberg) Maxfeldstrasse 5, 90409 Nuernberg, Germany
[PATCH] Fix PR63844
The following improves code-generation for PR63844 by using a restrict qualified reference type for the OMP receiver decl. This improves alias analysis and points-to analysis enough to usually allow invariant and store motion where that was possible in the non-split-out variant. Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk? Sadly I don't have a good testcase (for example one that wasn't vectorized before but is now). We can add one later - ISTR there were a few other bugs about not vectorizing with -fopenmp, I'll try to find them again and see whether they are fixed. Thanks, Richard. 2014-11-19 Richard Biener PR tree-optimization/63844 * omp-low.c (fixup_child_record_type): Use a restrict qualified referece type for the receiver parameter. Index: gcc/omp-low.c === --- gcc/omp-low.c (revision 217692) +++ gcc/omp-low.c (working copy) @@ -1517,7 +1517,8 @@ fixup_child_record_type (omp_context *ct layout_type (type); } - TREE_TYPE (ctx->receiver_decl) = build_pointer_type (type); + TREE_TYPE (ctx->receiver_decl) += build_qualified_type (build_reference_type (type), TYPE_QUAL_RESTRICT); } /* Instantiate decls as necessary in CTX to satisfy the data sharing
Re: [PATCH] Fix ubsan -fsanitize=signed-integer-overflow expansion (PR sanitizer/63520)
On Wed, Nov 19, 2014 at 10:02:18AM +0100, Richard Biener wrote: > On Tue, 18 Nov 2014, Jakub Jelinek wrote: > > Apparently, expand_expr with EXPR_WRITE can return > > a SUBREG with SUBREG_PROMOTED_VAR_P set on it. For > > Huh, that looks bogus to me. But of course I know nothing > (read: not enough) to really tell. Eric? I've tried to look where it comes from, and it dates back to r2xxx or so, so forever. And store_expr has a large: else if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target)) /* If this is a scalar in a register that is stored in a wider mode than the declared mode, compute the result into its declared mode and then convert to the wider mode. Our value is the computed expression. */ handling block. For targets with only word sized operations something like that actually makes a lot of sense, I have just not been aware of that. Jakub
Re: [PATCH] Fix PR63844
On Wed, Nov 19, 2014 at 10:07:07AM +0100, Richard Biener wrote: > > The following improves code-generation for PR63844 by using > a restrict qualified reference type for the OMP receiver decl. > This improves alias analysis and points-to analysis enough to > usually allow invariant and store motion where that was possible > in the non-split-out variant. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk? > > Sadly I don't have a good testcase (for example one that wasn't > vectorized before but is now). We can add one later - ISTR there > were a few other bugs about not vectorizing with -fopenmp, I'll > try to find them again and see whether they are fixed. Most of those PRs should have openmp keyword. > 2014-11-19 Richard Biener > > PR tree-optimization/63844 > * omp-low.c (fixup_child_record_type): Use a restrict qualified > referece type for the receiver parameter. Ok, thanks. > --- gcc/omp-low.c (revision 217692) > +++ gcc/omp-low.c (working copy) > @@ -1517,7 +1517,8 @@ fixup_child_record_type (omp_context *ct >layout_type (type); > } > > - TREE_TYPE (ctx->receiver_decl) = build_pointer_type (type); > + TREE_TYPE (ctx->receiver_decl) > += build_qualified_type (build_reference_type (type), TYPE_QUAL_RESTRICT); > } > > /* Instantiate decls as necessary in CTX to satisfy the data sharing Jakub
Re: [PATCH 7/8] Model cache auto-prefetcher in scheduler
On 14/11/14 15:12, Maxim Kuvyrkov wrote: On Nov 14, 2014, at 8:38 AM, Jeff Law wrote: On 10/20/14 22:06, Maxim Kuvyrkov wrote: Hi, Ramana, this change requires benchmarking, which I can't easily do at the moment. I would appreciate any benchmarking results that you can share. In particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH needs to be tuned/confirmed for Cortex-A15. What were the results of that benchmarking? IIRC I tabled reviewing this work waiting for those results (and I probably should have let you know that. Sorry, my bad there). I don't have the benchmarking results yet, and I was hoping for ARM to help with getting the numbers. The arm maintainers still need to OK the arm-specific portion of the patch, which, I imagine, will happen only of benchmark scores improve. I tried benchmarking 78f367cfcfdc9f0a422a362cd85ecc122834d96f from the trees you gave me links to against the equivalent version on trunk. The results with SPEC2k on A15 were in the noise with the default value for PARAM_SCHED_AUTOPREF_QUEUE_DEPTH which is 2 in the backend. I'm still waiting on results for values 0, 1 and 3 and hopefully something will come back soon for SPEC2k. @@ -29903,6 +29915,20 @@ arm_first_cycle_multipass_dfa_lookahead (void) return issue_rate > 1 ? issue_rate : 0; } +/* Enable modeling of Cortex-A15 L2 auto-prefetcher. */ +static int +arm_first_cycle_multipass_dfa_lookahead_guard (rtx insn, int ready_index) +{ + switch (arm_tune) +{ +case cortexa15: + return autopref_multipass_dfa_lookahead_guard (insn, ready_index); + +default: + return 0; +} +} + It would be better to have this as a flag in the tuning tables rather than hardcoding for a core here. The backend has been moving in that direction for all core centric information and it is preferable that be continued. So this logic here should just be if (current_tune->multipass_lookahead) return autopref_multipass_lookahead_guard (insn, ready_index); else return 0; regards Ramana ... Can this be built on top of Bin's work for insn fusion? There's a lot of commonality in the structure of the insns you care about. He's already got a nice little priority function that I think you could utilize to to ensure the insns with smaller offsets fire first. I would argue that macro-fusion should have been implemented the way autopref_model is -- via targetm.sched.first_cycle_multipass_dfa_lookahead_guard hook. To implement the autopref model I cleaned up and generalized existing infrastructure (max_issue and dfa_lookahead_guard hook) instead of adding yet another decision-making primitive to the scheduler. My biggest concern would be sched2 coming along and undoing that work since you're not going to fuse those into move-multiple types of instructions. The autoprefetcher will be active only during sched2. It is disabled during sched1 by the fact that max_issue is not used when scheduling for register pressure. Thanks, -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian
> > Sorry for missing the point. It seems to me that 't2' here will conflict > > with > condition of the pattern *movhi_insn_arch4: > >"TARGET_ARM > > && arm_arch4 > > && (register_operand (operands[0], HImode) > > || register_operand (operands[1], HImode))" > > > > #define TARGET_ARM (! TARGET_THUMB) > > /* 32-bit Thumb-2 code. */ > > #define TARGET_THUMB2 (TARGET_THUMB && > arm_arch_thumb2) > > > > Bah, Indeed ! - I misremembered the t2 there, my mistake. > > Yes you are right there, but what I'd like you to do is to use that mechanism > rather than putting all this logic in the predicate. > > So, I'd prefer you to add a v6t2 to the values for the "arch" attribute, > don't forget > to update the comments above. > > and in arch_enabled you need to enforce this with > > (and (eq_attr "arch" "v6t2") >(match_test "TARGET_32BIT && arm_arch6 && arm_arch_thumb2")) >(const_string "yes") > > And in the pattern use v6t2 ... > > arm_arch_thumb2 implies that this is at the architecture level of v6t2. > Therefore TARGET_ARM && arm_arch_thumb2 implies ARM state. Hi Ramana, Thank you for your suggestions. I rebased the patch on the latest trunk and updated it accordingly. As this patch will not work for architectures older than armv6t2, I also prefer Thomas's patch to fix for them. I am currently performing test for this patch. Assuming no issues pops up, OK for the trunk? And is it necessary to backport this patch to the 4.8 & 4.9 branches? Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 217717) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,11 @@ +2014-11-19 Felix Yang + Shanyao Chen + + PR target/59593 + * config/arm/arm.md (define_attr "arch"): Add v6t2. + (define_attr "arch_enabled"): Add test for the above. + (*movhi_insn_arch4): Add new alternative. + 2014-11-18 Felix Yang * config/aarch64/aarch64.c (doloop_end): New pattern. Index: gcc/config/arm/arm.md === --- gcc/config/arm/arm.md (revision 217717) +++ gcc/config/arm/arm.md (working copy) @@ -125,9 +125,10 @@ ; This can be "a" for ARM, "t" for either of the Thumbs, "32" for ; TARGET_32BIT, "t1" or "t2" to specify a specific Thumb mode. "v6" ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without -; arm_arch6. This attribute is used to compute attribute "enabled", -; use type "any" to enable an alternative in all cases. -(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3" +; arm_arch6. "v6t2" for Thumb-2 with arm_arch6. This attribute is +; used to compute attribute "enabled", use type "any" to enable an +; alternative in all cases. +(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,v6t2,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3" (const_string "any")) (define_attr "arch_enabled" "no,yes" @@ -162,6 +163,10 @@ (match_test "TARGET_32BIT && !arm_arch6")) (const_string "yes") +(and (eq_attr "arch" "v6t2") + (match_test "TARGET_32BIT && arm_arch6 && arm_arch_thumb2")) +(const_string "yes") + (and (eq_attr "arch" "avoid_neon_for_64bits") (match_test "TARGET_NEON") (not (match_test "TARGET_PREFER_NEON_64BITS"))) @@ -6288,8 +6293,8 @@ ;; Pattern to recognize insn generated default case above (define_insn "*movhi_insn_arch4" - [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,m,r") - (match_operand:HI 1 "general_operand" "rIk,K,r,mi"))] + [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,m,r") + (match_operand:HI 1 "general_operand" "rIk,K,n,r,mi"))] "TARGET_ARM && arm_arch4 && (register_operand (operands[0], HImode) @@ -6297,16 +6302,19 @@ "@ mov%?\\t%0, %1\\t%@ movhi mvn%?\\t%0, #%B1\\t%@ movhi + movw%?\\t%0, %1\\t%@ movhi str%(h%)\\t%1, %0\\t%@ movhi ldr%(h%)\\t%0, %1\\t%@ movhi" [(set_attr "predicable" "yes") - (set_attr "pool_range" "*,*,*,256") - (set_attr "neg_pool_range" "*,*,*,244") + (set_attr "pool_range" "*,*,*,*,256") + (set_attr "neg_pool_range" "*,*,*,*,244") + (set_attr "arch" "*,*,v6t2,*,*") (set_attr_alternative "type" [(if_then_else (match_operand 1 "const_int_operand" "") (const_string "mov_imm" ) (const_string "mov_reg")) (const_string "mvn_imm") + (const_string "mov_imm") (const_string "store1") (const_string "load1")])] ) arm-patch-v3.diff Description: arm-patch-v3.diff
Re: [PATCH] Fix simd clone vectorization with EH (PR tree-optimization/63915)
On Tue, 18 Nov 2014, Jakub Jelinek wrote: > Hi! > > Simd clone vectorization uses vect_finish_stmt_generation which > handles adding the new calls properly to EH tables, but afterwards > we replace the original call with a normal assignment, and if the original > call can throw, we need to remove it from the EH tables. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, > ok for trunk? Ok. Thanks, Richard. > 2014-11-18 Jakub Jelinek > > PR tree-optimization/63915 > * tree-vect-stmts.c (vectorizable_simd_clone_call): Pass > true instead of false as last argument to gsi_replace. > > * c-c++-common/gomp/pr60823-4.c: New test. > > --- gcc/tree-vect-stmts.c.jj 2014-11-14 00:10:39.0 +0100 > +++ gcc/tree-vect-stmts.c 2014-11-18 17:02:15.635257023 +0100 > @@ -3195,7 +3195,7 @@ vectorizable_simd_clone_call (gimple stm >set_vinfo_for_stmt (new_stmt, stmt_info); >set_vinfo_for_stmt (stmt, NULL); >STMT_VINFO_STMT (stmt_info) = new_stmt; > - gsi_replace (gsi, new_stmt, false); > + gsi_replace (gsi, new_stmt, true); >unlink_stmt_vdef (stmt); > >return true; > --- gcc/testsuite/c-c++-common/gomp/pr60823-4.c.jj2014-11-18 > 17:06:38.859319961 +0100 > +++ gcc/testsuite/c-c++-common/gomp/pr60823-4.c 2014-11-18 > 17:06:34.602383632 +0100 > @@ -0,0 +1,7 @@ > +/* PR tree-optimization/63915 */ > +/* { dg-do run } */ > +/* { dg-require-effective-target vect_simd_clones } */ > +/* { dg-options "-O2 -fopenmp-simd" } */ > +/* { dg-additional-options "-fpic" { target fpic } } */ > + > +#include "pr60823-2.c" > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284 (AG Nuernberg) Maxfeldstrasse 5, 90409 Nuernberg, Germany
Re: [PATCH, PR62167] Fix tail-merge pass for dead type-unsafe code
On Tue, 18 Nov 2014, Jeff Law wrote: > On 11/18/14 09:57, Tom de Vries wrote: > > Richard, > > > > this (trunk) patch fixes PR62167. > > > > The patch fixes a problem that triggers with the test-case on the 4.8 > > branch, when tail-merge makes a dead type-unsafe load alive. > > > > I'm not able to reproduce this bug on 4.9 and trunk with the same > > test-case. On those branches, the tail-merge already does not happen. > > > > The reason for the difference is as follows: With 4.8 the two phi > > arguments of the phi in the tail block are value-numbered identically: > > ... > > SCC consists of: p_14 > > Value numbering p_14 stmt = p_14 = MEM[(struct head *)_13].first; > > Setting value number of p_14 to p_14 (changed) > > > > SCC consists of: p_15 > > Value numbering p_15 stmt = p_15 = _13->next; > > Setting value number of p_15 to p_14 (changed) > > ... > > > > With 4.9 (and trunk), that's not the case: > > ... > > SCC consists of: p_14 > > Value numbering p_14 stmt = p_14 = MEM[(struct head *)&heads][k.1_9].first; > > Setting value number of p_14 to p_14 (changed) > > > > SCC consists of: p_15 > > Value numbering p_15 stmt = p_15 = _13->next; > > Setting value number of p_15 to p_15 (changed) > > ... > > > > I'm not sure the bug triggers on trunk and 4.9, but I see no reason why > > it could not trigger, so I'd prefer to apply the patch to 4.9 and trunk > > as well. > > > > The patch introduces an xfail for pr51879-12.c. I can follow up with a > > patch to improve upon that, but I think that's better limited to trunk > > only. > > > > Bootstrapped and reg-tested on x86_64/trunk. > So instead of simply disabling for anything with virtual operands, shouldn't > instead we be comparing the virtual operands and alias information? Seems to > me that if we did that properly, this would "just work". Right? But that would simply use operand_equal_p () Richard.
Re: [PATCH 7/8] Model cache auto-prefetcher in scheduler
On Nov 19, 2014, at 12:27 PM, Ramana Radhakrishnan wrote: > > > On 14/11/14 15:12, Maxim Kuvyrkov wrote: >> On Nov 14, 2014, at 8:38 AM, Jeff Law wrote: >> >>> On 10/20/14 22:06, Maxim Kuvyrkov wrote: Hi, Ramana, this change requires benchmarking, which I can't easily do at >>> the moment. I would appreciate any benchmarking results that you can >>> share. In particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH >>> needs to be tuned/confirmed for Cortex-A15. >>> What were the results of that benchmarking? IIRC I tabled reviewing this >>> work waiting for those results (and I probably should have let you know >>> that. Sorry, my bad there). >> >> I don't have the benchmarking results yet, and I was hoping for ARM to help >> with getting the numbers. The arm maintainers still need to OK the >> arm-specific portion of the patch, which, I imagine, will happen only of >> benchmark scores improve. > > > I tried benchmarking 78f367cfcfdc9f0a422a362cd85ecc122834d96f from the trees > you gave me links to against the equivalent version on trunk. > > The results with SPEC2k on A15 were in the noise with the default value for > PARAM_SCHED_AUTOPREF_QUEUE_DEPTH which is 2 in the backend. I'm still waiting > on results for values 0, 1 and 3 and hopefully something will come back soon > for SPEC2k. I think there is a good chance that this optimization will not improve SPEC benchmarks, my main concern is to make sure SPECs don't regress. If SPECs are neutral and the patch gives good improvement (2.5 times faster) on another benchmark (STREAM), is this a good enough argument for commit? > > >> >> @@ -29903,6 +29915,20 @@ arm_first_cycle_multipass_dfa_lookahead (void) >> return issue_rate > 1 ? issue_rate : 0; >> } >> >> +/* Enable modeling of Cortex-A15 L2 auto-prefetcher. */ >> +static int >> +arm_first_cycle_multipass_dfa_lookahead_guard (rtx insn, int ready_index) >> +{ >> + switch (arm_tune) >> +{ >> +case cortexa15: >> + return autopref_multipass_dfa_lookahead_guard (insn, ready_index); >> + >> +default: >> + return 0; >> +} >> +} >> + > > > It would be better to have this as a flag in the tuning tables rather than > hardcoding for a core here. The backend has been moving in that direction for > all core centric information and it is preferable that be continued. > > So this logic here should just be > > if (current_tune->multipass_lookahead) > return autopref_multipass_lookahead_guard (insn, ready_index); > else > return 0; > OK, I'll convert this to use tuning tables. Thank you, -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] Fix ubsan -fsanitize=signed-integer-overflow expansion (PR sanitizer/63520)
On Wed, 19 Nov 2014, Jakub Jelinek wrote: > On Wed, Nov 19, 2014 at 10:02:18AM +0100, Richard Biener wrote: > > On Tue, 18 Nov 2014, Jakub Jelinek wrote: > > > Apparently, expand_expr with EXPR_WRITE can return > > > a SUBREG with SUBREG_PROMOTED_VAR_P set on it. For > > > > Huh, that looks bogus to me. But of course I know nothing > > (read: not enough) to really tell. Eric? > > I've tried to look where it comes from, and it dates back to r2xxx > or so, so forever. > And store_expr has a large: > else if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target)) > /* If this is a scalar in a register that is stored in a wider mode >than the declared mode, compute the result into its declared mode >and then convert to the wider mode. Our value is the computed >expression. */ > handling block. For targets with only word sized operations something > like that actually makes a lot of sense, I have just not been aware of that. Ok. Then the patch is ok. Thanks, Richard.
Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian
On 19/11/14 09:29, Yangfei (Felix) wrote: Sorry for missing the point. It seems to me that 't2' here will conflict with condition of the pattern *movhi_insn_arch4: "TARGET_ARM && arm_arch4 && (register_operand (operands[0], HImode) || register_operand (operands[1], HImode))" #define TARGET_ARM (! TARGET_THUMB) /* 32-bit Thumb-2 code. */ #define TARGET_THUMB2 (TARGET_THUMB && arm_arch_thumb2) Bah, Indeed ! - I misremembered the t2 there, my mistake. Yes you are right there, but what I'd like you to do is to use that mechanism rather than putting all this logic in the predicate. So, I'd prefer you to add a v6t2 to the values for the "arch" attribute, don't forget to update the comments above. and in arch_enabled you need to enforce this with (and (eq_attr "arch" "v6t2") (match_test "TARGET_32BIT && arm_arch6 && arm_arch_thumb2")) (const_string "yes") And in the pattern use v6t2 ... arm_arch_thumb2 implies that this is at the architecture level of v6t2. Therefore TARGET_ARM && arm_arch_thumb2 implies ARM state. Hi Ramana, Thank you for your suggestions. I rebased the patch on the latest trunk and updated it accordingly. As this patch will not work for architectures older than armv6t2, I also prefer Thomas's patch to fix for them. I am currently performing test for this patch. Assuming no issues pops up, OK for the trunk? And is it necessary to backport this patch to the 4.8 & 4.9 branches? This is OK for trunk if no regressions - please let it bake for a few days on trunk to let auto-testers catch up. This is OK for 4.8 / 4.9 after that and please test your backport. regards Ramana Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 217717) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,11 @@ +2014-11-19 Felix Yang + Shanyao Chen + + PR target/59593 + * config/arm/arm.md (define_attr "arch"): Add v6t2. + (define_attr "arch_enabled"): Add test for the above. + (*movhi_insn_arch4): Add new alternative. + 2014-11-18 Felix Yang * config/aarch64/aarch64.c (doloop_end): New pattern. Index: gcc/config/arm/arm.md === --- gcc/config/arm/arm.md (revision 217717) +++ gcc/config/arm/arm.md (working copy) @@ -125,9 +125,10 @@ ; This can be "a" for ARM, "t" for either of the Thumbs, "32" for ; TARGET_32BIT, "t1" or "t2" to specify a specific Thumb mode. "v6" ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without -; arm_arch6. This attribute is used to compute attribute "enabled", -; use type "any" to enable an alternative in all cases. -(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3" +; arm_arch6. "v6t2" for Thumb-2 with arm_arch6. This attribute is +; used to compute attribute "enabled", use type "any" to enable an +; alternative in all cases. +(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,v6t2,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3" (const_string "any")) (define_attr "arch_enabled" "no,yes" @@ -162,6 +163,10 @@ (match_test "TARGET_32BIT && !arm_arch6")) (const_string "yes") +(and (eq_attr "arch" "v6t2") + (match_test "TARGET_32BIT && arm_arch6 && arm_arch_thumb2")) +(const_string "yes") + (and (eq_attr "arch" "avoid_neon_for_64bits") (match_test "TARGET_NEON") (not (match_test "TARGET_PREFER_NEON_64BITS"))) @@ -6288,8 +6293,8 @@ ;; Pattern to recognize insn generated default case above (define_insn "*movhi_insn_arch4" - [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,m,r") - (match_operand:HI 1 "general_operand" "rIk,K,r,mi"))] + [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,m,r") + (match_operand:HI 1 "general_operand" "rIk,K,n,r,mi"))] "TARGET_ARM && arm_arch4 && (register_operand (operands[0], HImode) @@ -6297,16 +6302,19 @@ "@ mov%?\\t%0, %1\\t%@ movhi mvn%?\\t%0, #%B1\\t%@ movhi + movw%?\\t%0, %1\\t%@ movhi str%(h%)\\t%1, %0\\t%@ movhi ldr%(h%)\\t%0, %1\\t%@ movhi" [(set_attr "predicable" "yes") - (set_attr "pool_range" "*,*,*,256") - (set_attr "neg_pool_range" "*,*,*,244") + (set_attr "pool_range" "*,*,*,*,256") + (set_attr "neg_pool_range" "*,*,*,*,244") + (set_attr "arch" "*,*,v6t2,*,*") (set_attr_alternative "type" [(if_then_else (match_operand 1 "const_int_operand" "") (const_string "mov_imm" ) (const_string "mov_reg")) (const_string "mvn_imm") + (const_string "mov_imm")
Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign
On Tue, Nov 18, 2014 at 10:04 PM, David Malcolm wrote: > On Tue, 2014-11-18 at 10:53 +0100, Richard Biener wrote: >> On Tue, Nov 18, 2014 at 2:59 AM, David Malcolm wrote: >> > On Mon, 2014-11-17 at 11:06 +0100, Richard Biener wrote: >> >> On Sat, Nov 15, 2014 at 12:00 PM, David Malcolm >> >> wrote: >> >> > On Thu, 2014-11-13 at 11:45 +0100, Richard Biener wrote: >> >> >> On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm >> >> >> wrote: >> >> >> > On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote: >> >> >> >> On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek >> >> >> >> wrote: >> >> >> >> > On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote: >> >> >> >> >> On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote: >> >> >> >> >> > On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote: >> >> >> >> >> > > To be constructive here - the above case is from within a >> >> >> >> >> > > GIMPLE_ASSIGN case label >> >> >> >> >> > > and thus I'd have expected >> >> >> >> >> > > >> >> >> >> >> > > case GIMPLE_ASSIGN: >> >> >> >> >> > > { >> >> >> >> >> > > gassign *a1 = as_a (s1); >> >> >> >> >> > > gassign *a2 = as_a (s2); >> >> >> >> >> > > lhs1 = gimple_assign_lhs (a1); >> >> >> >> >> > > lhs2 = gimple_assign_lhs (a2); >> >> >> >> >> > > if (TREE_CODE (lhs1) != SSA_NAME >> >> >> >> >> > > && TREE_CODE (lhs2) != SSA_NAME) >> >> >> >> >> > > return (operand_equal_p (lhs1, lhs2, 0) >> >> >> >> >> > > && gimple_operand_equal_value_p >> >> >> >> >> > > (gimple_assign_rhs1 (a1), >> >> >> >> >> > > >> >> >> >> >> > > gimple_assign_rhs1 (a2))); >> >> >> >> >> > > else if (TREE_CODE (lhs1) == SSA_NAME >> >> >> >> >> > >&& TREE_CODE (lhs2) == SSA_NAME) >> >> >> >> >> > > return vn_valueize (lhs1) == vn_valueize (lhs2); >> >> >> >> >> > > return false; >> >> >> >> >> > > } >> >> >> >> >> > > >> >> >> >> >> > > instead. That's the kind of changes I have expected and >> >> >> >> >> > > have approved of. >> >> >> >> >> > >> >> >> >> >> > But even that looks like just adding extra work for all >> >> >> >> >> > developers, with no >> >> >> >> >> > gain. You only have to add extra code and extra temporaries, >> >> >> >> >> > in switches >> >> >> >> >> > typically also have to add {} because of the temporaries and >> >> >> >> >> > thus extra >> >> >> >> >> > indentation level, and it doesn't simplify anything in the >> >> >> >> >> > code. >> >> >> >> >> >> >> >> >> >> The branch attempts to use the C++ typesystem to capture >> >> >> >> >> information >> >> >> >> >> about the kinds of gimple statement we expect, both: >> >> >> >> >> (A) so that the compiler can detect type errors, and >> >> >> >> >> (B) as a comprehension aid to the human reader of the code >> >> >> >> >> >> >> >> >> >> The ideal here is when function params and struct field can be >> >> >> >> >> strengthened from "gimple" to a subclass ptr. This captures the >> >> >> >> >> knowledge that every use of a function or within a struct has a >> >> >> >> >> given >> >> >> >> >> gimple code. >> >> >> >> > >> >> >> >> > I just don't like all the as_a/is_a stuff enforced everywhere, >> >> >> >> > it means more typing, more temporaries, more indentation. >> >> >> >> > So, as I view it, instead of the checks being done cheaply (yes, >> >> >> >> > I think >> >> >> >> > the gimple checking as we have right now is very cheap) under the >> >> >> >> > hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those >> >> >> >> > changes >> >> >> >> > put the burden on the developers, who has to check that manually >> >> >> >> > through >> >> >> >> > the as_a/is_a stuff everywhere, more typing and uglier syntax. >> >> >> >> > I just don't see that as a step forward, instead a huge step >> >> >> >> > backwards. >> >> >> >> > But perhaps I'm alone with this. >> >> >> >> > Can you e.g. compare the size of - lines in your patchset >> >> >> >> > combined, and >> >> >> >> > size of + lines in your patchset? As in, if your changes lead to >> >> >> >> > less >> >> >> >> > typing or more. >> >> >> >> >> >> >> >> I see two ways out here. One is to add overloads to all the >> >> >> >> functions >> >> >> >> taking the special types like >> >> >> >> >> >> >> >> tree >> >> >> >> gimple_assign_rhs1 (gimple *); >> >> >> >> >> >> >> >> or simply add >> >> >> >> >> >> >> >> gassign *operator ()(gimple *g) { return as_a (g); } >> >> >> >> >> >> >> >> into a gimple-compat.h header which you include in places that >> >> >> >> are not converted "nicely". >> >> >> > >> >> >> > Thanks for the suggestions. >> >> >> > >> >> >> > Am I missing something, or is the gimple-compat.h idea above not >> >> >> > valid C >> >> >> > ++? >> >> >> > >> >> >> > Note that "gimple" is still a typedef to >> >> >> > gimple_statement_base * >> >> >> > (as noted before, the gimple -> gimple * change would break e
Re: [PATCH] Cancel bswap opt when intermediate stmts are reused
On Tue, Nov 18, 2014 at 5:33 PM, Thomas Preud'homme wrote: >> From: Richard Biener [mailto:richard.guent...@gmail.com] >> Sent: Monday, November 17, 2014 12:47 PM >> >> Hmm. I am a little bit concerned about the malloc traffic generated here. >> So why not use a vec, get rid of the ->next pointer and >> use a hash_map to associate the stmt with >> an index into the vector? > > Sure, it even makes things easier. However I don't understand why a vector is > better than malloc. Is it a matter of fragmentation? That and with a vector you get log(N) allocations instead of N. >> >> At this point I'd rather leave DCE to DCE ... > > I thought since all information is there why not do it. It makes it easier to > read the dump of the pass. Fair enough ... >> >> Ick - do we really have to use gotos here? Can you refactor this >> a bit to avoid it? > > Yes I can. I needed the same kind of thing for fixing PR63761 (r217409) and > found a way to avoid it. Thanks. >> >> The whole scheme wouldn't exactly fly with the idea of eventually >> using a lattice to propagate the symbolic numbers, but well... > >> >> I think the overall idea is sound and if you have time to adjust according >> to my comments that would be nice. > > To be honest I think it should wait for next stage1. After several ping I took > another look at the testcase with the idea of measuring the size reduction > the patch could give and was surprised that in all cases I could construct the > size was actually bigger. Performance might be improved nonetheless but > I think this needs more careful consideration. > > And as you said the approach would need to be changed if bswap was > rewritten to do a forward analysis. At last, nobody reported a code size or > performance regression so far due to the changes so that might be a non > issue. If such a report happens, then it will be a good time to revisit that > decision. > > Do you agree? Yes. If you can't come up with a testcase that it improves but it only pessimizes cases then IMHO this would be a very premature optimization. I suspect that we'd need to have a more complex cost analysis factoring in that we _do_ remove a lot of code even if parts need to be kept alive. So maybe only discard the transform if _all_ stmts need to be kept live (as when we even remove one of the partial loads then this should alread offset the single load we add ... for the bswap this may be slightly different). > >> >> Sorry for the very late review. > > That's alright, I know how it is. Thank you for keeping track of it. I > actually > feel sorry I didn't warn about my findings. I thought the patch fell through > the cracks and didn't want to spam gcc-patches uselessly. Sure. Thanks, Richard. > Best regards, > > Thomas > > >
[PATCH, libgomp]: Require vect_simd_clones effective target for examples-4/e.53.5.{c,f90}
Hello! 2014-11-19 Uros Bizjak * testsuite/libgomp.c/examples-4/e.53.5.c: Require vect_simd_clones effective target. * testsuite/libgomp.fortran/examples-4/e.53.5.f90: Ditto. Tested on x86_64-linux-gnu {,-m32} CentOS 5.11. OK for mainline? Uros. Index: testsuite/libgomp.fortran/examples-4/e.53.5.f90 === --- testsuite/libgomp.fortran/examples-4/e.53.5.f90 (revision 217718) +++ testsuite/libgomp.fortran/examples-4/e.53.5.f90 (working copy) @@ -1,4 +1,4 @@ -! { dg-do run } +! { dg-do run { target vect_simd_clones } } ! { dg-options "-O2" } ! { dg-additional-options "-msse2" { target sse2_runtime } } ! { dg-additional-options "-mavx" { target avx_runtime } } Index: testsuite/libgomp.c/examples-4/e.53.5.c === --- testsuite/libgomp.c/examples-4/e.53.5.c (revision 217718) +++ testsuite/libgomp.c/examples-4/e.53.5.c (working copy) @@ -1,4 +1,4 @@ -/* { dg-do run } */ +/* { dg-do run { target vect_simd_clones } } */ /* { dg-options "-O2" } */ /* { dg-additional-options "-msse2" { target sse2_runtime } } */ /* { dg-additional-options "-mavx" { target avx_runtime } } */
Re: [PATCH, libgomp]: Require vect_simd_clones effective target for examples-4/e.53.5.{c,f90}
On Wed, Nov 19, 2014 at 11:26:25AM +0100, Uros Bizjak wrote: > 2014-11-19 Uros Bizjak > > * testsuite/libgomp.c/examples-4/e.53.5.c: Require > vect_simd_clones effective target. > * testsuite/libgomp.fortran/examples-4/e.53.5.f90: Ditto. > > Tested on x86_64-linux-gnu {,-m32} CentOS 5.11. > > OK for mainline? Yes, thanks. > Index: testsuite/libgomp.fortran/examples-4/e.53.5.f90 > === > --- testsuite/libgomp.fortran/examples-4/e.53.5.f90 (revision 217718) > +++ testsuite/libgomp.fortran/examples-4/e.53.5.f90 (working copy) > @@ -1,4 +1,4 @@ > -! { dg-do run } > +! { dg-do run { target vect_simd_clones } } > ! { dg-options "-O2" } > ! { dg-additional-options "-msse2" { target sse2_runtime } } > ! { dg-additional-options "-mavx" { target avx_runtime } } > Index: testsuite/libgomp.c/examples-4/e.53.5.c > === > --- testsuite/libgomp.c/examples-4/e.53.5.c (revision 217718) > +++ testsuite/libgomp.c/examples-4/e.53.5.c (working copy) > @@ -1,4 +1,4 @@ > -/* { dg-do run } */ > +/* { dg-do run { target vect_simd_clones } } */ > /* { dg-options "-O2" } */ > /* { dg-additional-options "-msse2" { target sse2_runtime } } */ > /* { dg-additional-options "-mavx" { target avx_runtime } } */ Jakub
[PATCH 01/21] PR jit/63854: Fix memory leak within gcc_options
Valgrind shows this fixes ~1 KB of leak per iteration (on x86_64) by plugging these leaks allocated at opts.c lines 286 and 289: ==57820== 2,752 bytes in 4 blocks are definitely lost in loss record 875 of 917 ==57820==at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==57820==by 0x59A6747: xmalloc (xmalloc.c:147) ==57820==by 0x595542A: init_options_struct(gcc_options*, gcc_options*) (opts.c:286) ==57820==by 0x4E2ED61: toplev::main(int, char**) (toplev.c:2081) ==57820==by 0x4E43186: gcc::jit::playback::context::compile() (jit-playback.c:1615) ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() (jit-recording.c:861) ==57820==by 0x401CA4: test_jit (harness.h:190) ==57820==by 0x401D88: main (harness.h:232) ==57820== ==57820== 2,752 bytes in 4 blocks are definitely lost in loss record 876 of 917 ==57820==at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==57820==by 0x59A6780: xcalloc (xmalloc.c:162) ==57820==by 0x595543E: init_options_struct(gcc_options*, gcc_options*) (opts.c:289) ==57820==by 0x4E2ED61: toplev::main(int, char**) (toplev.c:2081) ==57820==by 0x4E43186: gcc::jit::playback::context::compile() (jit-playback.c:1615) ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() (jit-recording.c:861) ==57820==by 0x401CA4: test_jit (harness.h:190) ==57820==by 0x401D88: main (harness.h:232) gcc/ChangeLog: PR jit/63854 * opts.c (finalize_options_struct): New. * opts.h (finalize_options_struct): New. * toplev.c (toplev::finalize): Call finalize_options_struct on global_options and global_options_set. --- gcc/opts.c | 8 gcc/opts.h | 1 + gcc/toplev.c | 3 +++ 3 files changed, 12 insertions(+) diff --git a/gcc/opts.c b/gcc/opts.c index d22882b..dabd3c6 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -307,6 +307,14 @@ init_options_struct (struct gcc_options *opts, struct gcc_options *opts_set) targetm_common.option_init_struct (opts); } +/* Release any allocations owned by OPTS. */ + +void +finalize_options_struct (struct gcc_options *opts) +{ + XDELETEVEC (opts->x_param_values); +} + /* If indicated by the optimization level LEVEL (-Os if SIZE is set, -Ofast if FAST is set, -Og if DEBUG is set), apply the option DEFAULT_OPT to OPTS and OPTS_SET, diagnostic context DC, location LOC, with language diff --git a/gcc/opts.h b/gcc/opts.h index f694082..c3ec942 100644 --- a/gcc/opts.h +++ b/gcc/opts.h @@ -325,6 +325,7 @@ extern void decode_cmdline_options_to_array (unsigned int argc, extern void init_options_once (void); extern void init_options_struct (struct gcc_options *opts, struct gcc_options *opts_set); +extern void finalize_options_struct (struct gcc_options *opts); extern void decode_cmdline_options_to_array_default_mask (unsigned int argc, const char **argv, struct cl_decoded_option **decoded_options, diff --git a/gcc/toplev.c b/gcc/toplev.c index 2e48047..4b4e568 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -2169,4 +2169,7 @@ toplev::finalize (void) ipa_cp_c_finalize (); ipa_reference_c_finalize (); params_c_finalize (); + + finalize_options_struct (&global_options); + finalize_options_struct (&global_options_set); } -- 1.8.5.3
[PATCH 02/21] PR jit/63854: Fix memory leak of reginfo.c: valid_mode_changes_obstack
Valgrind shows this fixes ~4 KB of leak per iteration (on x86_64) by plugging this leak allocated at reginfo.c:1327: gcc_obstack_init (&valid_mode_changes_obstack); ==57820== 16,256 bytes in 4 blocks are definitely lost in loss record 906 of 917 ==57820==at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==57820==by 0x59A6747: xmalloc (xmalloc.c:147) ==57820==by 0x30958842DB: _obstack_begin (obstack.c:184) ==57820==by 0x51C1EC1: init_subregs_of_mode() (reginfo.c:1327) ==57820==by 0x50D2A38: init_costs() (ira-costs.c:2181) ==57820==by 0x50D74A8: ira_costs() (ira-costs.c:2211) ==57820==by 0x50D1326: ira_build() (ira-build.c:3459) ==57820==by 0x50C909C: (anonymous namespace)::pass_ira::execute(function*) (ira.c:5227) ==57820==by 0x51884F7: execute_one_pass(opt_pass*) (passes.c:2269) ==57820==by 0x5188B75: execute_pass_list_1(opt_pass*) (passes.c:2321) ==57820==by 0x5188B87: execute_pass_list_1(opt_pass*) (passes.c:2322) ==57820==by 0x5188BC8: execute_pass_list(function*, opt_pass*) (passes.c:2332) gcc/ChangeLog: PR jit/63854 * reginfo.c (finish_subregs_of_mode): Replace obstack_finish with obstack_free when cleaning up valid_mode_changes_obstack. --- gcc/reginfo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/reginfo.c b/gcc/reginfo.c index efe23cd..c2daf22 100644 --- a/gcc/reginfo.c +++ b/gcc/reginfo.c @@ -1343,7 +1343,7 @@ void finish_subregs_of_mode (void) { XDELETEVEC (valid_mode_changes); - obstack_finish (&valid_mode_changes_obstack); + obstack_free (&valid_mode_changes_obstack, NULL); } /* Free all data attached to the structure. This isn't a destructor because -- 1.8.5.3
[PATCH 06/21] PR jit/63854: Fix leak of opts_obstack
This was leaking about 4KB per iteration: 16,256 bytes in 4 blocks are definitely lost in loss record 233 of 239 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D75C17: xmalloc (xmalloc.c:147) by 0x30958842DB: _obstack_begin (obstack.c:184) by 0x5D1D317: init_options_struct(gcc_options*, gcc_options*) (opts.c:279) by 0x532DB0C: toplev::main(int, char**) (toplev.c:2081) by 0x4DE766F: gcc::jit::playback::context::compile() (jit-playback.c:1615) by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861) by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014) by 0x401CA4: test_jit (harness.h:190) by 0x401D88: main (harness.h:232) gcc/ChangeLog: PR jit/63854 * toplev.c (toplev::finalize): Free opts_obstack. --- gcc/toplev.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/toplev.c b/gcc/toplev.c index 291b84d..0d617c2 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -2178,4 +2178,6 @@ toplev::finalize (void) /* Clean up the context (and pass_manager etc). */ delete g; g = NULL; + + obstack_free (&opts_obstack, NULL); } -- 1.8.5.3
[PATCH 03/21] PR jit/63854: Fix memory leaks within context/pass_manager/dump_manager
We weren't cleaning up the context, pass_manager or dump_manager. An earlier version of the context and pass_manager classes had them allocated in the GC-heap, but they were moved to the system heap before that patch was committed; they were never cleaned up, so e.g. all passes leak on each in-process iteration. Cleaning all of this up fixes the largest of the leaks in the PR, about 60KB per iteration: ==57820== 80,560 (88 direct, 80,472 indirect) bytes in 1 blocks are definitely lost in loss record 913 of 917 ==57820==at 0x4A06965: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==57820==by 0x536A462: make_pass_dce(gcc::context*) (tree-ssa-dce.c:1703) ==57820==by 0x518B8DA: gcc::pass_manager::pass_manager(gcc::context*) (pass-instances.def:173) ==57820==by 0x4E8D6D9: gcc::context::context() (context.c:37) ==57820==by 0x4E2ED19: toplev::main(int, char**) (toplev.c:1211) ==57820==by 0x4E43186: gcc::jit::playback::context::compile() (jit-playback.c:1615) ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() (jit-recording.c:861) ==57820==by 0x401CA4: test_jit (harness.h:190) ==57820==by 0x401D88: main (harness.h:232) ==57820== ==57820== 161,488 (352 direct, 161,136 indirect) bytes in 4 blocks are definitely lost in loss record 917 of 917 ==57820==at 0x4A06965: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==57820==by 0x5553102: make_pass_insert_vzeroupper(gcc::context*) (i386.c:2581) ==57820==by 0x55571DF: ix86_option_override() (i386.c:4325) ==57820==by 0x4E2EEF4: toplev::main(int, char**) (toplev.c:1295) ==57820==by 0x4E43186: gcc::jit::playback::context::compile() (jit-playback.c:1615) ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() (jit-recording.c:861) ==57820==by 0x401CA4: test_jit (harness.h:190) ==57820==by 0x401D88: main (harness.h:232) All of this additional cleanup happens within toplev::finalize, and hence we don't call it from cc1, cc1plus, etc, only from libgccjit.so. Note that some calls to register_pass_info were using a static: static struct register_pass_info on the stack e.g.: opt_pass *pass_handle_trap_shadows = make_pass_handle_trap_shadows (g); static struct register_pass_info handle_trap_shadows_info = { pass_handle_trap_shadows, "eh_ranges", 1, PASS_POS_INSERT_AFTER }; This led to crashes on the 2nd iteration: the on-stack struct was only being built on the first iteration, using the result of the first call to the "make_pass_" function. Subsequent calls would thus be registering a freed pass, giving a use-after-free error (which was previously hidden because we weren't freeing them). The fix is to remove the "static" from such structs. This also fixes a leak of strings within dump_file_info for passes. gcc/ChangeLog: PR jit/63854 * config/alpha/alpha.c (alpha_option_override): Remove static from "handle_trap_shadows_info" and "align_insns_info". * config/i386/i386.c (ix86_option_override): Likewise for "insert_vzeroupper_info". * config/rl78/rl78.c (rl78_asm_file_start): Likewise for "rl78_devirt_info" and "rl78_move_elim_info". * config/rs6000/rs6000.c (rs6000_option_override): Likewise for "analyze_swaps_info". * context.c (gcc::context::~context): New. * context.h (gcc::context::~context): New. * dumpfile.c (dump_files): Add "false" initializers for new field "owns_strings". (gcc::dump_manager::~dump_manager): New. (gcc::dump_manager::dump_register): Add param "take_ownership". * dumpfile.h (struct dump_file_info): Add field "owns_strings". (gcc::dump_manager::~dump_manager): New. (gcc::dump_manager::dump_register): Add param "take_ownership". * pass_manager.h (gcc::pass_manager::operator delete): New. (gcc::pass_manager::~pass_manager): New. * passes.c (pass_manager::register_one_dump_file): Pass "true" to new "owns_strings" argument to dump_register. (pass_manager::operator delete): New. (delete_pass_tree): New function. (pass_manager::~pass_manager): New. * statistics.c (statistics_early_init): Pass "false" to new "owns_strings" argument to dump_register. * toplev.c (toplev::finalize): Clean up the context and thus the things it owns. --- gcc/config/alpha/alpha.c | 4 ++-- gcc/config/i386/i386.c | 2 +- gcc/config/rl78/rl78.c | 4 ++-- gcc/config/rs6000/rs6000.c | 2 +- gcc/context.c | 6 ++ gcc/context.h | 1 + gcc/dumpfile.c | 47 ++ gcc/dumpfile.h | 11 ++- gcc/pass_manager.h | 2 ++ gcc/passes.c | 39 +- gcc/statistics.c | 3 ++- gcc/toplev.c
[PATCH 00/21] PR 63854: Fix various memory leaks
Running the jit testsuite under valgrind showed various memory leaks. Some are per-invocation of the compiler, and hence only affect libgccjit.so (although presumably it's useful to cc1 etc to get valgrind clean). Others appear to be per-function and hence affect the non-JIT use cases. The following patch kit fixes most of the leaks. It also adds a contrib/valgrind.supp suppressions file, for sparseset (though I see now that gcc/configure.ac can detect if valgrind/memcheck.h is available and use it if --enable-valgrind-annotations Should I instead require that option when doing such testing?). Successfully bootstrapped®rtested the cumulative effect of the patches on x86_64-unknown-linux-gnu (Fedora 20). Are the non-JIT parts OK for trunk? (Presumably I can give myself approval for the JIT parts) David Malcolm (21): PR jit/63854: Fix memory leak within gcc_options PR jit/63854: Fix memory leak of reginfo.c: valid_mode_changes_obstack PR jit/63854: Fix memory leaks within context/pass_manager/dump_manager PR jit/63854: Fix memory leak within bb-reorder.c PR jit/63854: Fix memory leak of save_decoded_options PR jit/63854: Fix leak of opts_obstack PR jit/63854: Fix leak of optimization_summary_obstack PR jit/63854: Add ira_costs_c_finalize PR jit/63854: Don't leak producer_string in dwarf2out.c PR jit/63854: Fix leak of worklist within jit-recording.c PR jit/63854: Fix leak of "avail" within tree-ssa-pre.c PR jit/63854: Add a valgrind suppresion file PR jit/63854: Add support for running "make check-jit" under valgrind PR jit/63854: Fix leak of paths within jump threading PR jit/63854: lra.c: Fix leak of point_freq_vec's buffer when calling lra_inheritance PR jit/63854: Add all_late_ipa_passes to GCC_PASS_LISTS PR jit/63854: Fix leaking vec in jit PR jit/63854: Add "long-term" allocator to gcc::context PR jit/63854: Fix leak of ipa hooks PR jit/63854: Fix leak in ipa-icf.c PR jit/63854: Fix leaks in test-fuzzer.c contrib/valgrind.supp | 11 ++ gcc/bb-reorder.c | 2 +- gcc/config/alpha/alpha.c | 4 +- gcc/config/i386/i386.c | 2 +- gcc/config/rl78/rl78.c | 4 +- gcc/config/rs6000/rs6000.c | 2 +- gcc/context.c | 24 gcc/context.h | 47 +++ gcc/dumpfile.c | 47 +-- gcc/dumpfile.h | 11 +- gcc/dwarf2out.c| 2 + gcc/ipa-icf.c | 1 + gcc/ipa-prop.c | 3 +- gcc/ipa-reference.c| 17 +++- gcc/ira-costs.c| 6 +++ gcc/ira.h | 3 ++ gcc/jit/jit-playback.c | 73 +-- gcc/jit/jit-playback.h | 27 - gcc/jit/jit-recording.c| 19 + gcc/jit/jit-recording.h| 26 ++--- gcc/lra.c | 1 + gcc/opts.c | 8 gcc/opts.h | 1 + gcc/pass_manager.h | 3 ++ gcc/passes.c | 39 ++- gcc/reginfo.c | 2 +- gcc/statistics.c | 3 +- gcc/testsuite/jit.dg/jit.exp | 79 +- gcc/testsuite/jit.dg/test-fuzzer.c | 6 +++ gcc/toplev.c | 20 -- gcc/tree-ssa-pre.c | 2 +- gcc/tree-ssa-threadedge.c | 1 + gcc/tree-ssa-threadupdate.c| 1 + gcc/tree.c | 17 +--- 34 files changed, 434 insertions(+), 80 deletions(-) create mode 100644 contrib/valgrind.supp -- 1.8.5.3
[PATCH 10/21] PR jit/63854: Fix leak of worklist within jit-recording.c
Fix this leak: 160 bytes in 5 blocks are definitely lost in loss record 154 of 228 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D75D4F: xrealloc (xmalloc.c:177) by 0x4DE1710: void va_heap::reserve(vec*&, unsigned int, bool) (vec.h:310) by 0x4DDFAB5: vec::reserve(unsigned int, bool) (vec.h:1428) by 0x4DDFBFC: vec::reserve_exact(unsigned int) (vec.h:1448) by 0x4DDE588: vec::create(unsigned int) (vec.h:1463) by 0x4DD9B9F: gcc::jit::recording::function::validate() (jit-recording.c:2191) by 0x4DD7AD3: gcc::jit::recording::context::validate() (jit-recording.c:1005) by 0x4DD7660: gcc::jit::recording::context::compile() (jit-recording.c:848) by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014) by 0x401CA4: test_jit (harness.h:190) by 0x401D88: main (harness.h:232) gcc/jit/ChangeLog: PR jit/63854 * jit-recording.c (recording::function::validate): Convert "worklist" from vec<> to autovec<> to fix a leak. --- gcc/jit/jit-recording.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/jit/jit-recording.c b/gcc/jit/jit-recording.c index 8daa8f2..8cce277 100644 --- a/gcc/jit/jit-recording.c +++ b/gcc/jit/jit-recording.c @@ -2187,8 +2187,7 @@ recording::function::validate () { /* Iteratively walk the graph of blocks, marking their "m_is_reachable" flag, starting at the initial block. */ - vec worklist; - worklist.create (m_blocks.length ()); + auto_vec worklist (m_blocks.length ()); worklist.safe_push (m_blocks[0]); while (worklist.length () > 0) { -- 1.8.5.3
[PATCH 08/21] PR jit/63854: Add ira_costs_c_finalize
this_target_ira_int->free_ira_costs () is called by ira_init_costs, but this isn't called after the compile, causing noise when running under valgrind. This patch adds a ira_costs_c_finalize function to clean this up, and calls it from toplev::finalize (and hence this isn't called by cc1/cc1plus/etc, just by libgccjit.so). Doing so eliminates about 27KB of "leak" from the valgrind report. gcc/ChangeLog: PR jit/63854 * ira-costs.c (ira_costs_c_finalize): New function. * ira.h (ira_costs_c_finalize): New prototype. * toplev.c (toplev::finalize): Call ira_costs_c_finalize. --- gcc/ira-costs.c | 6 ++ gcc/ira.h | 3 +++ gcc/toplev.c| 1 + 3 files changed, 10 insertions(+) diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c index 122815b..2dabead 100644 --- a/gcc/ira-costs.c +++ b/gcc/ira-costs.c @@ -2356,3 +2356,9 @@ ira_adjust_equiv_reg_cost (unsigned regno, int cost) else regno_equiv_gains[regno] += cost; } + +void +ira_costs_c_finalize (void) +{ + this_target_ira_int->free_ira_costs (); +} diff --git a/gcc/ira.h b/gcc/ira.h index b294ea1..d62656c 100644 --- a/gcc/ira.h +++ b/gcc/ira.h @@ -199,4 +199,7 @@ extern bool ira_bad_reload_regno (int, rtx, rtx); extern void ira_adjust_equiv_reg_cost (unsigned, int); +/* ira-costs.c */ +extern void ira_costs_c_finalize (void); + #endif /* GCC_IRA_H */ diff --git a/gcc/toplev.c b/gcc/toplev.c index 0d617c2..0181a3f 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -2168,6 +2168,7 @@ toplev::finalize (void) gcse_c_finalize (); ipa_cp_c_finalize (); ipa_reference_c_finalize (); + ira_costs_c_finalize (); params_c_finalize (); finalize_options_struct (&global_options); -- 1.8.5.3
[PATCH 07/21] PR jit/63854: Fix leak of optimization_summary_obstack
This was leaking ~4KB per iteration: 16,256 bytes in 4 blocks are definitely lost in loss record 234 of 239 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D75C17: xmalloc (xmalloc.c:147) by 0x30958842DB: _obstack_begin (obstack.c:184) by 0x4DFECDE: bitmap_obstack_initialize(bitmap_obstack*) (bitmap.c:338) by 0x5CA3C1D: ipa_init() (ipa-reference.c:431) by 0x5CA3FB1: generate_summary() (ipa-reference.c:551) by 0x5CA4650: propagate() (ipa-reference.c:684) by 0x5CA5BEE: (anonymous namespace)::pass_ipa_reference::execute(function*) (ipa-reference.c:1178) by 0x522354D: execute_one_pass(opt_pass*) (passes.c:2306) by 0x5224427: execute_ipa_pass_list(opt_pass*) (passes.c:2700) by 0x4E495C1: ipa_passes() (cgraphunit.c:2088) by 0x4E498E0: symbol_table::compile() (cgraphunit.c:2172) gcc/ChangeLog: PR jit/63854 * ipa-reference.c (ipa_reference_c_finalize): Release optimization_summary_obstack. --- gcc/ipa-reference.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c index b421f63..1ce06d1 100644 --- a/gcc/ipa-reference.c +++ b/gcc/ipa-reference.c @@ -1193,5 +1193,9 @@ make_pass_ipa_reference (gcc::context *ctxt) void ipa_reference_c_finalize (void) { - ipa_init_p = false; + if (ipa_init_p) +{ + bitmap_obstack_release (&optimization_summary_obstack); + ipa_init_p = false; +} } -- 1.8.5.3
[PATCH 11/21] PR jit/63854: Fix leak of "avail" within tree-ssa-pre.c
Fix this leak: 120 bytes in 5 blocks are definitely lost in loss record 141 of 227 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D75D8F: xrealloc (xmalloc.c:177) by 0x550DEBA: void va_heap::reserve(vec*&, unsigned int, bool) (vec.h:310) by 0x550D509: vec::reserve(unsigned int, bool) (vec.h:1428) by 0x550DA3E: vec::reserve_exact(unsigned int) (vec.h:1448) by 0x550D167: vec::safe_grow(unsigned int) (vec.h:1576) by 0x55076FE: do_regular_insertion(basic_block_def*, basic_block_def*) (tree-ssa-pre.c:3209) by 0x5508379: insert_aux(basic_block_def*) (tree-ssa-pre.c:3526) by 0x55083DC: insert_aux(basic_block_def*) (tree-ssa-pre.c:3536) by 0x55083DC: insert_aux(basic_block_def*) (tree-ssa-pre.c:3536) by 0x55084C0: insert() (tree-ssa-pre.c:3559) by 0x550C5BC: (anonymous namespace)::pass_pre::execute(function*) (tree-ssa-pre.c:4805) gcc/ChangeLog: PR jit/63854 * tree-ssa-pre.c (do_regular_insertion): Convert "avail" from vec<> to auto_vec<> to fix a leak. --- gcc/tree-ssa-pre.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c index ea99198..027dc1c 100644 --- a/gcc/tree-ssa-pre.c +++ b/gcc/tree-ssa-pre.c @@ -3202,7 +3202,7 @@ do_regular_insertion (basic_block block, basic_block dom) bool new_stuff = false; vec exprs; pre_expr expr; - vec avail = vNULL; + auto_vec avail; int i; exprs = sorted_array_from_bitmap_set (ANTIC_IN (block)); -- 1.8.5.3
[PATCH 12/21] PR jit/63854: Add a valgrind suppresion file
Valgrind complains about uninitialized data within sparseset_bit_p. Provide a suppression file to silence these warnings. Valgrind requires suppression files for C++ code to use the mangled names, so we do that here. contrib/ChangeLog: PR jit/63854 * valgrind.supp: New. --- contrib/valgrind.supp | 11 +++ 1 file changed, 11 insertions(+) create mode 100644 contrib/valgrind.supp diff --git a/contrib/valgrind.supp b/contrib/valgrind.supp new file mode 100644 index 000..ec9ff02 --- /dev/null +++ b/contrib/valgrind.supp @@ -0,0 +1,11 @@ +{ + suppress-uninit-cond-with-sparseset_bit_p + Memcheck:Cond + fun:_ZL15sparseset_bit_pP13sparseset_defm +} + +{ + suppress-uninit-use-with-sparseset_bit_p + Memcheck:Value8 + fun:_ZL15sparseset_bit_pP13sparseset_defm +} -- 1.8.5.3
[PATCH 13/21] PR jit/63854: Add support for running "make check-jit" under valgrind
This commit updates jit.exp so that if RUN_UNDER_VALGRIND is present in the environment, all of the built client code using libgccjit.so is run under valgrind, with --leak-check=full. Hence: RUN_UNDER_VALGRIND= make check-jit will run all jit testcases under valgrind (taking 27 mins on my machine). Results are written to testsuite/jit/test-FOO.exe.valgrind.txt jit.exp automatically parses these result file, looking for lines of the form definitely lost: 11,316 bytes in 235 blocks indirectly lost: 352 bytes in 4 blocks in the valgrind log's summary footer, adding PASSes if they are zero bytes, and, for now generating XFAILs for non-zero bytes. Sadly this diverges jit.exp's fixed_host_execute further from DejaGnu's host_execute, but I don't see a clean way to fix that. This currently adds 63 PASSes and 49 XFAILs to jit.sum, giving: # of expected passes 2481 # of expected failures 49 gcc/testsuite/ChangeLog: PR jit/63854 * jit.dg/jit.exp (report_leak): New. (parse_valgrind_logfile): New. (fixed_host_execute): Detect if RUN_UNDER_VALGRIND is present in the environment, and if so, run the executable under valgrind, capturing valgrind's output to a logfile. Parse the log file, generating PASSes and XFAILs for the summary of leaks. --- gcc/testsuite/jit.dg/jit.exp | 79 +++- 1 file changed, 78 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/jit.dg/jit.exp b/gcc/testsuite/jit.dg/jit.exp index 531e929..26ab127 100644 --- a/gcc/testsuite/jit.dg/jit.exp +++ b/gcc/testsuite/jit.dg/jit.exp @@ -23,6 +23,48 @@ load_lib target-libpath.exp load_lib gcc.exp load_lib dejagnu.exp +# Look for lines of the form: +# definitely lost: 11,316 bytes in 235 blocks +# indirectly lost: 352 bytes in 4 blocks +# Ideally these would report zero bytes lost (which is a PASS); +# for now, report non-zero leaks as XFAILs. +proc report_leak {kind name logfile line} { +set match [regexp "$kind lost: .*" $line result] +if $match { + verbose "Saw \"$result\" within \"$line\"" 3 + # Extract bytes and blocks. + # These can contain commas as well as numerals, + # but we only care about whether we have zero. + regexp "$kind lost: (.+) bytes in (.+) blocks" \ + $result -> bytes blocks + verbose "bytes: '$bytes'" 2 + verbose "blocks: '$blocks'" 2 + if { $bytes == 0 } { + pass "$name: $logfile: $result" + } else { + xfail "$name: $logfile: $result" + } +} +} + +proc parse_valgrind_logfile {name logfile} { +verbose "parse_valgrind_logfile: $logfile" 2 +if [catch {set f [open $logfile]}] { + fail "$name: unable to read $logfile" + return +} + +while { [gets $f line] >= 0 } { + # Strip off the PID prefix e.g. ==7675== + set line [regsub "==\[0-9\]*== " $line ""] + verbose $line 2 + + report_leak "definitely" $name $logfile $line + report_leak "indirectly" $name $logfile $line +} +close $f +} + # This is host_execute from dejagnu.exp commit # 126a089777158a7891ff975473939f08c0e31a1c # with the following patch applied, and renaming to "fixed_host_execute". @@ -49,8 +91,11 @@ load_lib dejagnu.exp # if there was a problem. # proc fixed_host_execute {args} { +global env global text global spawn_id +global srcdir +verbose "srcdir: $srcdir" 2 set timeoutmsg "Timed out: Never got started, " set timeout 100 @@ -75,7 +120,28 @@ proc fixed_host_execute {args} { # spawn the executable and look for the DejaGnu output messages from the # test case. # spawn -noecho -open [open "|./${executable}" "r"] -spawn -noecho "./${executable}" ${params} + +set run_under_valgrind [info exists env(RUN_UNDER_VALGRIND)] + +if $run_under_valgrind { + set valgrind_logfile "${executable}.valgrind.txt" + set valgrind_params {"valgrind"} + lappend valgrind_params "--leak-check=full" + # srcdir is within gcc/testsuite; locate "contrib" relative to it: + lappend valgrind_params "--suppressions=${srcdir}/../../contrib/valgrind.supp" + lappend valgrind_params "--log-file=${valgrind_logfile}" +} else { + set valgrind_params {} +} +verbose "valgrind_params: $valgrind_params" 2 + +set args ${valgrind_params} +lappend args "./${executable}" +set args [concat $args ${params}] +verbose "args: $args" 2 + +eval spawn -noecho $args + expect_after full_buffer { error "got full_buffer" } set prefix "\[^\r\n\]*" @@ -147,6 +213,17 @@ proc fixed_host_execute {args} { # force a close of the executable to be safe. catch close +# valgrind might not have finished writing the log out +# before we parse it; need to wait for spawnee to +# finish. +catch wait reason +verbose "reason: $reason" 2 + +if $run_under_valgrind { +
[PATCH 15/21] PR jit/63854: lra.c: Fix leak of point_freq_vec's buffer when calling lra_inheritance
Valgrind showed a leak of 1640 bytes per iteration of one of the jit testcases (I think this is per-function in a compile): 8,200 bytes in 5 blocks are definitely lost in loss record 214 of 223 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D75D1F: xrealloc (xmalloc.c:177) by 0x4E0C94E: void va_heap::reserve(vec*&, unsigned int, bool) (vec.h:310) by 0x4E0C7A3: vec::reserve(unsigned int, bool) (vec.h:1428) by 0x4E0C55A: vec::reserve_exact(unsigned int) (vec.h:1448) by 0x4E0C2C6: vec::create(unsigned int) (vec.h:1463) by 0x519B316: lra_create_live_ranges(bool) (lra-lives.c:1248) by 0x5179BD7: lra(_IO_FILE*) (lra.c:2297) by 0x5126C4E: do_reload() (ira.c:5380) by 0x5127098: (anonymous namespace)::pass_reload::execute(function*) (ira.c:5550) by 0x52235BD: execute_one_pass(opt_pass*) (passes.c:2306) by 0x5223834: execute_pass_list_1(opt_pass*) (passes.c:2358) which is this line: point_freq_vec.create (get_max_uid () * 2); point_freq_vec's buffer is released in lra_clear_live_ranges. Am unfamiliar with LRA, but my reading of the lra code is that the "live_p" boolean tracks whether or not we need to call lra_clear_live_ranges. Debugging calls to the above line and calls to lra_clear_live_ranges showed a mismatch; the call to lra_create_live_ranges before lra_inheritance wasn't setting live_p. Setting it after that callsite fixes the leak (though perhaps the code could be cleaned up by having live_p be set/cleared by the allocation/clear functions themselves, rather than by their callers? Vladimir?) gcc/ChangeLog: PR jit/63854 * lra.c (lra): After creating live ranges in preparation for call to lra_inheritance, set live_p to true. --- gcc/lra.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/lra.c b/gcc/lra.c index 9309d5e..ec122c7 100644 --- a/gcc/lra.c +++ b/gcc/lra.c @@ -2297,6 +2297,7 @@ lra (FILE *f) actual_call_used_reg_set, which is needed during lra_inheritance. */ lra_create_live_ranges (true, true); + live_p = true; } lra_inheritance (); } -- 1.8.5.3
[PATCH 04/21] PR jit/63854: Fix memory leak within bb-reorder.c
Valgrind showed this leaking 200 bytes per iteration in one of my testcases: 1,000 bytes in 5 blocks are definitely lost in loss record 200 of 241 at 0x4A083AA: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D75C5C: xrealloc (xmalloc.c:179) by 0x4E10734: void va_heap::reserve(vec*&, unsigned int, bool) (vec.h:310 ) by 0x4E105BB: vec::reserve(unsigned int, bool) (vec.h:1428) by 0x4E15B37: vec::safe_push(basic_block_def* const&) (vec.h:1537) by 0x5B61F44: find_rarely_executed_basic_blocks_and_crossing_edges() (bb-reorder.c:1614) by 0x5B63E90: (anonymous namespace)::pass_partition_blocks::execute(function*) (bb-reorder.c:2711) by 0x522354D: execute_one_pass(opt_pass*) (passes.c:2306) by 0x52237C4: execute_pass_list_1(opt_pass*) (passes.c:2358) by 0x52237F5: execute_pass_list_1(opt_pass*) (passes.c:2359) by 0x5223832: execute_pass_list(function*, opt_pass*) (passes.c:2369) by 0x4E4884F: cgraph_node::expand() (cgraphunit.c:1773) Fix is trivial. gcc/ChangeLog: PR jit/63854 * bb-reorder.c (find_rarely_executed_basic_blocks_and_crossing_edges): Convert local bbs_in_hot_partition from vec<> to auto_vec<>. --- gcc/bb-reorder.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c index 1f7c3ee..4ca3bb2 100644 --- a/gcc/bb-reorder.c +++ b/gcc/bb-reorder.c @@ -1582,7 +1582,7 @@ find_rarely_executed_basic_blocks_and_crossing_edges (void) edge e; edge_iterator ei; unsigned int cold_bb_count = 0; - vec bbs_in_hot_partition = vNULL; + auto_vec bbs_in_hot_partition; /* Mark which partition (hot/cold) each basic block belongs in. */ FOR_EACH_BB_FN (bb, cfun) -- 1.8.5.3
[PATCH 17/21] PR jit/63854: Fix leaking vec in jit
This fixes various leaks of vec buffers seen via valgrind within jit (both recording and playback). Various vec<> within jit::recording are converted to auto_vec<>. Various playback::wrapper subclasses containing vec<> gain a finalizer so they can release the vec when they are collected. gcc/jit/ChangeLog: PR jit/63854 * jit-playback.c (gcc::jit::playback::compound_type::set_fields): Convert param from const vec & to const auto_vec *. (gcc::jit::playback::context::new_function_type): Convert param "param_types" from vec * to const auto_vec *. (gcc::jit::playback::context::new_function): Convert param "params" from vec * to const auto_vec *. (gcc::jit::playback::context::build_call): Convert param "args" from vec to const auto_vec *. (gcc::jit::playback::context::new_call): Likewise. (gcc::jit::playback::context::new_call_through_ptr): Likewise. (wrapper_finalizer): New function. (gcc::jit::playback::wrapper::operator new): Call the finalizer variant of ggc_internal_cleared_alloc, supplying wrapper_finalizer. (gcc::jit::playback::function::finalizer): New. (gcc::jit::playback::block::finalizer): New. (gcc::jit::playback::source_file::finalizer): New. (gcc::jit::playback::source_line::finalizer): New. * jit-playback.h (gcc::jit::playback::context::new_function_type): Convert param "param_types" from vec * to const auto_vec *. (gcc::jit::playback::context::new_function): Convert param "params" from vec * to const auto_vec *. (gcc::jit::playback::context::new_call): Convert param "args" from vec to const auto_vec *. (gcc::jit::playback::context::new_call_through_ptr): Likewise. (gcc::jit::playback::context::build_call): Likewise. (gcc::jit::playback::context): Convert fields "m_functions", "m_source_files", "m_cached_locations" from vec to auto_vec. (gcc::jit::playback::wrapper::finalizer): New virtual function. (gcc::jit::playback::compound_type::set_fields): Convert param fro const vec & to const auto_vec *. (gcc::jit::playback::function::finalizer): New. (gcc::jit::playback::block::finalizer): New. (gcc::jit::playback::source_file::finalizer): New. (gcc::jit::playback::source_line::finalizer): New. * jit-recording.c (gcc::jit::recording::function_type::replay_into): Convert local from a vec into an auto_vec. (gcc::jit::recording::fields::replay_into): Likewise. (gcc::jit::recording::function::replay_into): Likewise. (gcc::jit::recording::call::replay_into): Likewise. (gcc::jit::recording::call_through_ptr::replay_into): Likewise. * jit-recording.h (gcc::jit::recording::context): Convert fields "m_mementos", "m_compound_types", "m_functions" from vec<> to auto_vec <>. (gcc::jit::recording::function_type::get_param_types): Convert return type from vec to const vec &. (gcc::jit::recording::function_type): Convert field "m_param_types" from a vec<> to an auto_vec<>. (gcc::jit::recording::fields): Likewise for field "m_fields". (gcc::jit::recording::function::get_params): Convert return type from vec to const vec &. (gcc::jit::recording::function): Convert fields "m_params", "m_locals", "m_blocks" from vec<> to auto_vec<>. (gcc::jit::recording::block): Likewise for field "m_statements". vec<> to auto_vec<>. (gcc::jit::recording::call): Likewise for field "m_args". (gcc::jit::recording::call_through_ptr): Likewise. --- gcc/jit/jit-playback.c | 73 + gcc/jit/jit-playback.h | 27 -- gcc/jit/jit-recording.c | 16 +-- gcc/jit/jit-recording.h | 26 +- 4 files changed, 100 insertions(+), 42 deletions(-) diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c index 285a3ef..8fdfa29 100644 --- a/gcc/jit/jit-playback.c +++ b/gcc/jit/jit-playback.c @@ -285,15 +285,15 @@ new_compound_type (location *loc, } void -playback::compound_type::set_fields (const vec &fields) +playback::compound_type::set_fields (const auto_vec *fields) { /* Compare with c/c-decl.c: finish_struct. */ tree t = as_tree (); tree fieldlist = NULL; - for (unsigned i = 0; i < fields.length (); i++) + for (unsigned i = 0; i < fields->length (); i++) { - field *f = fields[i]; + field *f = (*fields)[i]; DECL_CONTEXT (f->as_tree ()) = t; fieldlist = chainon (f->as_tree (), fieldlist); } @@ -309,7 +309,7 @@ playback::compound_type::set_fields (const vec &fields) playback::type * playback::context:: new_function_type (type *return_type, - vec *param_types, + const auto_vec *param_type
[PATCH 16/21] PR jit/63854: Add all_late_ipa_passes to GCC_PASS_LISTS
Valgrind showed a per-iteration leak of pass_ipa_pta (and presumably pass_omp_simd_clone): 704 (352 direct, 352 indirect) bytes in 4 blocks are definitely lost in loss record 198 of 241 at 0x4A06965: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x555A55D: make_pass_ipa_pta(gcc::context*) (tree-ssa-structalias.c:7425) by 0x5219FDA: gcc::pass_manager::pass_manager(gcc::context*) (pass-instances.def:141) by 0x4E4F5D1: gcc::context::context() (context.c:37) by 0x532C07B: general_init(char const*) (toplev.c:1212) by 0x532DB5E: toplev::main(int, char**) (toplev.c:2074) by 0x4DE76AF: gcc::jit::playback::context::compile() (jit-playback.c:1615) by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861) by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014) by 0x401DC4: test_jit (harness.h:190) by 0x401EA8: main (harness.h:232) Investigation showed that ~pass_manager wasn't deleting these passes since for some reason GCC_PASS_LISTS didn't contain all_late_ipa_passes and so wasn't calling delete_pass_tree on it. Add it to GCC_PASS_LISTS, fixing the leak. gcc/ChangeLog: PR jit/63854 * pass_manager.h (GCC_PASS_LISTS): Add all_late_ipa_passes. --- gcc/pass_manager.h | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h index 82857a9..e6658ad 100644 --- a/gcc/pass_manager.h +++ b/gcc/pass_manager.h @@ -29,6 +29,7 @@ struct register_pass_info; DEF_PASS_LIST (all_lowering_passes) \ DEF_PASS_LIST (all_small_ipa_passes) \ DEF_PASS_LIST (all_regular_ipa_passes) \ + DEF_PASS_LIST (all_late_ipa_passes) \ DEF_PASS_LIST (all_passes) #define DEF_PASS_LIST(LIST) PASS_LIST_NO_##LIST, -- 1.8.5.3
[PATCH 20/21] PR jit/63854: Fix leak in ipa-icf.c
ipa-icf.c was leaking 16 bytes per iteration in the jit testcases here: 80 bytes in 5 blocks are definitely lost in loss record 94 of 199 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D764B7: xmalloc (xmalloc.c:147) by 0x5C962FF: ipa_icf::sem_item_optimizer::get_group_by_hash(unsigned int, ipa_icf::sem_item_type) (ipa-icf.c:1503) by 0x5C96C8C: ipa_icf::sem_item_optimizer::build_hash_based_classes() (ipa-icf.c:1723) by 0x5C96765: ipa_icf::sem_item_optimizer::execute() (ipa-icf.c:1616) by 0x5C98E81: ipa_icf::ipa_icf_driver() (ipa-icf.c:2373) by 0x5C993C8: ipa_icf::pass_ipa_icf::execute(function*) (ipa-icf.c:2421) by 0x5223CCD: execute_one_pass(opt_pass*) (passes.c:2306) by 0x5224BA7: execute_ipa_pass_list(opt_pass*) (passes.c:2700) by 0x4E49B0D: ipa_passes() (cgraphunit.c:2088) by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172) by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325) by not freeing the allocation here: congruence_class_group *item = XNEW (congruence_class_group); in sem_item_optimizer::get_group_by_hash. Fix it. gcc/ChangeLog: PR jit/63854 * ipa-icf.c (sem_item_optimizer::~sem_item_optimizer): Free each congruence_class_group *. --- gcc/ipa-icf.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c index 2d5fcf5..b0ba82e 100644 --- a/gcc/ipa-icf.c +++ b/gcc/ipa-icf.c @@ -1322,6 +1322,7 @@ sem_item_optimizer::~sem_item_optimizer () delete (*it)->classes[i]; (*it)->classes.release (); + free (*it); } m_items.release (); -- 1.8.5.3
[PATCH 19/21] PR jit/63854: Fix leak of ipa hooks
This fixes three leaks in IPA seen in jit testcases with valgrind: This one: 96 bytes in 4 blocks are definitely lost in loss record 102 of 205 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D76447: xmalloc (xmalloc.c:147) by 0x4E35C23: symbol_table::add_cgraph_insertion_hook(void (*)(cgraph_node*, void*), void*) (cgraph.c:383) by 0x51070C6: ipa_register_cgraph_hooks() (ipa-prop.c:3864) by 0x5C753D8: ipcp_generate_summary() (ipa-cp.c:3786) by 0x5223540: execute_ipa_summary_passes(ipa_opt_pass_d*) (passes.c:2102) by 0x4E49A60: ipa_passes() (cgraphunit.c:2074) by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172) by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325) by 0x4DC999C: jit_langhook_write_globals() (dummy-frontend.c:216) by 0x532B3A6: compile_file() (toplev.c:583) by 0x532E15F: do_compile() (toplev.c:2020) appears to be caused by ipa-prop.c (ipa_register_cgraph_hooks) unconditionally inserting ipa_add_new_function as "function_insertion_hook_holder", rather than if the latter is non-NULL, like the other hooks. These two in ipa-reference.c: 96 bytes in 4 blocks are definitely lost in loss record 103 of 205 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D76447: xmalloc (xmalloc.c:147) by 0x4E35AA9: symbol_table::add_cgraph_removal_hook(void (*)(cgraph_node*, void*), void*) (cgraph.c:329) by 0x5CA446E: ipa_init() (ipa-reference.c:435) by 0x5CA47D1: generate_summary() (ipa-reference.c:551) by 0x5CA4E70: propagate() (ipa-reference.c:684) by 0x5CA640E: (anonymous namespace)::pass_ipa_reference::execute(function*) (ipa-reference.c:1178) by 0x5223CC1: execute_one_pass(opt_pass*) (passes.c:2306) by 0x5224B9B: execute_ipa_pass_list(opt_pass*) (passes.c:2700) by 0x4E49B0D: ipa_passes() (cgraphunit.c:2088) by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172) by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325) 96 bytes in 4 blocks are definitely lost in loss record 104 of 205 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D76447: xmalloc (xmalloc.c:147) by 0x4E35E29: symbol_table::add_cgraph_duplication_hook(void (*)(cgraph_node*, cgraph_node*, void*), void*) (cgraph.c:453) by 0x5CA4493: ipa_init() (ipa-reference.c:437) by 0x5CA47D1: generate_summary() (ipa-reference.c:551) by 0x5CA4E70: propagate() (ipa-reference.c:684) by 0x5CA640E: (anonymous namespace)::pass_ipa_reference::execute(function*) (ipa-reference.c:1178) by 0x5223CC1: execute_one_pass(opt_pass*) (passes.c:2306) by 0x5224B9B: execute_ipa_pass_list(opt_pass*) (passes.c:2700) by 0x4E49B0D: ipa_passes() (cgraphunit.c:2088) by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172) by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325) appear to be due to th hooks never being removed. My patch hacks in a removal of them into ipa_reference_c_finalize, but I suspect there's a cleaner place to put this. gcc/ChangeLog: PR jit/63854 * ipa-prop.c (ipa_register_cgraph_hooks): Guard insertion of ipa_add_new_function on function_insertion_hook_holder being non-NULL. * ipa-reference.c (ipa_reference_c_finalize): Remove node_removal_hook_holder and node_duplication_hook_holder if they've been added to symtab. * toplev.c (toplev::finalize): Call ipa_reference_c_finalize before cgraph_c_finalize so that the former can access "symtab". --- gcc/ipa-prop.c | 3 ++- gcc/ipa-reference.c | 11 +++ gcc/toplev.c| 4 +++- 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c index f87243c..a8238ac 100644 --- a/gcc/ipa-prop.c +++ b/gcc/ipa-prop.c @@ -3552,7 +3552,8 @@ ipa_register_cgraph_hooks (void) if (!node_duplication_hook_holder) node_duplication_hook_holder = symtab->add_cgraph_duplication_hook (&ipa_node_duplication_hook, NULL); - function_insertion_hook_holder = + if (!function_insertion_hook_holder) +function_insertion_hook_holder = symtab->add_cgraph_insertion_hook (&ipa_add_new_function, NULL); } diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c index 1ce06d1..b046f9e 100644 --- a/gcc/ipa-reference.c +++ b/gcc/ipa-reference.c @@ -1198,4 +1198,15 @@ ipa_reference_c_finalize (void) bitmap_obstack_release (&optimization_summary_obstack); ipa_init_p = false; } + + if (node_removal_hook_holder) +{ + symtab->remove_cgraph_removal_hook (node_removal_hook_holder); + node_removal_hook_holder = NULL; +} + if (node_duplication_hook_holder) +{ + symtab->remove_cgraph_duplication_hook (node_duplication_hook_holder); + node_duplication_hook_holder = NULL; +} } diff --git a/gcc/toplev.c b/gcc/toplev.c index c0c418c..96e7af9 100644 --- a/gcc
[PATCH 09/21] PR jit/63854: Don't leak producer_string in dwarf2out.c
Fix this small per-iteration leak with debuginfo: 424 bytes in 4 blocks are definitely lost in loss record 185 of 230 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D75CA7: xmalloc (xmalloc.c:147) by 0x4ECE9E4: gen_producer_string() (dwarf2out.c:19489) by 0x4EDB2C8: dwarf2out_finish(char const*) (dwarf2out.c:24257) by 0x532AD3B: compile_file() (toplev.c:623) by 0x532D9E1: do_compile() (toplev.c:2020) by 0x532DC54: toplev::main(int, char**) (toplev.c:2117) by 0x4DE766F: gcc::jit::playback::context::compile() (jit-playback.c:1615) by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861) by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014) by 0x401CA4: test_jit (harness.h:190) by 0x401D88: main (harness.h:232) gcc/ChangeLog: PR jit/63854 * dwarf2out.c (dwarf2out_c_finalize): Free producer_string. --- gcc/dwarf2out.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index b16883f..9069f9a 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -24741,6 +24741,8 @@ dwarf2out_c_finalize (void) frame_pointer_fb_offset = 0; frame_pointer_fb_offset_valid = false; base_types.release (); + XDELETEVEC (producer_string); + producer_string = NULL; } #include "gt-dwarf2out.h" -- 1.8.5.3
[PATCH 05/21] PR jit/63854: Fix memory leak of save_decoded_options
This commit fixes this leak from opts-common.c, about 4KB per iteration. ==57820== 18,816 (2,560 direct, 16,256 indirect) bytes in 4 blocks are definitely lost in loss record 907 of 917 ==57820==at 0x4A083AA: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==57820==by 0x59A67CC: xrealloc (xmalloc.c:179) ==57820==by 0x59587C9: decode_cmdline_options_to_array(unsigned int, char const**, unsigned int, cl_decoded_option**, unsigned int*) (opts-common.c:885) ==57820==by 0x4E2ED90: toplev::main(int, char**) (toplev.c:2089) ==57820==by 0x4E43186: gcc::jit::playback::context::compile() (jit-playback.c:1615) ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() (jit-recording.c:861) ==57820==by 0x401CA4: test_jit (harness.h:190) ==57820==by 0x401D88: main (harness.h:232) gcc/ChangeLog: PR jit/63854 * toplev.c (toplev::finalize): Clean up save_decoded_options. --- gcc/toplev.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/toplev.c b/gcc/toplev.c index 876279f..291b84d 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -2173,6 +2173,8 @@ toplev::finalize (void) finalize_options_struct (&global_options); finalize_options_struct (&global_options_set); + XDELETEVEC (save_decoded_options); + /* Clean up the context (and pass_manager etc). */ delete g; g = NULL; -- 1.8.5.3
[PATCH 18/21] PR jit/63854: Add "long-term" allocator to gcc::context
Some places in the startup code use char * values that can sometimes be string literals, and can sometimes be dynamically built using xstrdup or concat. This isn't a problem for cc1 etc since this is only called once, but for libgccjit they are small per-invocation leaks. There's no clean way to release them: retrofitting logic to decide if we're dealing with a string literal vs a dynamically-allocated buffer (and if something else is pointing to said buffer) is messy and error-prone; they are also unconnected to the GC. Hence the cleanest way to fix the leak is to add a new "long-term" allocator, to gcc::context. This gives back buffers that will persist until the gcc::context is cleaned away. Using this fixes these leaks seen via valgrind: 28 bytes in 4 blocks are definitely lost in loss record 13 of 209 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D76417: xmalloc (xmalloc.c:147) by 0x5D76509: xstrdup (xstrdup.c:34) by 0x532CC38: process_options() (toplev.c:1316) by 0x532DFF4: do_compile() (toplev.c:1976) by 0x532E3B0: toplev::main(int, char**) (toplev.c:2117) by 0x4DE7983: gcc::jit::playback::context::compile() (jit-playback.c:1646) by 0x4DD791A: gcc::jit::recording::context::compile() (jit-recording.c:861) by 0x4DD5E12: gcc_jit_context_compile (libgccjit.c:2014) by 0x401CA4: test_jit (harness.h:190) by 0x401D88: main (harness.h:232) 108 bytes in 12 blocks are definitely lost in loss record 135 of 241 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D75E07: xmalloc (xmalloc.c:147) by 0x5D7098B: concat (concat.c:147) by 0x563A342: build_common_builtin_nodes() (tree.c:10145) by 0x4DC91FC: jit_langhook_init() (dummy-frontend.c:128) by 0x532D3F4: lang_dependent_init(char const*) (toplev.c:1790) by 0x532D9C8: do_compile() (toplev.c:2007) by 0x532DCAC: toplev::main(int, char**) (toplev.c:2118) by 0x4DE76AF: gcc::jit::playback::context::compile() (jit-playback.c:1615) by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861) by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014) by 0x401DC4: test_jit (harness.h:190) 108 bytes in 12 blocks are definitely lost in loss record 136 of 241 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D75E07: xmalloc (xmalloc.c:147) by 0x5D7098B: concat (concat.c:147) by 0x563A3B6: build_common_builtin_nodes() (tree.c:10151) by 0x4DC91FC: jit_langhook_init() (dummy-frontend.c:128) by 0x532D3F4: lang_dependent_init(char const*) (toplev.c:1790) by 0x532D9C8: do_compile() (toplev.c:2007) by 0x532DCAC: toplev::main(int, char**) (toplev.c:2118) by 0x4DE76AF: gcc::jit::playback::context::compile() (jit-playback.c:1615) by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861) by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014) by 0x401DC4: test_jit (harness.h:190) 136 bytes in 4 blocks are definitely lost in loss record 129 of 209 at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D76417: xmalloc (xmalloc.c:147) by 0x532BE28: init_asm_output(char const*) (toplev.c:936) by 0x532DB34: lang_dependent_init(char const*) (toplev.c:1795) by 0x532E0CC: do_compile() (toplev.c:2006) by 0x532E3B0: toplev::main(int, char**) (toplev.c:2117) by 0x4DE7983: gcc::jit::playback::context::compile() (jit-playback.c:1646) by 0x4DD791A: gcc::jit::recording::context::compile() (jit-recording.c:861) by 0x4DD5E12: gcc_jit_context_compile (libgccjit.c:2014) by 0x401CA4: test_jit (harness.h:190) by 0x401D88: main (harness.h:232) gcc/ChangeLog: PR jit/63854 * context.c (gcc::context::context): Initialize m_longterm_obstack. (gcc::context::~context): Free m_longterm_obstack. (gcc::context::longterm_xalloc): New method. (gcc::context::longterm_xstrdup): New method. * context.h: Include obstack.h (gcc::context::longterm_xalloc): New method. (gcc::context::longterm_xallocvec): New method (gcc::context::longterm_xstrdup): New method. (gcc::context::m_longterm_obstack): New field. * toplev.c (init_asm_output): Use g->longterm_xallocvec rather than XNEWVEC when allocating "dumpname" to avoid a leak. (process_options): Likewise, use g->longterm_xstrdup rather than xstrdup when allocating "name" (and hence aux_base_name). * tree.c: Include context.h. (build_common_builtin_nodes): When writing dynamically-allocated entries into built_in_names, use g->longterm_xstrdup to take a copy of the buffer acquired by concat to avoid a leak. --- gcc/context.c | 18 ++ gcc/context.h | 46 ++ gcc/toplev.c | 4 ++-- gcc/tree.c| 17 +++-- 4 files changed, 77 insertions(+), 8 dele
[PATCH 14/21] PR jit/63854: Fix leak of paths within jump threading
Paths are allocated as: vec *path = new vec (); i.e. the vec itself is on the heap, so a mere: path->release (); is merely releasing the buffer the vec holds, not the vec itself. This patch updates the two sites that release paths to also delete them, fixing leaks like this seen by valgrind: 160 bytes in 20 blocks are definitely lost in loss record 165 of 241 at 0x4A06965: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x556E5AF: thread_across_edge(gimple_statement_base*, edge_def*, bool, vec*, tree_node * (*)(gimple_statement_base*, gimple_statement_base*)) (tree-ssa-threadedge.c:1124) by 0x547B73B: dom_opt_dom_walker::thread_across_edge(edge_def*) (tree-ssa-dom.c:1184) by 0x5480183: dom_opt_dom_walker::after_dom_children(basic_block_def*) (tree-ssa-dom.c:2015) by 0x5C1C5F7: dom_walker::walk(basic_block_def*) (domwalk.c:218) by 0x547AE83: (anonymous namespace)::pass_dominator::execute(function*) (tree-ssa-dom.c:919) by 0x52235BD: execute_one_pass(opt_pass*) (passes.c:2306) by 0x5223834: execute_pass_list_1(opt_pass*) (passes.c:2358) by 0x5223865: execute_pass_list_1(opt_pass*) (passes.c:2359) by 0x52238A2: execute_pass_list(function*, opt_pass*) (passes.c:2369) by 0x4E4888F: cgraph_node::expand() (cgraphunit.c:1773) by 0x4E48F29: expand_all_functions() (cgraphunit.c:1909) [Space-wise, a vec is just a pointer, hence the 8 bytes per block seen in the valgrind report] gcc/ChangeLog: PR jit/63854 * tree-ssa-threadedge.c (thread_across_edge): Don't just release "path", delete it. * tree-ssa-threadupdate.c (delete_jump_thread_path): Likewise. --- gcc/tree-ssa-threadedge.c | 1 + gcc/tree-ssa-threadupdate.c | 1 + 2 files changed, 2 insertions(+) diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c index d5b9696..9ee1cae 100644 --- a/gcc/tree-ssa-threadedge.c +++ b/gcc/tree-ssa-threadedge.c @@ -1149,6 +1149,7 @@ thread_across_edge (gimple dummy_cond, through the vector entries. */ gcc_assert (path->length () == 0); path->release (); + delete path; /* A negative status indicates the target block was deemed too big to duplicate. Just quit now rather than trying to use the block as diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index 151ed83..5cb9f86 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -2481,6 +2481,7 @@ delete_jump_thread_path (vec *path) for (unsigned int i = 0; i < path->length (); i++) delete (*path)[i]; path->release(); + delete path; } /* Register a jump threading opportunity. We queue up all the jump -- 1.8.5.3
[PATCH 21/21] PR jit/63854: Fix leaks in test-fuzzer.c
gcc/testsuite/ChangeLog: PR jit/63854 * jit.dg/test-fuzzer.c (fuzzer_init): Free malloced buffers. (make_random_function): Free ff->locals. --- gcc/testsuite/jit.dg/test-fuzzer.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/gcc/testsuite/jit.dg/test-fuzzer.c b/gcc/testsuite/jit.dg/test-fuzzer.c index f363f8f..b501792 100644 --- a/gcc/testsuite/jit.dg/test-fuzzer.c +++ b/gcc/testsuite/jit.dg/test-fuzzer.c @@ -105,6 +105,11 @@ fuzzer_init (fuzzer *f, gcc_jit_context *ctxt, unsigned int seed) for (i = 0; i < num_funcs; i++) f->funcs[f->num_funcs++] = make_random_function (f); + + /* Now clean out f. */ + free (f->types); + free (f->funcs); + free (f->globals); } /* Get random int in inclusive range [min, max]. */ @@ -309,6 +314,7 @@ make_random_function (fuzzer *f) gcc_jit_function *result = ff->fn; + free (ff->locals); free (ff->params); free (ff); -- 1.8.5.3
Re: [PING][PATCH][AARCH64]Fix PR63424 by adding v2di3 pattern
On 10/11/14 16:55, Renlin Li wrote: On 06/11/14 15:00, Renlin Li wrote: Hi all, Dose anybody have time to review this? Kind regards, Renlin Li On 31/10/14 14:51, Renlin Li wrote: Hi all, This is a patch which will fix PR63424. It implements signed/unsigned max/min pattern for V2DI mode in terms of vcondv2div2di pattern. In this particular case, VEC_COND_EXPR (V2DImode) is generated as aarch64 target supports it (vcond for VALL). The VEC_COND_EXPR will further folded into MIN_EXPR/MAX_EXPR in dom pass unconditionally. Later in expand pass, the compiler tries to expand min_expr using standard RTL operation. It fails, because aarch64 target don't have minv2di3 pattern implemented. It then tries to generate conditional move and compare&branch sequence, all fails. At last it falls into libfunc call, no luck either. An ICE to complain about this. aarch64-none-elf toolchain has been tested on the model, no regressions. Is it Okay for trunk? gcc/ChangeLog: 2014-10-31 Renlin Li PR target/63424 * config/aarch64/aarch64-simd.md (v2di3): New. gcc/testsuite/ChangeLog: 2014-10-31 Renlin Li PR target/63424 * gcc.target/aarch64/pr63424.c: New. Hi, Dose anybody have time to review this? Thank you so much! Regards, Renlin Li Ping again. Regards, Renlin Li
[PATCH] Fix for fold_negate_expr (PR sanitizer/63879)
The problem here is that negate_expr_p returns true (as it should) for UINT_MAX(OVF), but fold_negate_expr didn't actually fold it, and that is a no-no; if negate_expr_p is true, fold_negate_expr must not return NULL_TREE. I added the following hunk for bootstrap and regtest for better coverage: @@ -752,6 +755,12 @@ fold_negate_expr (location_t loc, tree t) break; } +#ifdef ENABLE_CHECKING + if (negate_expr_p (t)) +internal_error ("fold_negate_expr should never return NULL_TREE " + "if negate_expr_p is true"); +#endif + return NULL_TREE; } Fixed by adjusting fold_negate_expr so that it folds non-trapping wrapping constants. The SANITIZE_SI_OVERFLOW check is important to not regress -Woverflow warnings (ug). negate_expr_p hunk is to match the NEGATE_EXPR case in fold_negate_expr. Bootstrapped/regtested on power8-linux, ok for trunk? 2014-11-19 Marek Polacek PR sanitizer/63879 * fold-const.c (negate_expr_p) : Return !TYPE_OVERFLOW_SANITIZED. (fold_negate_expr) : Fold when overflow does not trap and when overflow wraps, or when SANITIZE_SI_OVERFLOW is 0. * c-c++-common/ubsan/pr63879-1.c: New test. * c-c++-common/ubsan/pr63879-2.c: New test. diff --git gcc/fold-const.c gcc/fold-const.c index f6fb8af..719e06e 100644 --- gcc/fold-const.c +++ gcc/fold-const.c @@ -408,9 +408,11 @@ negate_expr_p (tree t) && TYPE_OVERFLOW_WRAPS (type)); case FIXED_CST: -case NEGATE_EXPR: return true; +case NEGATE_EXPR: + return !TYPE_OVERFLOW_SANITIZED (type); + case REAL_CST: /* We want to canonicalize to positive real constants. Pretend that only negative ones can be easily negated. */ @@ -555,7 +557,8 @@ fold_negate_expr (location_t loc, tree t) tem = fold_negate_const (t, type); if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t) || (!TYPE_OVERFLOW_TRAPS (type) - && (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0)) + && TYPE_OVERFLOW_WRAPS (type)) + || (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0) return tem; break; diff --git gcc/testsuite/c-c++-common/ubsan/pr63879-1.c gcc/testsuite/c-c++-common/ubsan/pr63879-1.c index e69de29..2035849 100644 --- gcc/testsuite/c-c++-common/ubsan/pr63879-1.c +++ gcc/testsuite/c-c++-common/ubsan/pr63879-1.c @@ -0,0 +1,23 @@ +/* PR sanitizer/63879 */ +/* { dg-do compile } */ +/* { dg-options "-fsanitize=undefined" } */ + +struct A +{ + int inode; +} * a; +int b, c; +void +fn1 () +{ + int d = 0; + while (b) +{ + if (a->inode) +d++; + a = 0; +} + c = d - 1; + for (; c >= 0; c--) +; +} diff --git gcc/testsuite/c-c++-common/ubsan/pr63879-2.c gcc/testsuite/c-c++-common/ubsan/pr63879-2.c index e69de29..34eb8e7 100644 --- gcc/testsuite/c-c++-common/ubsan/pr63879-2.c +++ gcc/testsuite/c-c++-common/ubsan/pr63879-2.c @@ -0,0 +1,13 @@ +/* PR sanitizer/63879 */ +/* { dg-do compile } */ +/* { dg-options "-fsanitize=undefined" } */ + +int a; +void +fn1 () +{ + int b = 2; + for (; a;) +while (b >= 0) + b--; +} Marek
Re: [PATCH 01/21] PR jit/63854: Fix memory leak within gcc_options
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm wrote: > Valgrind shows this fixes ~1 KB of leak per iteration (on x86_64) by > plugging these leaks allocated at opts.c lines 286 and 289: > > ==57820== 2,752 bytes in 4 blocks are definitely lost in loss record 875 of > 917 > ==57820==at 0x4A0645D: malloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) > ==57820==by 0x59A6747: xmalloc (xmalloc.c:147) > ==57820==by 0x595542A: init_options_struct(gcc_options*, gcc_options*) > (opts.c:286) > ==57820==by 0x4E2ED61: toplev::main(int, char**) (toplev.c:2081) > ==57820==by 0x4E43186: gcc::jit::playback::context::compile() > (jit-playback.c:1615) > ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() > (jit-recording.c:861) > ==57820==by 0x401CA4: test_jit (harness.h:190) > ==57820==by 0x401D88: main (harness.h:232) > ==57820== > ==57820== 2,752 bytes in 4 blocks are definitely lost in loss record 876 of > 917 > ==57820==at 0x4A081D4: calloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) > ==57820==by 0x59A6780: xcalloc (xmalloc.c:162) > ==57820==by 0x595543E: init_options_struct(gcc_options*, gcc_options*) > (opts.c:289) > ==57820==by 0x4E2ED61: toplev::main(int, char**) (toplev.c:2081) > ==57820==by 0x4E43186: gcc::jit::playback::context::compile() > (jit-playback.c:1615) > ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() > (jit-recording.c:861) > ==57820==by 0x401CA4: test_jit (harness.h:190) > ==57820==by 0x401D88: main (harness.h:232) Ok. Thanks, Richard. > gcc/ChangeLog: > PR jit/63854 > * opts.c (finalize_options_struct): New. > * opts.h (finalize_options_struct): New. > * toplev.c (toplev::finalize): Call finalize_options_struct > on global_options and global_options_set. > --- > gcc/opts.c | 8 > gcc/opts.h | 1 + > gcc/toplev.c | 3 +++ > 3 files changed, 12 insertions(+) > > diff --git a/gcc/opts.c b/gcc/opts.c > index d22882b..dabd3c6 100644 > --- a/gcc/opts.c > +++ b/gcc/opts.c > @@ -307,6 +307,14 @@ init_options_struct (struct gcc_options *opts, struct > gcc_options *opts_set) >targetm_common.option_init_struct (opts); > } > > +/* Release any allocations owned by OPTS. */ > + > +void > +finalize_options_struct (struct gcc_options *opts) > +{ > + XDELETEVEC (opts->x_param_values); > +} > + > /* If indicated by the optimization level LEVEL (-Os if SIZE is set, > -Ofast if FAST is set, -Og if DEBUG is set), apply the option DEFAULT_OPT > to OPTS and OPTS_SET, diagnostic context DC, location LOC, with language > diff --git a/gcc/opts.h b/gcc/opts.h > index f694082..c3ec942 100644 > --- a/gcc/opts.h > +++ b/gcc/opts.h > @@ -325,6 +325,7 @@ extern void decode_cmdline_options_to_array (unsigned int > argc, > extern void init_options_once (void); > extern void init_options_struct (struct gcc_options *opts, > struct gcc_options *opts_set); > +extern void finalize_options_struct (struct gcc_options *opts); > extern void decode_cmdline_options_to_array_default_mask (unsigned int argc, > const char **argv, > struct > cl_decoded_option **decoded_options, > diff --git a/gcc/toplev.c b/gcc/toplev.c > index 2e48047..4b4e568 100644 > --- a/gcc/toplev.c > +++ b/gcc/toplev.c > @@ -2169,4 +2169,7 @@ toplev::finalize (void) >ipa_cp_c_finalize (); >ipa_reference_c_finalize (); >params_c_finalize (); > + > + finalize_options_struct (&global_options); > + finalize_options_struct (&global_options_set); > } > -- > 1.8.5.3 >
Re: [PATCH 02/21] PR jit/63854: Fix memory leak of reginfo.c: valid_mode_changes_obstack
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm wrote: > Valgrind shows this fixes ~4 KB of leak per iteration (on x86_64) by > plugging this leak allocated at reginfo.c:1327: > gcc_obstack_init (&valid_mode_changes_obstack); Ok. Thanks, Richard. > ==57820== 16,256 bytes in 4 blocks are definitely lost in loss record 906 of > 917 > ==57820==at 0x4A0645D: malloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) > ==57820==by 0x59A6747: xmalloc (xmalloc.c:147) > ==57820==by 0x30958842DB: _obstack_begin (obstack.c:184) > ==57820==by 0x51C1EC1: init_subregs_of_mode() (reginfo.c:1327) > ==57820==by 0x50D2A38: init_costs() (ira-costs.c:2181) > ==57820==by 0x50D74A8: ira_costs() (ira-costs.c:2211) > ==57820==by 0x50D1326: ira_build() (ira-build.c:3459) > ==57820==by 0x50C909C: (anonymous > namespace)::pass_ira::execute(function*) (ira.c:5227) > ==57820==by 0x51884F7: execute_one_pass(opt_pass*) (passes.c:2269) > ==57820==by 0x5188B75: execute_pass_list_1(opt_pass*) (passes.c:2321) > ==57820==by 0x5188B87: execute_pass_list_1(opt_pass*) (passes.c:2322) > ==57820==by 0x5188BC8: execute_pass_list(function*, opt_pass*) > (passes.c:2332) > > gcc/ChangeLog: > PR jit/63854 > * reginfo.c (finish_subregs_of_mode): Replace obstack_finish with > obstack_free when cleaning up valid_mode_changes_obstack. > --- > gcc/reginfo.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/reginfo.c b/gcc/reginfo.c > index efe23cd..c2daf22 100644 > --- a/gcc/reginfo.c > +++ b/gcc/reginfo.c > @@ -1343,7 +1343,7 @@ void > finish_subregs_of_mode (void) > { >XDELETEVEC (valid_mode_changes); > - obstack_finish (&valid_mode_changes_obstack); > + obstack_free (&valid_mode_changes_obstack, NULL); > } > > /* Free all data attached to the structure. This isn't a destructor because > -- > 1.8.5.3 >
Re: [PATCH][AARCH64]Fix PR63424 by adding v2di3 pattern
On 10/31/2014 03:51 PM, Renlin Li wrote: > +(define_expand "v2di3" > + [(parallel [ > +(set (match_operand:V2DI 0 "register_operand" "") > + (MAXMIN:V2DI (match_operand:V2DI 1 "register_operand" "") > + (match_operand:V2DI 2 "register_operand" ""))) > +(clobber (reg:CC CC_REGNUM))])] > + "TARGET_SIMD" There's no clobber of CC_REGNUM, so you can take that out. Otherwise it looks good. r~
Re: [PATCH 06/21] PR jit/63854: Fix leak of opts_obstack
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm wrote: > This was leaking about 4KB per iteration: Ok. Thanks, Richard. > 16,256 bytes in 4 blocks are definitely lost in loss record 233 of 239 >at 0x4A0645D: malloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >by 0x5D75C17: xmalloc (xmalloc.c:147) >by 0x30958842DB: _obstack_begin (obstack.c:184) >by 0x5D1D317: init_options_struct(gcc_options*, gcc_options*) (opts.c:279) >by 0x532DB0C: toplev::main(int, char**) (toplev.c:2081) >by 0x4DE766F: gcc::jit::playback::context::compile() (jit-playback.c:1615) >by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861) >by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014) >by 0x401CA4: test_jit (harness.h:190) >by 0x401D88: main (harness.h:232) > > gcc/ChangeLog: > PR jit/63854 > * toplev.c (toplev::finalize): Free opts_obstack. > --- > gcc/toplev.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/gcc/toplev.c b/gcc/toplev.c > index 291b84d..0d617c2 100644 > --- a/gcc/toplev.c > +++ b/gcc/toplev.c > @@ -2178,4 +2178,6 @@ toplev::finalize (void) >/* Clean up the context (and pass_manager etc). */ >delete g; >g = NULL; > + > + obstack_free (&opts_obstack, NULL); > } > -- > 1.8.5.3 >
Re: [PATCH 07/21] PR jit/63854: Fix leak of optimization_summary_obstack
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm wrote: > This was leaking ~4KB per iteration: Ok. Thanks, Richard. > 16,256 bytes in 4 blocks are definitely lost in loss record 234 of 239 >at 0x4A0645D: malloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >by 0x5D75C17: xmalloc (xmalloc.c:147) >by 0x30958842DB: _obstack_begin (obstack.c:184) >by 0x4DFECDE: bitmap_obstack_initialize(bitmap_obstack*) (bitmap.c:338) >by 0x5CA3C1D: ipa_init() (ipa-reference.c:431) >by 0x5CA3FB1: generate_summary() (ipa-reference.c:551) >by 0x5CA4650: propagate() (ipa-reference.c:684) >by 0x5CA5BEE: (anonymous > namespace)::pass_ipa_reference::execute(function*) (ipa-reference.c:1178) >by 0x522354D: execute_one_pass(opt_pass*) (passes.c:2306) >by 0x5224427: execute_ipa_pass_list(opt_pass*) (passes.c:2700) >by 0x4E495C1: ipa_passes() (cgraphunit.c:2088) >by 0x4E498E0: symbol_table::compile() (cgraphunit.c:2172) > > gcc/ChangeLog: > PR jit/63854 > * ipa-reference.c (ipa_reference_c_finalize): Release > optimization_summary_obstack. > --- > gcc/ipa-reference.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c > index b421f63..1ce06d1 100644 > --- a/gcc/ipa-reference.c > +++ b/gcc/ipa-reference.c > @@ -1193,5 +1193,9 @@ make_pass_ipa_reference (gcc::context *ctxt) > void > ipa_reference_c_finalize (void) > { > - ipa_init_p = false; > + if (ipa_init_p) > +{ > + bitmap_obstack_release (&optimization_summary_obstack); > + ipa_init_p = false; > +} > } > -- > 1.8.5.3 >
Re: [PATCH 08/21] PR jit/63854: Add ira_costs_c_finalize
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm wrote: > this_target_ira_int->free_ira_costs () is called by ira_init_costs, > but this isn't called after the compile, causing noise when running > under valgrind. > > This patch adds a ira_costs_c_finalize function to clean this up, > and calls it from toplev::finalize (and hence this isn't called by > cc1/cc1plus/etc, just by libgccjit.so). > > Doing so eliminates about 27KB of "leak" from the valgrind report. Ok. Thanks, Richard. > gcc/ChangeLog: > PR jit/63854 > * ira-costs.c (ira_costs_c_finalize): New function. > * ira.h (ira_costs_c_finalize): New prototype. > * toplev.c (toplev::finalize): Call ira_costs_c_finalize. > --- > gcc/ira-costs.c | 6 ++ > gcc/ira.h | 3 +++ > gcc/toplev.c| 1 + > 3 files changed, 10 insertions(+) > > diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c > index 122815b..2dabead 100644 > --- a/gcc/ira-costs.c > +++ b/gcc/ira-costs.c > @@ -2356,3 +2356,9 @@ ira_adjust_equiv_reg_cost (unsigned regno, int cost) >else > regno_equiv_gains[regno] += cost; > } > + > +void > +ira_costs_c_finalize (void) > +{ > + this_target_ira_int->free_ira_costs (); > +} > diff --git a/gcc/ira.h b/gcc/ira.h > index b294ea1..d62656c 100644 > --- a/gcc/ira.h > +++ b/gcc/ira.h > @@ -199,4 +199,7 @@ extern bool ira_bad_reload_regno (int, rtx, rtx); > > extern void ira_adjust_equiv_reg_cost (unsigned, int); > > +/* ira-costs.c */ > +extern void ira_costs_c_finalize (void); > + > #endif /* GCC_IRA_H */ > diff --git a/gcc/toplev.c b/gcc/toplev.c > index 0d617c2..0181a3f 100644 > --- a/gcc/toplev.c > +++ b/gcc/toplev.c > @@ -2168,6 +2168,7 @@ toplev::finalize (void) >gcse_c_finalize (); >ipa_cp_c_finalize (); >ipa_reference_c_finalize (); > + ira_costs_c_finalize (); >params_c_finalize (); > >finalize_options_struct (&global_options); > -- > 1.8.5.3 >
Re: [PATCH 11/21] PR jit/63854: Fix leak of "avail" within tree-ssa-pre.c
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm wrote: > Fix this leak: Ok. Thanks, Richard. > 120 bytes in 5 blocks are definitely lost in loss record 141 of 227 >at 0x4A0645D: malloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >by 0x5D75D8F: xrealloc (xmalloc.c:177) >by 0x550DEBA: void va_heap::reserve(vec vl_embed>*&, unsigned int, bool) (vec.h:310) >by 0x550D509: vec::reserve(unsigned int, > bool) (vec.h:1428) >by 0x550DA3E: vec::reserve_exact(unsigned > int) (vec.h:1448) >by 0x550D167: vec::safe_grow(unsigned int) > (vec.h:1576) >by 0x55076FE: do_regular_insertion(basic_block_def*, basic_block_def*) > (tree-ssa-pre.c:3209) >by 0x5508379: insert_aux(basic_block_def*) (tree-ssa-pre.c:3526) >by 0x55083DC: insert_aux(basic_block_def*) (tree-ssa-pre.c:3536) >by 0x55083DC: insert_aux(basic_block_def*) (tree-ssa-pre.c:3536) >by 0x55084C0: insert() (tree-ssa-pre.c:3559) >by 0x550C5BC: (anonymous namespace)::pass_pre::execute(function*) > (tree-ssa-pre.c:4805) > > gcc/ChangeLog: > PR jit/63854 > * tree-ssa-pre.c (do_regular_insertion): Convert "avail" from > vec<> to auto_vec<> to fix a leak. > --- > gcc/tree-ssa-pre.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c > index ea99198..027dc1c 100644 > --- a/gcc/tree-ssa-pre.c > +++ b/gcc/tree-ssa-pre.c > @@ -3202,7 +3202,7 @@ do_regular_insertion (basic_block block, basic_block > dom) >bool new_stuff = false; >vec exprs; >pre_expr expr; > - vec avail = vNULL; > + auto_vec avail; >int i; > >exprs = sorted_array_from_bitmap_set (ANTIC_IN (block)); > -- > 1.8.5.3 >
Re: [PATCH 12/21] PR jit/63854: Add a valgrind suppresion file
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm wrote: > Valgrind complains about uninitialized data within sparseset_bit_p. > Provide a suppression file to silence these warnings. > > Valgrind requires suppression files for C++ code to use the mangled > names, so we do that here. There is --enable-valgrind-annotations to get the same effect by GCC telling valgrind about this (and more). Richard. > contrib/ChangeLog > PR jit/63854 > * valgrind.supp: New. > --- > contrib/valgrind.supp | 11 +++ > 1 file changed, 11 insertions(+) > create mode 100644 contrib/valgrind.supp > > diff --git a/contrib/valgrind.supp b/contrib/valgrind.supp > new file mode 100644 > index 000..ec9ff02 > --- /dev/null > +++ b/contrib/valgrind.supp > @@ -0,0 +1,11 @@ > +{ > + suppress-uninit-cond-with-sparseset_bit_p > + Memcheck:Cond > + fun:_ZL15sparseset_bit_pP13sparseset_defm > +} > + > +{ > + suppress-uninit-use-with-sparseset_bit_p > + Memcheck:Value8 > + fun:_ZL15sparseset_bit_pP13sparseset_defm > +} > -- > 1.8.5.3 >
Re: [PATCH 16/21] PR jit/63854: Add all_late_ipa_passes to GCC_PASS_LISTS
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm wrote: > Valgrind showed a per-iteration leak of pass_ipa_pta (and presumably > pass_omp_simd_clone): Ok. Thanks, Richard. > 704 (352 direct, 352 indirect) bytes in 4 blocks are definitely lost in loss > record 198 of 241 >at 0x4A06965: operator new(unsigned long) (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >by 0x555A55D: make_pass_ipa_pta(gcc::context*) > (tree-ssa-structalias.c:7425) >by 0x5219FDA: gcc::pass_manager::pass_manager(gcc::context*) > (pass-instances.def:141) >by 0x4E4F5D1: gcc::context::context() (context.c:37) >by 0x532C07B: general_init(char const*) (toplev.c:1212) >by 0x532DB5E: toplev::main(int, char**) (toplev.c:2074) >by 0x4DE76AF: gcc::jit::playback::context::compile() (jit-playback.c:1615) >by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861) >by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014) >by 0x401DC4: test_jit (harness.h:190) >by 0x401EA8: main (harness.h:232) > > Investigation showed that ~pass_manager wasn't deleting these > passes since for some reason GCC_PASS_LISTS didn't contain > all_late_ipa_passes and so wasn't calling delete_pass_tree on it. > > Add it to GCC_PASS_LISTS, fixing the leak. > > gcc/ChangeLog: > PR jit/63854 > * pass_manager.h (GCC_PASS_LISTS): Add all_late_ipa_passes. > --- > gcc/pass_manager.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h > index 82857a9..e6658ad 100644 > --- a/gcc/pass_manager.h > +++ b/gcc/pass_manager.h > @@ -29,6 +29,7 @@ struct register_pass_info; >DEF_PASS_LIST (all_lowering_passes) \ >DEF_PASS_LIST (all_small_ipa_passes) \ >DEF_PASS_LIST (all_regular_ipa_passes) \ > + DEF_PASS_LIST (all_late_ipa_passes) \ >DEF_PASS_LIST (all_passes) > > #define DEF_PASS_LIST(LIST) PASS_LIST_NO_##LIST, > -- > 1.8.5.3 >
Re: [PATCH 04/21] PR jit/63854: Fix memory leak within bb-reorder.c
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm wrote: > Valgrind showed this leaking 200 bytes per iteration in one of > my testcases: Ok. Thanks, Richard. > 1,000 bytes in 5 blocks are definitely lost in loss record 200 of 241 >at 0x4A083AA: realloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >by 0x5D75C5C: xrealloc (xmalloc.c:179) >by 0x4E10734: void > va_heap::reserve(vec vl_embed>*&, unsigned int, bool) (vec.h:310 > ) >by 0x4E105BB: vec::reserve(unsigned > int, bool) (vec.h:1428) >by 0x4E15B37: vec vl_ptr>::safe_push(basic_block_def* const&) (vec.h:1537) >by 0x5B61F44: find_rarely_executed_basic_blocks_and_crossing_edges() > (bb-reorder.c:1614) >by 0x5B63E90: (anonymous > namespace)::pass_partition_blocks::execute(function*) (bb-reorder.c:2711) >by 0x522354D: execute_one_pass(opt_pass*) (passes.c:2306) >by 0x52237C4: execute_pass_list_1(opt_pass*) (passes.c:2358) >by 0x52237F5: execute_pass_list_1(opt_pass*) (passes.c:2359) >by 0x5223832: execute_pass_list(function*, opt_pass*) (passes.c:2369) >by 0x4E4884F: cgraph_node::expand() (cgraphunit.c:1773) > > Fix is trivial. > > gcc/ChangeLog: > PR jit/63854 > * bb-reorder.c > (find_rarely_executed_basic_blocks_and_crossing_edges): Convert > local bbs_in_hot_partition from vec<> to auto_vec<>. > --- > gcc/bb-reorder.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c > index 1f7c3ee..4ca3bb2 100644 > --- a/gcc/bb-reorder.c > +++ b/gcc/bb-reorder.c > @@ -1582,7 +1582,7 @@ find_rarely_executed_basic_blocks_and_crossing_edges > (void) >edge e; >edge_iterator ei; >unsigned int cold_bb_count = 0; > - vec bbs_in_hot_partition = vNULL; > + auto_vec bbs_in_hot_partition; > >/* Mark which partition (hot/cold) each basic block belongs in. */ >FOR_EACH_BB_FN (bb, cfun) > -- > 1.8.5.3 >
Re: [PATCH 05/21] PR jit/63854: Fix memory leak of save_decoded_options
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm wrote: > This commit fixes this leak from opts-common.c, about 4KB per iteration. Ok. Thanks, Richard. > ==57820== 18,816 (2,560 direct, 16,256 indirect) bytes in 4 blocks are > definitely lost in loss record 907 of 917 > ==57820==at 0x4A083AA: realloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) > ==57820==by 0x59A67CC: xrealloc (xmalloc.c:179) > ==57820==by 0x59587C9: decode_cmdline_options_to_array(unsigned int, char > const**, unsigned int, cl_decoded_option**, unsigned int*) (opts-common.c:885) > ==57820==by 0x4E2ED90: toplev::main(int, char**) (toplev.c:2089) > ==57820==by 0x4E43186: gcc::jit::playback::context::compile() > (jit-playback.c:1615) > ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() > (jit-recording.c:861) > ==57820==by 0x401CA4: test_jit (harness.h:190) > ==57820==by 0x401D88: main (harness.h:232) > > gcc/ChangeLog: > PR jit/63854 > * toplev.c (toplev::finalize): Clean up save_decoded_options. > --- > gcc/toplev.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/gcc/toplev.c b/gcc/toplev.c > index 876279f..291b84d 100644 > --- a/gcc/toplev.c > +++ b/gcc/toplev.c > @@ -2173,6 +2173,8 @@ toplev::finalize (void) >finalize_options_struct (&global_options); >finalize_options_struct (&global_options_set); > > + XDELETEVEC (save_decoded_options); > + >/* Clean up the context (and pass_manager etc). */ >delete g; >g = NULL; > -- > 1.8.5.3 >
Re: [PATCH 20/21] PR jit/63854: Fix leak in ipa-icf.c
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm wrote: > ipa-icf.c was leaking 16 bytes per iteration in the jit testcases here: Ok. Tahnks, Richard. > 80 bytes in 5 blocks are definitely lost in loss record 94 of 199 >at 0x4A0645D: malloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >by 0x5D764B7: xmalloc (xmalloc.c:147) >by 0x5C962FF: ipa_icf::sem_item_optimizer::get_group_by_hash(unsigned int, > ipa_icf::sem_item_type) (ipa-icf.c:1503) >by 0x5C96C8C: ipa_icf::sem_item_optimizer::build_hash_based_classes() > (ipa-icf.c:1723) >by 0x5C96765: ipa_icf::sem_item_optimizer::execute() (ipa-icf.c:1616) >by 0x5C98E81: ipa_icf::ipa_icf_driver() (ipa-icf.c:2373) >by 0x5C993C8: ipa_icf::pass_ipa_icf::execute(function*) (ipa-icf.c:2421) >by 0x5223CCD: execute_one_pass(opt_pass*) (passes.c:2306) >by 0x5224BA7: execute_ipa_pass_list(opt_pass*) (passes.c:2700) >by 0x4E49B0D: ipa_passes() (cgraphunit.c:2088) >by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172) >by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325) > > by not freeing the allocation here: > > congruence_class_group *item = XNEW (congruence_class_group); > > in sem_item_optimizer::get_group_by_hash. > > Fix it. > > gcc/ChangeLog: > PR jit/63854 > * ipa-icf.c (sem_item_optimizer::~sem_item_optimizer): Free each > congruence_class_group *. > --- > gcc/ipa-icf.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c > index 2d5fcf5..b0ba82e 100644 > --- a/gcc/ipa-icf.c > +++ b/gcc/ipa-icf.c > @@ -1322,6 +1322,7 @@ sem_item_optimizer::~sem_item_optimizer () > delete (*it)->classes[i]; > >(*it)->classes.release (); > + free (*it); > } > >m_items.release (); > -- > 1.8.5.3 >
Re: [PATCH 09/21] PR jit/63854: Don't leak producer_string in dwarf2out.c
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm wrote: > Fix this small per-iteration leak with debuginfo: Ok. Thanks, Richard. > 424 bytes in 4 blocks are definitely lost in loss record 185 of 230 >at 0x4A0645D: malloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >by 0x5D75CA7: xmalloc (xmalloc.c:147) >by 0x4ECE9E4: gen_producer_string() (dwarf2out.c:19489) >by 0x4EDB2C8: dwarf2out_finish(char const*) (dwarf2out.c:24257) >by 0x532AD3B: compile_file() (toplev.c:623) >by 0x532D9E1: do_compile() (toplev.c:2020) >by 0x532DC54: toplev::main(int, char**) (toplev.c:2117) >by 0x4DE766F: gcc::jit::playback::context::compile() (jit-playback.c:1615) >by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861) >by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014) >by 0x401CA4: test_jit (harness.h:190) >by 0x401D88: main (harness.h:232) > > gcc/ChangeLog: > PR jit/63854 > * dwarf2out.c (dwarf2out_c_finalize): Free producer_string. > --- > gcc/dwarf2out.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c > index b16883f..9069f9a 100644 > --- a/gcc/dwarf2out.c > +++ b/gcc/dwarf2out.c > @@ -24741,6 +24741,8 @@ dwarf2out_c_finalize (void) >frame_pointer_fb_offset = 0; >frame_pointer_fb_offset_valid = false; >base_types.release (); > + XDELETEVEC (producer_string); > + producer_string = NULL; > } > > #include "gt-dwarf2out.h" > -- > 1.8.5.3 >
Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)
On Wed, Nov 19, 2014 at 09:59:45AM +0100, Richard Biener wrote: > > OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because > > they are never used for integer arithmetics (and there is no way > > to represent all their values in RTL if not using CONST_WIDE_INT). > > As the following testcase shows, simplify_immed_subreg can be called > > with such modes though, e.g. trying to forward propagate a CONST_VECTOR > > (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear > > in the IL directly) into a SUBREG_REG. > > So we have (subreg:OI (reg:V4DF ...) ...) for example? What do we > end doing with that OI mode subreg? (why can't we simply use the > V4DF one?) propagate_rtx_1 is called on (subreg:OI (reg:V8DI 89) 0) with old_rtx (reg:V8DI 89) and new_rtx (const_vector:V8DI [ (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) ]) Seems we throw away the result in that case though, because gen_lowpart_common doesn't like to return low parts of VOIDmode constants larger if the mode is larger than double int: 1353 innermode = GET_MODE (x); 1354 if (CONST_INT_P (x) 1355 && msize * BITS_PER_UNIT <= HOST_BITS_PER_WIDE_INT) 1356innermode = mode_for_size (HOST_BITS_PER_WIDE_INT, MODE_INT, 0); 1357 else if (innermode == VOIDmode) 1358innermode = mode_for_size (HOST_BITS_PER_DOUBLE_INT, MODE_INT, 0); we hit the last if and as mode is wider than innermode, we return NULL later on. > > The following patch instead of ICE handles the most common cases (all 0 > > and all 1 CONST_VECTORs) and returns NULL otherwise. > > > > Before wide-int got merged, the testcase worked, though the code didn't > > bother checking anything, just created 0 or constm1_rtx for the two cases > > that could happen and if something else appeared, could just return what > > matched low TImode (or DImode for -m32). > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > Looks ok to me. Not sure if the zero/all-ones case really happens that > much to be worth special-casing - I think you could use > fixed_wide_int and simply see if the result is representable > in a CONST_INT/CONST_DOUBLE? Can you try if that works? It looks like > the patch may be smaller for that. So perhaps something like this? Don't know how much more inefficient it is compared to what it used to do before. Or just keep the existing code and just remove the assert and instead return NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT? At least during propagation that will make zero change. Though, in that case I have still doubts about the current code handling right modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64). If TARGET_SUPPORTS_WIDE_INT == 0, we still silently throw away the upper bits, don't we? BTW, to Mike, the assert has been misplaced, there was first buffer overflow and only after that the assert. 2014-11-19 Jakub Jelinek PR target/63910 * simplify-rtx.c (simplify_immed_subreg): Handle integer modes wider than MAX_BITSIZE_MODE_ANY_INT. * gcc.target/i386/pr63910.c: New test. --- gcc/simplify-rtx.c.jj 2014-11-19 09:17:15.491327992 +0100 +++ gcc/simplify-rtx.c 2014-11-19 12:28:16.223808178 +0100 @@ -5501,8 +5501,12 @@ simplify_immed_subreg (machine_mode oute int units = (GET_MODE_BITSIZE (outer_submode) + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT; - HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; - wide_int r; + const int tmpsize = (MAX_BITSIZE_MODE_ANY_MODE ++ HOST_BITS_PER_WIDE_INT - 1) + / HOST_BITS_PER_WIDE_INT; + HOST_WIDE_INT tmp[tmpsize]; + typedef FIXED_WIDE_INT (tmpsize * HOST_BITS_PER_WIDE_INT) imm_int; + imm_int r; for (u = 0; u < units; u++) { @@ -5515,10 +5519,12 @@ simplify_immed_subreg (machine_mode oute tmp[u] = buf; base += HOST_BITS_PER_WIDE_INT; } - gcc_assert (GET_MODE_PRECISION (outer_submode) - <= MAX_BITSIZE_MODE_ANY_INT); - r = wide_int::from_array (tmp, units, - GET_MODE_PRECISION (outer_submode)); + r = imm_int::from_array (tmp, units, +GET_MODE_PRECISION (outer_submode)); +#if TARGET_SUPPORTS_WIDE_INT == 0 + if (wi::min_precision (r, SIGNED) > HOST_BITS_PER_DOUBLE_INT) + return NULL_RTX; +#endif elems[elem] = immed_wide_int_const (r, outer_submode); } break; --- gcc/testsuite/gcc.target/i386/pr63910.c.jj 2014-1
Re: [PATCH 14/21] PR jit/63854: Fix leak of paths within jump threading
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm wrote: > Paths are allocated as: >vec *path = new vec (); > i.e. the vec itself is on the heap, so a mere: > path->release (); > is merely releasing the buffer the vec holds, not the vec itself. Ok. Thanks, RIchard. > This patch updates the two sites that release paths to also delete them, > fixing leaks like this seen by valgrind: > 160 bytes in 20 blocks are definitely lost in loss record 165 of 241 >at 0x4A06965: operator new(unsigned long) (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >by 0x556E5AF: thread_across_edge(gimple_statement_base*, edge_def*, bool, > vec*, tree_node > * (*)(gimple_statement_base*, gimple_statement_base*)) > (tree-ssa-threadedge.c:1124) >by 0x547B73B: dom_opt_dom_walker::thread_across_edge(edge_def*) > (tree-ssa-dom.c:1184) >by 0x5480183: dom_opt_dom_walker::after_dom_children(basic_block_def*) > (tree-ssa-dom.c:2015) >by 0x5C1C5F7: dom_walker::walk(basic_block_def*) (domwalk.c:218) >by 0x547AE83: (anonymous namespace)::pass_dominator::execute(function*) > (tree-ssa-dom.c:919) >by 0x52235BD: execute_one_pass(opt_pass*) (passes.c:2306) >by 0x5223834: execute_pass_list_1(opt_pass*) (passes.c:2358) >by 0x5223865: execute_pass_list_1(opt_pass*) (passes.c:2359) >by 0x52238A2: execute_pass_list(function*, opt_pass*) (passes.c:2369) >by 0x4E4888F: cgraph_node::expand() (cgraphunit.c:1773) >by 0x4E48F29: expand_all_functions() (cgraphunit.c:1909) > > [Space-wise, a vec is just a pointer, hence the 8 bytes > per block seen in the valgrind report] > > gcc/ChangeLog: > PR jit/63854 > * tree-ssa-threadedge.c (thread_across_edge): Don't just release > "path", delete it. > * tree-ssa-threadupdate.c (delete_jump_thread_path): Likewise. > --- > gcc/tree-ssa-threadedge.c | 1 + > gcc/tree-ssa-threadupdate.c | 1 + > 2 files changed, 2 insertions(+) > > diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c > index d5b9696..9ee1cae 100644 > --- a/gcc/tree-ssa-threadedge.c > +++ b/gcc/tree-ssa-threadedge.c > @@ -1149,6 +1149,7 @@ thread_across_edge (gimple dummy_cond, > through the vector entries. */ >gcc_assert (path->length () == 0); >path->release (); > + delete path; > >/* A negative status indicates the target block was deemed too big to > duplicate. Just quit now rather than trying to use the block as > diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c > index 151ed83..5cb9f86 100644 > --- a/gcc/tree-ssa-threadupdate.c > +++ b/gcc/tree-ssa-threadupdate.c > @@ -2481,6 +2481,7 @@ delete_jump_thread_path (vec *path) >for (unsigned int i = 0; i < path->length (); i++) > delete (*path)[i]; >path->release(); > + delete path; > } > > /* Register a jump threading opportunity. We queue up all the jump > -- > 1.8.5.3 >
Re: [PATCH] Fix for fold_negate_expr (PR sanitizer/63879)
On Wed, 19 Nov 2014, Marek Polacek wrote: > The problem here is that negate_expr_p returns true (as it should) > for UINT_MAX(OVF), but fold_negate_expr didn't actually fold it, > and that is a no-no; if negate_expr_p is true, fold_negate_expr must > not return NULL_TREE. I added the following hunk for bootstrap and > regtest for better coverage: > > @@ -752,6 +755,12 @@ fold_negate_expr (location_t loc, tree t) >break; > } > > +#ifdef ENABLE_CHECKING > + if (negate_expr_p (t)) > +internal_error ("fold_negate_expr should never return NULL_TREE " > + "if negate_expr_p is true"); > +#endif > + >return NULL_TREE; > } > > Fixed by adjusting fold_negate_expr so that it folds non-trapping > wrapping constants. The SANITIZE_SI_OVERFLOW check is important > to not regress -Woverflow warnings (ug). > > negate_expr_p hunk is to match the NEGATE_EXPR case in fold_negate_expr. > > Bootstrapped/regtested on power8-linux, ok for trunk? Ok. Thanks, Richard. > 2014-11-19 Marek Polacek > > PR sanitizer/63879 > * fold-const.c (negate_expr_p) : Return > !TYPE_OVERFLOW_SANITIZED. > (fold_negate_expr) : Fold when overflow > does not trap and when overflow wraps, or when SANITIZE_SI_OVERFLOW > is 0. > > * c-c++-common/ubsan/pr63879-1.c: New test. > * c-c++-common/ubsan/pr63879-2.c: New test. > > diff --git gcc/fold-const.c gcc/fold-const.c > index f6fb8af..719e06e 100644 > --- gcc/fold-const.c > +++ gcc/fold-const.c > @@ -408,9 +408,11 @@ negate_expr_p (tree t) > && TYPE_OVERFLOW_WRAPS (type)); > > case FIXED_CST: > -case NEGATE_EXPR: >return true; > > +case NEGATE_EXPR: > + return !TYPE_OVERFLOW_SANITIZED (type); > + > case REAL_CST: >/* We want to canonicalize to positive real constants. Pretend > that only negative ones can be easily negated. */ > @@ -555,7 +557,8 @@ fold_negate_expr (location_t loc, tree t) >tem = fold_negate_const (t, type); >if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t) > || (!TYPE_OVERFLOW_TRAPS (type) > - && (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0)) > + && TYPE_OVERFLOW_WRAPS (type)) > + || (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0) > return tem; >break; > > diff --git gcc/testsuite/c-c++-common/ubsan/pr63879-1.c > gcc/testsuite/c-c++-common/ubsan/pr63879-1.c > index e69de29..2035849 100644 > --- gcc/testsuite/c-c++-common/ubsan/pr63879-1.c > +++ gcc/testsuite/c-c++-common/ubsan/pr63879-1.c > @@ -0,0 +1,23 @@ > +/* PR sanitizer/63879 */ > +/* { dg-do compile } */ > +/* { dg-options "-fsanitize=undefined" } */ > + > +struct A > +{ > + int inode; > +} * a; > +int b, c; > +void > +fn1 () > +{ > + int d = 0; > + while (b) > +{ > + if (a->inode) > +d++; > + a = 0; > +} > + c = d - 1; > + for (; c >= 0; c--) > +; > +} > diff --git gcc/testsuite/c-c++-common/ubsan/pr63879-2.c > gcc/testsuite/c-c++-common/ubsan/pr63879-2.c > index e69de29..34eb8e7 100644 > --- gcc/testsuite/c-c++-common/ubsan/pr63879-2.c > +++ gcc/testsuite/c-c++-common/ubsan/pr63879-2.c > @@ -0,0 +1,13 @@ > +/* PR sanitizer/63879 */ > +/* { dg-do compile } */ > +/* { dg-options "-fsanitize=undefined" } */ > + > +int a; > +void > +fn1 () > +{ > + int b = 2; > + for (; a;) > +while (b >= 0) > + b--; > +} > > Marek > > -- Richard Biener SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284 (AG Nuernberg) Maxfeldstrasse 5, 90409 Nuernberg, Germany
Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)
On Wed, 19 Nov 2014, Jakub Jelinek wrote: > On Wed, Nov 19, 2014 at 09:59:45AM +0100, Richard Biener wrote: > > > OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because > > > they are never used for integer arithmetics (and there is no way > > > to represent all their values in RTL if not using CONST_WIDE_INT). > > > As the following testcase shows, simplify_immed_subreg can be called > > > with such modes though, e.g. trying to forward propagate a CONST_VECTOR > > > (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can > > > appear > > > in the IL directly) into a SUBREG_REG. > > > > So we have (subreg:OI (reg:V4DF ...) ...) for example? What do we > > end doing with that OI mode subreg? (why can't we simply use the > > V4DF one?) > > propagate_rtx_1 is called on > (subreg:OI (reg:V8DI 89) 0) > with old_rtx > (reg:V8DI 89) > and new_rtx > (const_vector:V8DI [ > (const_int 0 [0]) > (const_int 0 [0]) > (const_int 0 [0]) > (const_int 0 [0]) > (const_int 0 [0]) > (const_int 0 [0]) > (const_int 0 [0]) > (const_int 0 [0]) > ]) > > Seems we throw away the result in that case though, because > gen_lowpart_common doesn't like to return low parts of VOIDmode constants > larger if the mode is larger than double int: > 1353innermode = GET_MODE (x); > 1354if (CONST_INT_P (x) > 1355&& msize * BITS_PER_UNIT <= HOST_BITS_PER_WIDE_INT) > 1356 innermode = mode_for_size (HOST_BITS_PER_WIDE_INT, MODE_INT, 0); > 1357else if (innermode == VOIDmode) > 1358 innermode = mode_for_size (HOST_BITS_PER_DOUBLE_INT, MODE_INT, 0); > we hit the last if and as mode is wider than innermode, we return NULL later > on. > > > > The following patch instead of ICE handles the most common cases (all 0 > > > and all 1 CONST_VECTORs) and returns NULL otherwise. > > > > > > Before wide-int got merged, the testcase worked, though the code didn't > > > bother checking anything, just created 0 or constm1_rtx for the two cases > > > that could happen and if something else appeared, could just return what > > > matched low TImode (or DImode for -m32). > > > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > > > Looks ok to me. Not sure if the zero/all-ones case really happens that > > much to be worth special-casing - I think you could use > > fixed_wide_int and simply see if the result is representable > > in a CONST_INT/CONST_DOUBLE? Can you try if that works? It looks like > > the patch may be smaller for that. > > So perhaps something like this? Don't know how much more inefficient it is > compared to what it used to do before. Yes, that looks good. > Or just keep the existing code and just remove the assert and instead return > NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT? At > least during propagation that will make zero change. > Though, in that case I have still doubts about the current code handling right > modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than > MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64). If TARGET_SUPPORTS_WIDE_INT > == 0, we still silently throw away the upper bits, don't we? Well - not with your added check, no? I'd say the patch is ok. Thanks, Richard. > BTW, to Mike, the assert has been misplaced, there was first buffer overflow > and only after that the assert. > > 2014-11-19 Jakub Jelinek > > PR target/63910 > * simplify-rtx.c (simplify_immed_subreg): Handle integer modes wider > than MAX_BITSIZE_MODE_ANY_INT. > > * gcc.target/i386/pr63910.c: New test. > > --- gcc/simplify-rtx.c.jj 2014-11-19 09:17:15.491327992 +0100 > +++ gcc/simplify-rtx.c2014-11-19 12:28:16.223808178 +0100 > @@ -5501,8 +5501,12 @@ simplify_immed_subreg (machine_mode oute > int units > = (GET_MODE_BITSIZE (outer_submode) + HOST_BITS_PER_WIDE_INT - 1) > / HOST_BITS_PER_WIDE_INT; > - HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / > HOST_BITS_PER_WIDE_INT]; > - wide_int r; > + const int tmpsize = (MAX_BITSIZE_MODE_ANY_MODE > + + HOST_BITS_PER_WIDE_INT - 1) > + / HOST_BITS_PER_WIDE_INT; > + HOST_WIDE_INT tmp[tmpsize]; > + typedef FIXED_WIDE_INT (tmpsize * HOST_BITS_PER_WIDE_INT) imm_int; > + imm_int r; > > for (u = 0; u < units; u++) > { > @@ -5515,10 +5519,12 @@ simplify_immed_subreg (machine_mode oute > tmp[u] = buf; > base += HOST_BITS_PER_WIDE_INT; > } > - gcc_assert (GET_MODE_PRECISION (outer_submode) > - <= MAX_BITSIZE_MODE_ANY_INT); > - r = wide_int::from_array (tmp, units, > - GET_MODE_PRECISION (outer_submode)); > + r = imm_int::from_array (tmp, units, > + GET_MODE_PRECISION (outer_sub
Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)
On Wed, Nov 19, 2014 at 12:59:06PM +0100, Richard Biener wrote: > > So perhaps something like this? Don't know how much more inefficient it is > > compared to what it used to do before. > > Yes, that looks good. > > > Or just keep the existing code and just remove the assert and instead return > > NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT? At > > least during propagation that will make zero change. > > Though, in that case I have still doubts about the current code handling > > right > > modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than > > MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64). If TARGET_SUPPORTS_WIDE_INT > > == 0, we still silently throw away the upper bits, don't we? > > Well - not with your added check, no? For TARGET_SUPPORTS_WIDE_INT == 0 should be hopefully ok. Not sure about TARGET_SUPPORTS_WIDE_INT != 0, can it express any generic_wide_int, or is it still bound to wide_int (i.e. MAX_BITSIZE_MODE_ANY_INT rounded up) precision? Mike? Jakub
[C++ Patch] PR 55425
Hi, today I wanted to simply close this Bug as fixed, but then I noticed that Richard Smith in the audit trail argued, correctly in my opinion, that we should accept things like: constexpr const char* x() { return __func__; } in C++11 mode too, because the "as if" local variable specification of __func__ in the Standard doesn't matter for the purpose of constexpr, because otherwise in C++11 no constexpr function would be valid: the Standard does *not* say that __func__ is only predefined if it is used. That said, I think a quick way to accept __func__, __FUNCTION__, and __PRETTY_FUNCTION__ is checking whether DECL_ARTIFICIAL (decl) is true on the decl. Tested x86_64-linux. Thanks, Paolo. /// /cp 2014-11-19 Paolo Carlini PR c++/55425 * constexpr.c (constexpr_fn_retval): Accept __func__, __FUNCTION__, and __PRETTY_FUNCTION__. /testsuite 2014-11-19 Paolo Carlini PR c++/55425 * g++.dg/cpp0x/constexpr-__func__.C Index: cp/constexpr.c === --- cp/constexpr.c (revision 217755) +++ cp/constexpr.c (working copy) @@ -616,9 +616,14 @@ constexpr_fn_retval (tree body) return break_out_target_exprs (TREE_OPERAND (body, 0)); case DECL_EXPR: - if (TREE_CODE (DECL_EXPR_DECL (body)) == USING_DECL) - return NULL_TREE; - return error_mark_node; + { + tree decl = DECL_EXPR_DECL (body); + if (TREE_CODE (decl) == USING_DECL + /* Accept __func__, __FUNCTION__, and __PRETTY_FUNCTION__. */ + || DECL_ARTIFICIAL (decl)) + return NULL_TREE; + return error_mark_node; + } case CLEANUP_POINT_EXPR: return constexpr_fn_retval (TREE_OPERAND (body, 0)); Index: testsuite/g++.dg/cpp0x/constexpr-__func__.C === --- testsuite/g++.dg/cpp0x/constexpr-__func__.C (revision 0) +++ testsuite/g++.dg/cpp0x/constexpr-__func__.C (working copy) @@ -0,0 +1,6 @@ +// PR c++/55425 +// { dg-do compile { target c++11 } } + +constexpr const char* x() { return __func__; } +constexpr const char* y() { return __FUNCTION__; } +constexpr const char* z() { return __PRETTY_FUNCTION__; }
Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)
On Wed, 19 Nov 2014, Jakub Jelinek wrote: > On Wed, Nov 19, 2014 at 12:59:06PM +0100, Richard Biener wrote: > > > So perhaps something like this? Don't know how much more inefficient it > > > is > > > compared to what it used to do before. > > > > Yes, that looks good. > > > > > Or just keep the existing code and just remove the assert and instead > > > return > > > NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT? At > > > least during propagation that will make zero change. > > > Though, in that case I have still doubts about the current code handling > > > right > > > modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than > > > MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64). If > > > TARGET_SUPPORTS_WIDE_INT > > > == 0, we still silently throw away the upper bits, don't we? > > > > Well - not with your added check, no? > > For TARGET_SUPPORTS_WIDE_INT == 0 should be hopefully ok. Not sure > about TARGET_SUPPORTS_WIDE_INT != 0, can it express any generic_wide_int, or > is it still bound to wide_int (i.e. MAX_BITSIZE_MODE_ANY_INT rounded up) > precision? Mike? It can represent any - well, the RTX at least. Code then using "simple" wide_int may wreck then though. Richard.
Re: [PATCH 4/4] Data structure is used for inline_summary struct.
On 11/18/2014 11:25 PM, Martin Jambor wrote: On Tue, Nov 18, 2014 at 07:59:26PM +0100, Jan Hubicka wrote: Hi, On Tue, Nov 18, 2014 at 04:39:00PM +0100, Jan Hubicka wrote: On Fri, Nov 14, 2014 at 08:59:10PM +0100, Jan Hubicka wrote: b) with GTY, we cannot call destructor Everything in symbol table is expecitely memory managed (i.e. enver left to be freed by garbage collector). It resists in GTY only to allow linking garbage collected object from them and to get PCH working. Well, if I understand the intent correctly, summaries are for stuff that is not in the symbol table. For example jump functions are a Correct. vector of structures possibly containing trees, so everything has to be in garbage collected memory. When an edge is removed, it is necessary to be notified about it immediately, for example to decrement rdesc_refcount (you might argue that that should be done in a separate hook and not from within a summary class but then you start to rely on hook invocation ordering so I think it is better to eventually use the summaries for it too). I do not see why ctors/dtors can not do the reference counting. In fact this is how refcounting is done usually anyway? Well, when there is no garbage collection involved then yes, that is how you normally do it but in the GC case, there is the question of what is the appropriate time to call destructor on garbage collected data (like jump functions)? I still fail to see problem here. Summaries are explicitly managed- they are constructed at summary construction time or when new callgarph node is introduced/duplicated. They are destroyed when callgarph node is destroyed or whole summary is ddestroyed. It is job of the summary datastructure to call proper ctors/dtors, not job of garbage collector that provides the underlying memory management. I do not think that all summaries (in the meaning of a description of one particular symbol table node or call graph edge) are explicitely managed. For example ipa_edge_args or ipa_agg_replacement_value (which my alignment patch changes to ipcp_transformation_summary) are allocated in GC memory because they contain trees. If you have datastructure that points to something that is not explicitly managed (i.e. tree expression), you just can not have non-trivial constructor on that datastructure, because that is freed transparently by gty that don't do destruction... I admit to not being particularly bright today but that seems to be exactly my point. Well, in your case you have datastructure jump_function that contain a pointer to tree (EXPR). What I am trying to explain is that I see no reson why jump_function needs to be POD. I never said that the summary object needs to be a POD, I only said I liked the possibility of storing very simple objects (without wrapping them in classes with constructors and destructors). That is of course nothing more than my personal preference. The tree pointed to by EXPR pointer can not have a dtor by itself because GGC will not call it upon freeing. It is true that jump_function lives in GGC memory (to make pointer to expr work) but it never gets removed by ggc_collect because it is always pointed to by the summary datastructure. There are two ways to free the jump_function datastructure. 1) removing the symbol node it is attached to. Here the symtab code will call removal hook that was registered by container template. The container will call destructor of jump_function and the ggc_free its memory 2) removing the summary. In this case I would again expect the container template to walk all summaries and free them. So even if your structure lives in GGC memory it is not really garbage collected and thus the lack of machinery to call dtors at a time ggc decides to free something is not a problem? In fact looking at struct default_hashmap_traits, I see: /* Called to dispose of the key and value before marking the entry as deleted. */ template static void remove (T &v) { v.~T (); } Now I see, I should have read your previous email more carefully, by explicitely managed you mean that destructors will be called explicitely by the summary infrastructure. I was wondering how you wanted to rip the summaries out of GGC memory. Well, I suppose that would work, and since explicit calls to destructors are basically the counterpart of placement new that we already plan to use, it might be actually be the proper C++ thing to do. (I am not sure I like it though, for all other purposes the summary objects will look like managed by the garbage collector and only we who read this thread will know that the lifetime of the object would be decoupled from the allocation-span of its memory). Thanks for the clarification, Martin Hello. I tried to come up with ctor/dtor solution for types passes to symbol_summary template class. Example: struct inline_summary { inline_summary (cgraph_node *node); inline_summary (
[PATCH] MIPS16/GCC: Optimise `__call_stub_fp_' call/return stubs
Hi, It has come to my attention that we create suboptimal code for the `__call_stub_fp_' variant of the MIPS16 call stubs. These stubs are used for outgoing calls made from MIPS16 code to standard MIPS code that return floating-point results and may also pass floating-point arguments. This can be illustrated with this small program: $ cat foo.c double bar (double d); int main (int argc, char **argv) { return bar (argc); } $ mips-linux-gnu-gcc -O2 -mips16 -c foo.c $ mips-linux-gnu-objdump -dr foo.o foo.o: file format elf32-tradbigmips Disassembly of section .mips16.call.fp.bar: <__call_stub_fp_bar>: 0: 44856000mtc1a1,$f12 4: 44846800mtc1a0,$f13 8: 03e09021moves2,ra c: 0c00jal 0 <__call_stub_fp_bar> c: R_MIPS_26bar 10: nop 14: 4403mfc1v1,$f0 18: 0248jr s2 1c: 44020800mfc1v0,$f1 Disassembly of section .text.startup: : 0: f100 64c4 save32,ra,s2 4: 1800 jal 0 4: R_MIPS16_26 __mips16_floatsidf 8: 6500nop a: 67a3movea1,v1 c: 1800 jal 0 c: R_MIPS16_26 bar 10: 6782movea0,v0 12: 67a3movea1,v1 14: 1800 jal 0 14: R_MIPS16_26 __mips16_fix_truncdfsi 18: 6782movea0,v0 1a: f100 6444 restore 32,ra,s2 1e: e8a0jrc ra -- as you can see in `__call_stub_fp_bar' above the jump delay slot remains empty because the move to $s2 instruction cannot be scheduled there due to a data dependency on $ra. However there is no need to save $ra last and since the MIPS IV ISA there are no coprocessor move delay slots so the last MTC1 instruction could be scheduled there instead. So here is a change to avoid this empty delay slot. With this change in place we get this code instead: $ mips-linux-gnu-objdump -dr foo.o foo.o: file format elf32-tradbigmips Disassembly of section .mips16.call.fp.bar: <__call_stub_fp_bar>: 0: 03e09021moves2,ra 4: 44856000mtc1a1,$f12 8: 0c00jal 0 <__call_stub_fp_bar> 8: R_MIPS_26bar c: 44846800mtc1a0,$f13 10: 4403mfc1v1,$f0 14: 0248jr s2 18: 44020800mfc1v0,$f1 Disassembly of section .text.startup: : 0: f100 64c4 save32,ra,s2 4: 1800 jal 0 4: R_MIPS16_26 __mips16_floatsidf 8: 6500nop a: 67a3movea1,v1 c: 1800 jal 0 c: R_MIPS16_26 bar 10: 6782movea0,v0 12: 67a3movea1,v1 14: 1800 jal 0 14: R_MIPS16_26 __mips16_fix_truncdfsi 18: 6782movea0,v0 1a: f100 6444 restore 32,ra,s2 1e: e8a0jrc ra -- as you can see the last MTC1 instruction has now been placed in the delay slot. For ISAs that do have a coprocessor move delay slot there is no gain, but no loss either, both delay slots remain empty due to data dependencies (coprocessor move delay slots): $ mips-linux-gnu-gcc -O2 -mips1 -mips16 -c foo.c $ mips-linux-gnu-objdump -dr foo.o foo.o: file format elf32-tradbigmips Disassembly of section .mips16.call.fp.bar: <__call_stub_fp_bar>: 0: 03e09021moves2,ra 4: 44856000mtc1a1,$f12 8: 44846800mtc1a0,$f13 c: 0c00jal 0 <__call_stub_fp_bar> c: R_MIPS_26bar 10: nop 14: 4403mfc1v1,$f0 18: 44020800mfc1v0,$f1 1c: 0248jr s2 20: nop Disassembly of section .text.startup: : 0: 63fcaddiu sp,-32 2: 6772movev1,s2 4: 6207sw ra,28(sp) 6: 1800 jal 0 6: R_MIPS16_26 __mips16_floatsidf a: d306sw v1,24(sp) c: 67a3movea1,v1 e: 1800 jal 0 e: R_MIPS16_26 bar 12: 6782movea0,v0 14: 67a3movea1,v1 16: 1800 jal 0 16: R_MIPS16_26 __mips16_fix_truncdfsi 1a: 6782movea0,v0 1c: 9606lw a2,24(sp) 1e: 9707lw a3,28(sp) 20: 6556moves2,a2 22: ef00jr a3 24: 6304addiu sp,32 26: 6500nop -- though I think this consideration is actually academic as no legacy MIPS16 processors have ever be
[PATCH] Fix PR63677 and a regression from alias-improvements
PR63677 shows that late complete unrolling can expose quite some memory CSE opportunities (and followup simplifications) which we fail to exploit because currently DOMs ability to do memory CSE is quite limited (it requires the redundant load to optimize to directly follow the load or store). This is actually a regression from before alias-improvements, thus to the time where we had multiple virtual operand symbols which made us handle some cases better than now with a single memory operand symbol. The following patch makes DOM use the alias walker to fix that. The easiest way to do that is to no longer hash virtual operands and use the expression hash lookup result as "candidate" if the expression involves memory. We can then verify if the candidate can be used by using the alias walker to walk to the candidate. If there is an intermediate possibly aliasing stmt we simply replace the hashtable entry and set up enough info to be able to rewind to the previous state during unwinding. This means the DOM implementation of memory CSE is a tad bit more efficient than that of SCCVN. But it is also very simplistic as it requires 1:1 requivalence for the memory access structure (as compared to SCCVN which can look through various kinds of aggregate copies or initializations as well as handling accessing the same memory location with different memory access structures). This is also the reason that the patch doesn't fix PR63864 which requires us to look through aggregate copies. To fix PR63677 I also had to teach DOM the trick of propagating &a + CST as &MEM[&a, CST] which is used by all other propagators. Given how easy this was to restore pre-alias-improvements state I wonder why I didn't think of this before... Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Ok for trunk? Thanks, Richard. 2014-11-19 Richard Biener PR tree-optimization/63677 * tree-ssa-dom.c: Include gimplify.h for unshare_expr. (avail_exprs_stack): Make a vector of pairs. (struct hash_expr_elt): Replace stmt member with vop member. (expr_elt_hasher::equal): Simplify. (initialize_hash_element): Adjust. (initialize_hash_element_from_expr): Likewise. (dom_opt_dom_walker::thread_across_edge): Likewise. (record_cond): Likewise. (dom_opt_dom_walker::before_dom_children): Likewise. (print_expr_hash_elt): Likewise. (remove_local_expressions_from_table): Restore previous state if requested. (record_equivalences_from_stmt): Record &x + CST as constant &MEM[&x, CST] for further propagation. (vuse_eq): New function. (lookup_avail_expr): For loads use the alias oracle to see whether a candidate from the expr hash is usable. (avail_expr_hash): Do not hash VUSEs. * gcc.dg/tree-ssa/ssa-dom-cse-2.c: New testcase. * gcc.dg/tree-ssa/ssa-dom-cse-3.c: Likewise. Index: gcc/tree-ssa-dom.c === *** gcc/tree-ssa-dom.c.orig 2014-11-19 13:28:46.123015743 +0100 --- gcc/tree-ssa-dom.c 2014-11-19 13:58:33.122937543 +0100 *** along with GCC; see the file COPYING3. *** 66,71 --- 66,72 #include "tree-ssa-threadedge.h" #include "tree-ssa-dom.h" #include "inchash.h" + #include "gimplify.h" /* This file implements optimizations on the dominator tree. */ *** struct edge_info *** 139,145 marker. */ typedef struct expr_hash_elt * expr_hash_elt_t; ! static vec avail_exprs_stack; /* Structure for entries in the expression hash table. */ --- 140,146 marker. */ typedef struct expr_hash_elt * expr_hash_elt_t; ! static vec > avail_exprs_stack; /* Structure for entries in the expression hash table. */ *** struct expr_hash_elt *** 151,158 /* The expression (rhs) we want to record. */ struct hashable_expr expr; ! /* The stmt pointer if this element corresponds to a statement. */ ! gimple stmt; /* The hash value for RHS. */ hashval_t hash; --- 152,160 /* The expression (rhs) we want to record. */ struct hashable_expr expr; ! /* The virtual operand associated with the nearest dominating stmt ! loading from or storing to expr. */ ! tree vop; /* The hash value for RHS. */ hashval_t hash; *** expr_elt_hasher::hash (const value_type *** 187,196 inline bool expr_elt_hasher::equal (const value_type &p1, const compare_type &p2) { - gimple stmt1 = p1->stmt; const struct hashable_expr *expr1 = &p1->expr; const struct expr_hash_elt *stamp1 = p1->stamp; - gimple stmt2 = p2->stmt; const struct hashable_expr *expr2 = &p2->expr; const struct expr_hash_elt *stamp2 = p2->stamp; --- 189,196 *** expr_elt_hasher::equal (const value_type *** 198,222 if (stamp1 == stamp2) return true; !
[PATCH, ARM] attribute target (thumb,arm) [0/6]
Hello Ramana, Here is the attribute revisited after your comments, so - thumb1 is now supported - -mflip-thump option added for testing. - inlining is allowed between modes. This set of patches was tested on rev#217709 as: no regressions with: arm-sim/ arm-sim/-march=armv7-a arm-sim/-mthumb arm-sim/-mthumb/-march=armv7-a a few artifacts, all of them analyzed, with arm-sim/-mflip-thumb/ arm-sim/-mflip-thumb//-march=armv7-a arm-sim/-mflip-thumb//-mthumb arm-sim/-mflip-thumb//-mthumb/-march=armv7-a Artifacts are analyzed, they are mostly the fault of the testsuite being unable to mix modes in the expected results (e.g thumb[1,2] or arm tests predicates, mix setjmp/longjump between modes...) The support is split as followed: [1/6] Reorganized arm_option_override to dynamically redefine the flags depending on the attribute mode. [2/6] Reorganized macro settings to be set/unset for each #pragma targets [3/6] Change ARM_DECLARE_FUNCTION_NAME into a function [4/6] Implement hooks to support attribute target [5/6] Implement #pragma target [6/6] Add -mflip-thumb option for testing I think I missed the stage3, Anyway would it be OK for stage1 when it reopens ? Christian
RE: [PATCH] MIPS/GCC: Unconditional jump generation bug fix
On Tue, 18 Nov 2014, Matthew Fortune wrote: > > > I admit to being a bit more nervous about 4.9 but the test coverage > > > seems thorough enough. I guess I would have been less concerned if the > > > optimisation was still just tied to TARGET_MICROMIPS for the 4.9 > > branch. > > > > > > Catherine, what do you think? > > > > > This is okay for 4.9 IMO. > > OK FWIW we've been using this change since Oct 2012 with no issues (as I noted it was meant to be included with the original microMIPS support submission, but was lost in transit) and also GAS has code to relax out-of-range branches to jumps in non-PIC standard MIPS code under the same condition this RTL insn uses, so even if a wrong branch slipped through here (which it doesn't), then GAS would fix it up. See gas/config/tc-mips.c (md_apply_fix) for the relaxation piece if interested. Maciej
RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop
> -Original Message- > From: Matthew Fortune [mailto:matthew.fort...@imgtec.com] > Sent: Tuesday, November 18, 2014 9:42 AM > To: Andrew Bennett; gcc-patches@gcc.gnu.org > Cc: Moore, Catherine; Rozycki, Maciej > Subject: RE: [PATCH] If using branch likelies in MIPS sync code fill the delay > slot with a nop > > > The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing > > when using the -mfix-r1 option. This is due to the fact that the > > delay slot of the branch instruction that checks if the atomic > > operation was not successful can be filled with an operation that > > returns the output result. For the branch likely case this operation > > can not go in the delay slot because it wont execute when the atomic > > operation was successful and therefore the wrong result will be > > returned. This patch forces a nop to be placed in the delay slot if a > > branch > likely is used. > > The fix is as simple as possible; it does cause a second nop to be > > introduced after the branch instruction in the branch likely case. As > > this option is rarely used, it makes more sense to keep the code > > readable than trying to optimise it. > > > > The atomic tests now pass successfully. The ChangeLog and patch are > > below. > > > > Ok to commit? > > I'm OK with just making the fix-r1 case safe rather than also complicating > the normal code path to remove the normal delay slot NOP in this special > case. I'd like to see what Catherine thinks though. To remove the second > NOP I believe we would have to add a !TARGET_FIX_R1 to the condition > of the normal delay slot NOP. This seems a bit convoluted when the real > reason is because branch likely is in use. To make use of the > mips_branch_likely flag it would have to be set earlier in: > mips_output_sync_loop and also get set in mips_sync_loop_insns. > > If Catherine thinks getting rid of the second NOP is important enough then > please base it on mips_branch_likely and fix the callers. > Yes, removing the second NOP is worth the additional effort.
[PATCH, ARM] attribute target (thumb,arm) [1/6]
In preparation of the target attribute, reorganize ´arm_option_override´ into 3 entities: arm_option_override_internal_p arm_option_check_internal arm_option_param_internal Also define and use TREE_TARGET macros instead of file-scope variables in the machine description. Thanks, Christian 2014-09-23 Christian Bruel * config/arm/arm.h (arm_option_override): Reoganized and split. (arm_option_params_internal); New function. (arm_option_check_internal): New function. (arm_option_override_internal): New function. (restrict_default): New boolean. (thumb_code, thumb1_code): Remove. * config/arm/arm.h (TREE_TARGET_THUMB, TREE_TARGET_THUMB1): New macros. (TREE_TARGET_THUM2, TREE_TARGET_ARM): Likewise. (thumb_code, thumb1_code): Remove. * config/arm/arm.md (is_thumb, is_thumb1): Check TARGET flag. diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN a/gcc/gcc/config/arm/arm.c b/gcc/gcc/config/arm/arm.c --- a/gcc/gcc/config/arm/arm.c 2014-11-18 08:35:38.0 +0100 +++ b/gcc/gcc/config/arm/arm.c 2014-11-18 09:11:19.0 +0100 @@ -744,6 +744,9 @@ const struct arm_fpu_desc *arm_fpu_desc; rtx thumb_call_via_label[14]; static int thumb_call_reg_needed; +/* Remember default option is used. */ +static bool restrict_default; + /* Bit values used to identify processor capabilities. */ #define FL_CO_PROC(1 << 0)/* Has external co-processor bus */ #define FL_ARCH3M (1 << 1)/* Extended multiply */ @@ -886,12 +889,6 @@ int arm_tune_wbuf = 0; /* Nonzero if tuning for Cortex-A9. */ int arm_tune_cortex_a9 = 0; -/* Nonzero if generating Thumb instructions. */ -int thumb_code = 0; - -/* Nonzero if generating Thumb-1 instructions. */ -int thumb1_code = 0; - /* Nonzero if we should define __THUMB_INTERWORK__ in the preprocessor. XXX This is a bit of a hack, it's intended to help work around @@ -2601,6 +2598,136 @@ arm_gimplify_va_arg_expr (tree valist, t return std_gimplify_va_arg_expr (valist, type, pre_p, post_p); } +/* Check any incompatible options that the user has specified. */ +static void +arm_option_check_internal (struct gcc_options *opts) +{ + /* Make sure that the processor choice does not conflict with any of the + other command line choices. */ + if (TREE_TARGET_ARM (opts) && !(insn_flags & FL_NOTM)) +error ("target CPU does not support ARM mode"); + + /* TARGET_BACKTRACE calls leaf_function_p, which causes a crash if done + from here where no function is being compiled currently. */ + if ((TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME) && TREE_TARGET_ARM (opts)) +warning (0, "enabling backtrace support is only meaningful when compiling for the Thumb"); + + if (TREE_TARGET_ARM (opts) && TARGET_CALLEE_INTERWORKING) +warning (0, "enabling callee interworking support is only meaningful when compiling for the Thumb"); + + /* If this target is normally configured to use APCS frames, warn if they + are turned off and debugging is turned on. */ + if (TREE_TARGET_ARM (opts) + && write_symbols != NO_DEBUG + && !TARGET_APCS_FRAME + && (TARGET_DEFAULT & MASK_APCS_FRAME)) +warning (0, "-g with -mno-apcs-frame may not give sensible debugging"); + + /* iWMMXt unsupported under Thumb mode. */ + if (TREE_TARGET_THUMB (opts) && TARGET_IWMMXT) +error ("iWMMXt unsupported under Thumb mode"); + + if (TARGET_HARD_TP && TREE_TARGET_THUMB1 (opts)) +error ("can not use -mtp=cp15 with 16-bit Thumb"); + + if (TREE_TARGET_THUMB (opts) && TARGET_VXWORKS_RTP && flag_pic) +{ + error ("RTP PIC is incompatible with Thumb"); + flag_pic = 0; +} + + /* We only support -mslow-flash-data on armv7-m targets. */ + if (target_slow_flash_data + && ((!(arm_arch7 && !arm_arch_notm) && !arm_arch7em) + || (TREE_TARGET_THUMB1 (opts) || flag_pic || TARGET_NEON))) +error ("-mslow-flash-data only supports non-pic code on armv7-m targets"); +} + +/* Check any params depending on attributes that the user has specified. */ +static void +arm_option_params_internal (struct gcc_options *opts) +{ + /* If we are not using the default (ARM mode) section anchor offset + ranges, then set the correct ranges now. */ + if (TREE_TARGET_THUMB1 (opts)) +{ + /* Thumb-1 LDR instructions cannot have negative offsets. + Permissible positive offset ranges are 5-bit (for byte loads), + 6-bit (for halfword loads), or 7-bit (for word loads). + Empirical results suggest a 7-bit anchor range gives the best + overall code size. */ + targetm.min_anchor_offset = 0; + targetm.max_anchor_offset = 127; +} + else if (TREE_TARGET_THUMB2 (opts)) +{ + /* The minimum is set such that the total size of the block + for a particular anchor is 248 + 1 + 4095 bytes, which is + divisible by eight, ensuring natural spacing of anchors. */ + targetm.min_anchor_offset = -248; + targetm.max_anchor_offset = 4095;
Re: [PING^4][PATCH] Warn about unclosed pragma omp declare target.
As omp target and offloading support is committed to trunk, I think it's reasonable to add some new warnings. On 06 Nov 15:27, Ilya Tocar wrote: > Ping. > On 30 Oct 18:31, Ilya Tocar wrote: > > Ping. > > On 20 Oct 19:26, Ilya Tocar wrote: > > > Ping. > > > > > > On 02 Oct 17:38, Ilya Tocar wrote: > > > > Ping. > > > > On 15 Aug 16:26, Ilya Tocar wrote: > > > > > Ping. > > > > > > > > > > On 29 Jul 18:45, Ilya Tocar wrote: > > > > > > Hi, > > > > > > > > > > > > As discussed here in > > > > > > https://gcc.gnu.org/ml/gcc/2014-01/msg00189.html > > > > > > Gcc should complain about pragma omp declare target without > > > > > > corresponding pragma omp end declare target. This patch adds a > > > > > > warning > > > > > > for those cases. > > > > > > Bootstraps/passes make-check. > > > > > > Ok for trunk? > > > > > > > > > > > > ChangeLog: > > > > > > > > > > > > 2014-07-29 Ilya Tocar > > > > > > > > > > > > * c-decl.c (omp_declare_target_location_stack): New. > > > > > > * c-lang.h (omp_declare_target_location_stack): Declare. > > > > > > * c-parser.c (warn_unclosed_pragma_omp_target): New. > > > > > > (c_parser_translation_unit): Call it. > > > > > > (c_parser_omp_declare_target): Remeber location. > > > > > > (c_parser_omp_end_declare_target): Forget location. > > > > > > > > > > > > And ChangeLog for testsuite: > > > > > > > > > > > > 2014-07-29 Ilya Tocar > > > > > > > > > > > > * gcc.dg/gomp//target-3.c: New testcase. > > > > > > > > > > > > --- > > > > > > gcc/c/c-decl.c | 3 +++ > > > > > > gcc/c/c-lang.h | 3 +++ > > > > > > gcc/c/c-parser.c | 22 +- > > > > > > gcc/testsuite/gcc.dg/gomp/target-3.c | 33 > > > > > > + > > > > > > 4 files changed, 60 insertions(+), 1 deletion(-) > > > > > > create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c > > > > > > > > > > > > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c > > > > > > index 2a4b439..2dd5b2c 100644 > > > > > > --- a/gcc/c/c-decl.c > > > > > > +++ b/gcc/c/c-decl.c > > > > > > @@ -158,6 +158,9 @@ enum machine_mode c_default_pointer_mode = > > > > > > VOIDmode; > > > > > > /* If non-zero, implicit "omp declare target" attribute is added > > > > > > into the > > > > > > attribute lists. */ > > > > > > int current_omp_declare_target_attribute; > > > > > > + > > > > > > +/* Holds locations of currently open "omp declare target" pragmas. > > > > > > */ > > > > > > +vec omp_declare_target_location_stack; > > > > > > > > > > > > /* Each c_binding structure describes one binding of an identifier > > > > > > to > > > > > > a decl. All the decls in a scope - irrespective of namespace - > > > > > > are > > > > > > diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h > > > > > > index e974906..cef995c 100644 > > > > > > --- a/gcc/c/c-lang.h > > > > > > +++ b/gcc/c/c-lang.h > > > > > > @@ -59,4 +59,7 @@ struct GTY(()) language_function { > > > > > > attribute lists. */ > > > > > > extern GTY(()) int current_omp_declare_target_attribute; > > > > > > > > > > > > +/* Holds locations of currently open "omp declare target" pragmas. > > > > > > */ > > > > > > +extern vec omp_declare_target_location_stack; > > > > > > + > > > > > > #endif /* ! GCC_C_LANG_H */ > > > > > > diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c > > > > > > index e32bf04..0b96fe9 100644 > > > > > > --- a/gcc/c/c-parser.c > > > > > > +++ b/gcc/c/c-parser.c > > > > > > @@ -1255,6 +1255,8 @@ static bool c_parser_cilk_verify_simd > > > > > > (c_parser *, enum pragma_context); > > > > > > static tree c_parser_array_notation (location_t, c_parser *, tree, > > > > > > tree); > > > > > > static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, > > > > > > bool); > > > > > > > > > > > > +static void warn_unclosed_pragma_omp_target (); > > > > > > + > > > > > > /* Parse a translation unit (C90 6.7, C99 6.9). > > > > > > > > > > > > translation-unit: > > > > > > @@ -1290,6 +1292,8 @@ c_parser_translation_unit (c_parser *parser) > > > > > > } > > > > > >while (c_parser_next_token_is_not (parser, CPP_EOF)); > > > > > > } > > > > > > + > > > > > > + warn_unclosed_pragma_omp_target (); > > > > > > } > > > > > > > > > > > > /* Parse an external declaration (C90 6.7, C99 6.9). > > > > > > @@ -13068,8 +13072,10 @@ c_finish_omp_declare_simd (c_parser > > > > > > *parser, tree fndecl, tree parms, > > > > > > static void > > > > > > c_parser_omp_declare_target (c_parser *parser) > > > > > > { > > > > > > + location_t loc = c_parser_peek_token (parser)->location; > > > > > >c_parser_skip_to_pragma_eol (parser); > > > > > >current_omp_declare_target_attribute++; > > > > > > + omp_declare_target_location_stack.safe_push (loc); > > > > > > } > > > > > > > > > > > > static void > > > > > > @@ -13104,7 +13110,10 @@ c_parser_omp_end_declare_target (c_parser > > > > > > *
[PATCH, ARM] attribute target (thumb,arm) [2/6]
In preparation of the pragma target reorganize ´TARGET_CPU_CPP_BUILTINS´ to redefine mode dependent macros based on current thumb_p. Thanks, Christian 2014-09-23 Christian Bruel * config/arm/arm-c.c (cpp_def_or_undef): New functions. (arm_cpp_builtins): Likewise. * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Move mode dependant macros to arm_cpp_builtins. * config/arm/arm-protos.h (arm_cpp_builtins): Declare. diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN b/gcc/gcc/config/arm/arm-c.c c/gcc/gcc/config/arm/arm-c.c --- b/gcc/gcc/config/arm/arm-c.c 2014-11-05 14:34:49.0 +0100 +++ c/gcc/gcc/config/arm/arm-c.c 2014-11-13 11:16:59.0 +0100 @@ -42,3 +42,73 @@ arm_lang_object_attributes_init (void) { arm_lang_output_object_attributes_hook = arm_output_c_attributes; } + +/* Define or undefine macro. */ + +static void +cpp_def_or_undef (struct cpp_reader *in, const char *str, bool def_p) +{ + if (def_p) +cpp_define (in, str); + else +cpp_undef (in, str); +} + +/* Define or undefine macros based on the current target. If the user does + #pragma GCC target, we need to adjust the macros dynamically. */ + +void +arm_cpp_builtins (struct cpp_reader *in, bool thumb_p) +{ + bool target_32bit_p = !thumb_p || arm_arch_thumb2; + bool thumb2_p = thumb_p && arm_arch_thumb2; + bool have_ldrex_p = (arm_arch6 && !thumb_p) || arm_arch7; + bool have_ldrexbh_p = (arm_arch6k && !thumb_p) || arm_arch7; + bool have_ldrexd_p = ((arm_arch6k && !thumb_p) || arm_arch7) +&& arm_arch_notm; + + int arm_feature_ldrex = (have_ldrex_p ? 4 : 0) +| (have_ldrexbh_p ? 3 : 0) | (have_ldrexd_p ? 8 : 0); + + cpp_def_or_undef (in, "__thumb__", thumb_p); + if (arm_arch_thumb2) +cpp_def_or_undef (in, "__thumb2__", thumb_p); + if (TARGET_BIG_END) +cpp_def_or_undef (in, "__THUMBEB__", thumb_p); + else +cpp_def_or_undef (in, "__THUMBEL__", thumb_p); + + cpp_def_or_undef (in, "__ARM_32BIT_STATE", target_32bit_p); /* TARGET_32BIT */ + + if (arm_arch5e && (arm_arch_notm || arm_arch7)) /* TARGET_ARM_QBIT */ +cpp_def_or_undef (in, "__ARM_FEATURE_QBIT", target_32bit_p); + + if (arm_arch6 && (arm_arch_notm || arm_arch7))/* TARGET_ARM_SAT */ +cpp_def_or_undef (in, "__ARM_FEATURE_SAT", target_32bit_p); + + if (arm_arch5e && (arm_arch_notm || arm_arch7em)) /* TARGET_DSP_MULTIPLY */ +cpp_def_or_undef (in, "__ARM_FEATURE_DSP", target_32bit_p); + + if (arm_arch6 && (arm_arch_notm || arm_arch7em)) /* TARGET_INT_SIMD */ +cpp_def_or_undef (in, "__ARM_FEATURE_SIMD32", target_32bit_p); + + /* TARGET_IDIV */ + cpp_def_or_undef (in, "__ARM_ARCH_EXT_IDIV__", + ((!thumb_p && arm_arch_arm_hwdiv) + || (thumb2_p && arm_arch_thumb_hwdiv))); + + cpp_def_or_undef (in, "__ARM_FEATURE_IDIV", + ((!thumb_p && arm_arch_arm_hwdiv) + || (thumb2_p && arm_arch_thumb_hwdiv))); + + if (arm_feature_ldrex) + cpp_define_formatted (in, "__ARM_FEATURE_LDREX=%d", arm_feature_ldrex); + else + cpp_undef (in, "__ARM_FEATURE_LDREX"); + + cpp_def_or_undef (in, "__ARM_FEATURE_CLZ", + ((TARGET_ARM_ARCH >= 5 && !thumb_p) || TARGET_ARM_ARCH_ISA_THUMB >=2)); + + cpp_def_or_undef (in, "__ARM_ASM_SYNTAX_UNIFIED__", inline_asm_unified); +} + diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN b/gcc/gcc/config/arm/arm.h c/gcc/gcc/config/arm/arm.h --- b/gcc/gcc/config/arm/arm.h 2014-11-13 12:16:50.0 +0100 +++ c/gcc/gcc/config/arm/arm.h 2014-11-13 12:16:37.0 +0100 @@ -48,29 +48,12 @@ extern char arm_arch_name[]; #define TARGET_CPU_CPP_BUILTINS() \ do \ { \ - if (TARGET_DSP_MULTIPLY) \ - builtin_define ("__ARM_FEATURE_DSP"); \ -if (TARGET_ARM_QBIT)\ - builtin_define ("__ARM_FEATURE_QBIT"); \ -if (TARGET_ARM_SAT)\ - builtin_define ("__ARM_FEATURE_SAT"); \ if (TARGET_CRYPTO)\ builtin_define ("__ARM_FEATURE_CRYPTO"); \ if (unaligned_access)\ builtin_define ("__ARM_FEATURE_UNALIGNED"); \ if (TARGET_CRC32)\ builtin_define ("__ARM_FEATURE_CRC32"); \ - if (TARGET_32BIT)\ - builtin_define ("__ARM_32BIT_STATE"); \ - if (TARGET_ARM_FEATURE_LDREX)\ - builtin_define_with_int_value ( \ - "__ARM_FEATURE_LDREX", TARGET_ARM_FEATURE_LDREX); \ - if ((TARGET_ARM_ARCH >= 5 && !TARGET_THUMB) \ - || TARGET_ARM_ARCH_ISA_THUMB >=2) \ - builtin_define ("__ARM_FEATURE_CLZ"); \ - if (TARGET_INT_SIMD) \ - builtin_define ("__ARM_FEATURE_SIMD32"); \ -\ builtin_define_with_int_value (\ "__ARM_SIZEOF_MINIMAL_ENUM",\ flag_short_enums ? 1 : 4);\ @@ -89,10 +72,6 @@ extern char arm_arch_name[]; if (arm_arch_notm)\ builtin_define ("__ARM_ARCH_ISA_ARM"); \ builtin_define ("__APCS_32__"); \ - if (TARGET_THUMB)\ - builtin_define ("__thumb__"); \ - if (TARGET_THUMB2)\ - builtin_define ("__thumb2__"); \ if (TARGET_ARM_ARCH_ISA_THUMB
[PATCH, ARM] attribute target (thumb,arm) [3/6]
Re-implement ARM_DECLARE_FUNCTION_NAME as a function. That will make changed related to unified/divided and mode directives easier to insert. Thanks Christian 2014-09-23 Christian Bruel * config/arm/arm-protos.h (arm_declare_function_name): Declare. (is_called_in_ARM_mode): Remove. * config/arm/arm.c (is_called_in_ARM_mode): Declare static bool. (arm_declare_function_name): Moved from from ARM_DECLARE_FUNCTION_NAME. * config/arm/arm.h (ARM_DECLARE_FUNCTION_NAME): Call arm_declare_function_name. diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN c/gcc/gcc/config/arm/arm.c d/gcc/gcc/config/arm/arm.c --- c/gcc/gcc/config/arm/arm.c 2014-11-18 08:52:48.0 +0100 +++ d/gcc/gcc/config/arm/arm.c 2014-11-18 08:51:50.0 +0100 @@ -26422,6 +26422,23 @@ thumb_pop (FILE *f, unsigned long mask) fprintf (f, "}\n"); } +/* Return nonzero if FUNC must be entered in ARM mode. */ +static bool +is_called_in_ARM_mode (tree func) +{ + gcc_assert (TREE_CODE (func) == FUNCTION_DECL); + + /* Ignore the problem about functions whose address is taken. */ + if (TARGET_CALLEE_INTERWORKING && TREE_PUBLIC (func)) +return true; + +#ifdef ARM_PE + return lookup_attribute ("interfacearm", DECL_ATTRIBUTES (func)) != NULL_TREE; +#else + return false; +#endif +} + /* Generate code to return from a thumb function. If 'reg_containing_return_addr' is -1, then the return address is actually on the stack, at the stack pointer. */ @@ -26857,22 +26874,6 @@ thumb_far_jump_used_p (void) return 0; } -/* Return nonzero if FUNC must be entered in ARM mode. */ -int -is_called_in_ARM_mode (tree func) -{ - gcc_assert (TREE_CODE (func) == FUNCTION_DECL); - - /* Ignore the problem about functions whose address is taken. */ - if (TARGET_CALLEE_INTERWORKING && TREE_PUBLIC (func)) -return TRUE; - -#ifdef ARM_PE - return lookup_attribute ("interfacearm", DECL_ATTRIBUTES (func)) != NULL_TREE; -#else - return FALSE; -#endif -} /* Given the stack offsets and register mask in OFFSETS, decide how many additional registers to push instead of subtracting a constant @@ -32386,6 +32387,25 @@ arm_is_constant_pool_ref (rtx x) && CONSTANT_POOL_ADDRESS_P (XEXP (x, 0))); } +void +arm_declare_function_name (FILE *stream, const char *name, tree decl) +{ + if (TARGET_THUMB) +{ + if (is_called_in_ARM_mode (decl) + || (TARGET_THUMB1 && !TARGET_THUMB1_ONLY + && cfun->is_thunk)) + fprintf (stream, "\t.code 32\n"); + else if (TARGET_THUMB1) + fprintf (stream, "\t.code\t16\n\t.thumb_func\n"); + else + fprintf (stream, "\t.thumb\n\t.thumb_func\n"); +} + + if (TARGET_POKE_FUNCTION_NAME) +arm_poke_function_name (stream, (const char *) name); +} + /* If MEM is in the form of [base+offset], extract the two parts of address and set to BASE and OFFSET, otherwise return false after clearing BASE and OFFSET. */ @@ -32506,4 +32526,5 @@ arm_sched_fusion_priority (rtx_insn *ins *pri = tmp; return; } + #include "gt-arm.h" diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN c/gcc/gcc/config/arm/arm.h d/gcc/gcc/config/arm/arm.h --- c/gcc/gcc/config/arm/arm.h 2014-11-13 12:16:37.0 +0100 +++ d/gcc/gcc/config/arm/arm.h 2014-11-13 12:19:45.0 +0100 @@ -2184,23 +2184,7 @@ extern int making_const_table; ? 1 : 0) #define ARM_DECLARE_FUNCTION_NAME(STREAM, NAME, DECL) \ - do \ -{ \ - if (TARGET_THUMB) \ -{ \ - if (is_called_in_ARM_mode (DECL) \ - || (TARGET_THUMB1 && !TARGET_THUMB1_ONLY \ - && cfun->is_thunk)) \ -fprintf (STREAM, "\t.code 32\n") ; \ - else if (TARGET_THUMB1) \ - fprintf (STREAM, "\t.code\t16\n\t.thumb_func\n") ; \ - else \ - fprintf (STREAM, "\t.thumb\n\t.thumb_func\n") ; \ -} \ - if (TARGET_POKE_FUNCTION_NAME) \ -arm_poke_function_name (STREAM, (const char *) NAME); \ -} \ - while (0) + arm_declare_function_name ((STREAM), (NAME), (DECL)); /* For aliases of functions we use .thumb_set instead. */ #define ASM_OUTPUT_DEF_FROM_DECLS(FILE, DECL1, DECL2) \ diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN c/gcc/gcc/config/arm/arm-protos.h d/gcc/gcc/config/arm/arm-protos.h --- c/gcc/gcc/config/arm/arm-protos.h 2014-11-13 11:16:17.0 +0100 +++ d/gcc/gcc/config/arm/arm-protos.h 2014-11-13 12:21:36.0 +0100 @@ -30,6 +30,7 @@ extern void arm_load_pic_register (unsig extern int arm_volatile_func (void); extern void arm_expand_prologue (void); extern void arm_expand_epilogue (bool); +extern void arm_declare_function_name (FILE *, const char *, tree); extern void thumb2_expand_return (bool); extern const char *arm_strip_name_encoding (const char *); extern void arm_asm_output_labelref (FILE *, const char *); @@ -178,9 +179,6 @@ extern const char *thumb1_unexpanded_epi exter
Re: Stream out default optimization nodes
On Tue, Nov 18, 2014 at 9:29 AM, Jan Hubicka wrote: >> On Tue, Nov 18, 2014 at 9:27 AM, Jan Hubicka wrote: >> >> https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00473.html >> >> >> >> /export/gnu/import/git/gcc-test-profiled/bld/./prev-gcc/xg++ >> >> -B/export/gnu/import/git/gcc-test-profiled/bld/./prev-gcc/ >> >> -B/usr/5.0.0/x86_64-unknown-linux-gnu/bin/ -nostdinc++ >> >> -B/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs >> >> -B/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs >> >> >> >> -I/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu >> >> >> >> -I/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include >> >> >> >> -I/export/gnu/import/git/gcc-test-profiled/src-trunk/libstdc++-v3/libsupc++ >> >> -L/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs >> >> -L/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs >> >> -g -O2 -flto=jobserver -frandom-seed=1 -fprofile-use -DIN_GCC >> >> -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall >> >> -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute >> >> -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros >> >> -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE >> >> -static-libstdc++ -static-libgcc -o build/genmatch \ >> >> build/genmatch.o ../libcpp/libcpp.a ../libiberty/libiberty.a >> >> build/errors.o build/vec.o build/hash-table.o >> >> .././libiberty/libiberty.a >> >> ../../src-trunk/libcpp/lex.c: In function âend_directiveâ: >> >> ../../src-trunk/libcpp/lex.c:442:43: error: >> >> â__builtin_ia32_pcmpestri128â needs isa option -m32 -msse4.2 >> >>index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0); >> >>^ >> >> make[7]: *** [/tmp/ccTC6Hk9.ltrans9.ltrans.o] Error 1 >> > >> > Indeed, it looks like the bug is that search_line_sse42 gets inlined int >> > end_directive that is compiled w/o SSE support. Probably something that >> > happened previously, too, just led to compiling the function with >> > SSE4.2 >> > >> > I will need to setup -m32 LTO bootstrap enviornment... >> > >> >> This is -m64 LTO, not -m32. > OK then the message seems bogus, too. I will try to reproduce it. > I opened: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63963 -- H.J.
[PATCH, ARM] attribute target (thumb,arm) [4/6]
Implements and document the hooks to support target_attributes. The emission of blx is handled directly for armv5 to overcome a bug with the current binutils that fails with calls to a static symbol in a different section. (e.g .text -> .text.startup) in different modes. (ref https://sourceware.org/bugzilla/show_bug.cgi?id=17505) Regtests included Thanks Christian 2014-09-23 Christian Bruel * config/arm/arm.opt (mthumb): Save. * config/arm/arm.h (arm_valid_target_attribute_tree): Declare. (arm_reset_previous_fndecl, arm_change_mode_p): Likewise. (SWITCHABLE_TARGET): Define. * config/arm/arm.c (arm_reset_previous_fndecl): New functions. (arm_valid_target_attribute_tree, arm_change_mode_p): Likewise. (arm_valid_target_attribute_p): Likewise. (arm_set_current_function, arm_can_inline_p): Likewise. (arm_valid_target_attribute_rec): Likewise. (arm_previous_fndecl): New variable. (TARGET_SET_CURRENT_FUNCTION, TARGET_OPTION_VALID_ATTRIBUTE_P): Define. (TARGET_CAN_INLINE_P): Define. (arm_asm_trampoline_template): Emit mode. (arm_file_start): Don't set unified syntax. (arm_declare_function_name): Set unified syntax and mode. (arm_option_override): Init target_option_default_node. and target_option_current_node. * config/arm/arm.md (*call_value_symbol): Set mode when possible. (*call_symbol): Likewise. * doc/extend.texi: Document ARM target and pragma attribute. * doc/invoke.texi: Likewise. diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN d/gcc/gcc/config/arm/arm.c e/gcc/gcc/config/arm/arm.c --- d/gcc/gcc/config/arm/arm.c 2014-11-18 08:51:50.0 +0100 +++ e/gcc/gcc/config/arm/arm.c 2014-11-18 09:05:57.0 +0100 @@ -80,6 +80,7 @@ #include "opts.h" #include "dumpfile.h" #include "gimple-expr.h" +#include "target-globals.h" #include "builtins.h" #include "tm-constrs.h" #include "rtl-iter.h" @@ -258,6 +259,9 @@ static tree arm_build_builtin_va_list (v static void arm_expand_builtin_va_start (tree, rtx); static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *); static void arm_option_override (void); +static void arm_set_current_function (tree); +static bool arm_can_inline_p (tree, tree); +static bool arm_valid_target_attribute_p (tree, tree, tree, int); static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode); static bool arm_cannot_copy_insn_p (rtx_insn *); static int arm_issue_rate (void); @@ -400,6 +404,9 @@ static const struct attribute_spec arm_a #undef TARGET_ASM_FUNCTION_EPILOGUE #define TARGET_ASM_FUNCTION_EPILOGUE arm_output_function_epilogue +#undef TARGET_CAN_INLINE_P +#define TARGET_CAN_INLINE_P arm_can_inline_p + #undef TARGET_OPTION_OVERRIDE #define TARGET_OPTION_OVERRIDE arm_option_override @@ -412,6 +419,12 @@ static const struct attribute_spec arm_a #undef TARGET_SCHED_ADJUST_COST #define TARGET_SCHED_ADJUST_COST arm_adjust_cost +#undef TARGET_SET_CURRENT_FUNCTION +#define TARGET_SET_CURRENT_FUNCTION arm_set_current_function + +#undef TARGET_OPTION_VALID_ATTRIBUTE_P +#define TARGET_OPTION_VALID_ATTRIBUTE_P arm_valid_target_attribute_p + #undef TARGET_SCHED_REORDER #define TARGET_SCHED_REORDER arm_sched_reorder @@ -3205,6 +3218,11 @@ arm_option_override (void) /* Register global variables with the garbage collector. */ arm_add_gc_roots (); + + /* Save the initial options in case the user does function specific + options. */ + target_option_default_node = target_option_current_node += build_target_option_node (&global_options); } static void @@ -3358,13 +3376,20 @@ arm_warn_func_return (tree decl) static void arm_asm_trampoline_template (FILE *f) { + if (TARGET_UNIFIED_ASM) +fprintf (f, "\t.syntax unified\n"); + else +fprintf (f, "\t.syntax divided\n"); + if (TARGET_ARM) { + fprintf (f, "\t.arm\n"); asm_fprintf (f, "\tldr\t%r, [%r, #0]\n", STATIC_CHAIN_REGNUM, PC_REGNUM); asm_fprintf (f, "\tldr\t%r, [%r, #0]\n", PC_REGNUM, PC_REGNUM); } else if (TARGET_THUMB2) { + fprintf (f, "\t.thumb\n"); /* The Thumb-2 trampoline is similar to the arm implementation. Unlike 16-bit Thumb, we enter the stub in thumb mode. */ asm_fprintf (f, "\tldr.w\t%r, [%r, #4]\n", @@ -26874,6 +26899,23 @@ thumb_far_jump_used_p (void) return 0; } +/* Check that FUNC is called with a different mode. */ + +bool +arm_change_mode_p (tree func) +{ + if (TREE_CODE (func) != FUNCTION_DECL) +return false; + + tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (func); + + if (!callee_tree) +callee_tree = target_option_default_node; + + struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree); + + return (TREE_TARGET_THUMB (callee_opts) != TARGET_THUMB); +} /* Given the stack offsets and register mask in OFFSETS, decide how many additional registers to push instead of subtracting a constant @@ -28448,9 +28490,6 @@ arm_file_start (void) { int val; - if (TARGET
[PATCH] Fix sanitizer/63690
As the testcase shows, we should really check first that we have a MEM_REF, otherwise subsequent TREE_OPERAND might ICE. On an invalid testcase we might get e.g. STRING_CST. Bootstrapped/regtested on power8-linux, ok for trunk? 2014-11-19 Marek Polacek PR sanitizer/63690 * ubsan.c (instrument_object_size): Check for MEM_REF. * gcc.dg/ubsan/pr63690.c: New test. diff --git gcc/testsuite/gcc.dg/ubsan/pr63690.c gcc/testsuite/gcc.dg/ubsan/pr63690.c index e69de29..bcf520a 100644 --- gcc/testsuite/gcc.dg/ubsan/pr63690.c +++ gcc/testsuite/gcc.dg/ubsan/pr63690.c @@ -0,0 +1,9 @@ +/* PR sanitizer/63690 */ +/* { dg-do compile } */ +/* { dg-options "-fsanitize=undefined" } */ + +void +foo (void) +{ + (*"c")++; +} diff --git gcc/ubsan.c gcc/ubsan.c index 7d1e341..ad5665f 100644 --- gcc/ubsan.c +++ gcc/ubsan.c @@ -1539,7 +1539,13 @@ instrument_object_size (gimple_stmt_iterator *gsi, bool is_lhs) return; bool decl_p = DECL_P (inner); - tree base = decl_p ? inner : TREE_OPERAND (inner, 0); + tree base; + if (decl_p) +base = inner; + else if (TREE_CODE (inner) == MEM_REF) +base = TREE_OPERAND (inner, 0); + else +return; tree ptr = build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (t)), t); while (TREE_CODE (base) == SSA_NAME) Marek
[patch, asan, testsuite] Adjust asan expected backtrace
The attached patch fixes 23 asan testsuite failures on x86_64-apple-darwin14, where the backtrace features e.g. wrap_malloc instead of interceptor_malloc. So I adjusted the dg-output patterns to match. OK to commit to trunk? 2014-11-19 Francois-Xavier Coudert * c-c++-common/asan/heap-overflow-1.c: Ajust dg-output. * c-c++-common/asan/memcmp-1.c: Likewise. * c-c++-common/asan/strncpy-overflow-1.c: Likewise. * c-c++-common/asan/use-after-free-1.c: Likewise. asan.diff Description: Binary data
Re: [PATCH 17/21] PR jit/63854: Fix leaking vec in jit
On Wed, Nov 19, 2014 at 05:46:17AM -0500, David Malcolm wrote: > This fixes various leaks of vec buffers seen via valgrind within jit > (both recording and playback). > > Various vec<> within jit::recording are converted to auto_vec<>. > > Various playback::wrapper subclasses containing vec<> gain a finalizer > so they can release the vec when they are collected. It seems kind of tempting to just use a virtual dtor as the finalizer, but I'm not sure how to make that work nicely with your overriding of operator new. Trev > > gcc/jit/ChangeLog: > PR jit/63854 > * jit-playback.c (gcc::jit::playback::compound_type::set_fields): > Convert param from const vec & to > const auto_vec *. > (gcc::jit::playback::context::new_function_type): Convert param > "param_types" from vec * to const auto_vec *. > (gcc::jit::playback::context::new_function): Convert param > "params" from vec * to const auto_vec *. > (gcc::jit::playback::context::build_call): Convert param "args" > from vec to const auto_vec *. > (gcc::jit::playback::context::new_call): Likewise. > (gcc::jit::playback::context::new_call_through_ptr): Likewise. > (wrapper_finalizer): New function. > (gcc::jit::playback::wrapper::operator new): Call the finalizer > variant of ggc_internal_cleared_alloc, supplying > wrapper_finalizer. > (gcc::jit::playback::function::finalizer): New. > (gcc::jit::playback::block::finalizer): New. > (gcc::jit::playback::source_file::finalizer): New. > (gcc::jit::playback::source_line::finalizer): New. > > * jit-playback.h > (gcc::jit::playback::context::new_function_type): Convert param > "param_types" from vec * to const auto_vec *. > (gcc::jit::playback::context::new_function): Convert param > "params" from vec * to const auto_vec *. > (gcc::jit::playback::context::new_call): Convert param > "args" from vec to const auto_vec *. > (gcc::jit::playback::context::new_call_through_ptr): Likewise. > (gcc::jit::playback::context::build_call): Likewise. > (gcc::jit::playback::context): Convert fields "m_functions", > "m_source_files", "m_cached_locations" from vec to auto_vec. > (gcc::jit::playback::wrapper::finalizer): New virtual function. > (gcc::jit::playback::compound_type::set_fields): Convert param fro > const vec & to > const auto_vec *. > (gcc::jit::playback::function::finalizer): New. > (gcc::jit::playback::block::finalizer): New. > (gcc::jit::playback::source_file::finalizer): New. > (gcc::jit::playback::source_line::finalizer): New. > > * jit-recording.c > (gcc::jit::recording::function_type::replay_into): Convert local > from a vec into an auto_vec. > (gcc::jit::recording::fields::replay_into): Likewise. > (gcc::jit::recording::function::replay_into): Likewise. > (gcc::jit::recording::call::replay_into): Likewise. > (gcc::jit::recording::call_through_ptr::replay_into): Likewise. > > * jit-recording.h (gcc::jit::recording::context): Convert fields > "m_mementos", "m_compound_types", "m_functions" from vec<> to > auto_vec <>. > (gcc::jit::recording::function_type::get_param_types): Convert > return type from vec to const vec &. > (gcc::jit::recording::function_type): Convert field > "m_param_types" from a vec<> to an auto_vec<>. > (gcc::jit::recording::fields): Likewise for field "m_fields". > (gcc::jit::recording::function::get_params): Convert return type > from vec to const vec &. > (gcc::jit::recording::function): Convert fields "m_params", > "m_locals", "m_blocks" from vec<> to auto_vec<>. > (gcc::jit::recording::block): Likewise for field "m_statements". > vec<> to auto_vec<>. > (gcc::jit::recording::call): Likewise for field "m_args". > (gcc::jit::recording::call_through_ptr): Likewise. > --- > gcc/jit/jit-playback.c | 73 > + > gcc/jit/jit-playback.h | 27 -- > gcc/jit/jit-recording.c | 16 +-- > gcc/jit/jit-recording.h | 26 +- > 4 files changed, 100 insertions(+), 42 deletions(-) > > diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c > index 285a3ef..8fdfa29 100644 > --- a/gcc/jit/jit-playback.c > +++ b/gcc/jit/jit-playback.c > @@ -285,15 +285,15 @@ new_compound_type (location *loc, > } > > void > -playback::compound_type::set_fields (const vec &fields) > +playback::compound_type::set_fields (const auto_vec > *fields) > { >/* Compare with c/c-decl.c: finish_struct. */ >tree t = as_tree (); > >tree fieldlist = NULL; > - for (unsigned i = 0; i < fields.length (); i++) > + for (unsigned i = 0; i < fields->length (); i++) > { > - field *f = fields[i]; > + field *f = (*fields)[i]; >DECL_CONTEXT (f->as_t
[PATCH, ARM] attribute target (thumb,arm) [5/6]
Implements the hooks for #pragma GCC target A test included to check that macros were correctly defined/undefined on pragma regions. Thanks Christian 2014-09-23 Christian Bruel * config/arm/arm.h (REGISTER_TARGET_PRAGMAS): Call arm_register_target_pragmas. * config/arm/arm-protos.h (arm_register_target_pragmas): Declare. * config/arm/arm-c.c (arm_register_target_pragmas): New function. (arm_pragma_target_parse): Likewise. diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN e/gcc/gcc/config/arm/arm-c.c f/gcc/gcc/config/arm/arm-c.c --- e/gcc/gcc/config/arm/arm-c.c 2014-11-13 13:03:27.0 +0100 +++ f/gcc/gcc/config/arm/arm-c.c 2014-11-13 13:53:45.0 +0100 @@ -20,9 +20,12 @@ #include "system.h" #include "coretypes.h" #include "tm.h" -#include "tm_p.h" #include "tree.h" +#include "tm_p.h" #include "c-family/c-common.h" +#include "target.h" +#include "target-def.h" +#include "c-family/c-pragma.h" /* Output C specific EABI object attributes. These can not be done in arm.c because they require information from the C frontend. */ @@ -112,3 +115,78 @@ arm_cpp_builtins (struct cpp_reader *in, cpp_def_or_undef (in, "__ARM_ASM_SYNTAX_UNIFIED__", inline_asm_unified); } +/* Hook to validate the current #pragma GCC target and set the arch custom + mode state. If ARGS is NULL, then POP_TARGET is used to reset + the options. */ +static bool +arm_pragma_target_parse (tree args, tree pop_target) +{ + tree prev_tree = build_target_option_node (&global_options); + tree cur_tree; + struct cl_target_option *prev_opt; + struct cl_target_option *cur_opt; + bool cur_mode, prev_mode; + + if (! args) +{ + cur_tree = ((pop_target) ? pop_target : target_option_default_node); + cl_target_option_restore (&global_options, +TREE_TARGET_OPTION (cur_tree)); +} + else +{ + cur_tree = arm_valid_target_attribute_tree (args, &global_options); + if (cur_tree == NULL_TREE) + { + cl_target_option_restore (&global_options, +TREE_TARGET_OPTION (prev_tree)); + return false; + } +} + + target_option_current_node = cur_tree; + arm_reset_previous_fndecl (); + + /* Figure out the previous mode. */ + prev_opt = TREE_TARGET_OPTION (prev_tree); + cur_opt = TREE_TARGET_OPTION (cur_tree); + + gcc_assert (prev_opt); + gcc_assert (cur_opt); + + cur_mode = TARGET_THUMB_P (cur_opt->x_target_flags); + prev_mode = TARGET_THUMB_P (prev_opt->x_target_flags); + + if (prev_mode != cur_mode) +{ + /* For the definitions, ensure all newly defined macros are considered + as used for -Wunused-macros. There is no point warning about the + compiler predefined macros. */ + cpp_options *cpp_opts = cpp_get_options (parse_in); + unsigned char saved_warn_unused_macros = cpp_opts->warn_unused_macros; + cpp_opts->warn_unused_macros = 0; + + /* Update macros. */ + arm_cpp_builtins (parse_in, cur_mode); + + cpp_opts->warn_unused_macros = saved_warn_unused_macros; +} + + return true; +} + +/* Register target pragmas. We need to add the hook for parsing #pragma GCC + option here rather than in arm.c since it will pull in various preprocessor + functions, and those are not present in languages like fortran without a + preprocessor. */ + +void +arm_register_target_pragmas (void) +{ + /* Update pragma hook to allow parsing #pragma GCC target. */ + targetm.target_option.pragma_parse = arm_pragma_target_parse; + +#ifdef REGISTER_SUBTARGET_PRAGMAS + REGISTER_SUBTARGET_PRAGMAS (); +#endif +} diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN e/gcc/gcc/config/arm/arm.h f/gcc/gcc/config/arm/arm.h --- e/gcc/gcc/config/arm/arm.h 2014-11-13 13:48:59.0 +0100 +++ f/gcc/gcc/config/arm/arm.h 2014-11-13 13:48:01.0 +0100 @@ -2097,7 +2097,8 @@ extern int making_const_table; c_register_pragma (0, "long_calls", arm_pr_long_calls); \ c_register_pragma (0, "no_long_calls", arm_pr_no_long_calls); \ c_register_pragma (0, "long_calls_off", arm_pr_long_calls_off); \ - arm_lang_object_attributes_init(); \ + arm_lang_object_attributes_init(); \ + arm_register_target_pragmas(); \ } while (0) /* Condition code information. */ diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN e/gcc/gcc/config/arm/arm-protos.h f/gcc/gcc/config/arm/arm-protos.h --- e/gcc/gcc/config/arm/arm-protos.h 2014-11-13 13:07:47.0 +0100 +++ f/gcc/gcc/config/arm/arm-protos.h 2014-11-13 13:49:50.0 +0100 @@ -306,6 +306,7 @@ extern const char *arm_rewrite_selected_ /* Defined in gcc/common/config/arm-c.c. */ extern void arm_lang_object_attributes_init(void); +extern void arm_register_target_pragmas (void); extern void arm_cpp_builtins (struct cpp_reader *, bool); extern bool arm_is_constant_pool_ref (rtx); 2014-09-23 Christian Bruel * gcc.target/arm/pragma_attribute.c: N
[PATCH, ARM] attribute target (thumb,arm) [6/6]
Implement the -mflip-thump option. Undocumented for internal testing only. This option artificially inserts alternative attribute thumb/modes on functions. This close the patch set. Thanks for your review, Christian 2014-09-23 Christian Bruel * config/arm/arm.c (add_attribute, arm_insert_attributes): New functions (TARGET_INSERT_ATTRIBUTES): Define. (thumb_flipper): New var. * config/arm/arm.opt (-mflip-thumb): New switch. diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN f/gcc/gcc/config/arm/arm.c g/gcc/gcc/config/arm/arm.c --- f/gcc/gcc/config/arm/arm.c 2014-11-18 13:16:44.0 +0100 +++ g/gcc/gcc/config/arm/arm.c 2014-11-19 14:04:22.0 +0100 @@ -218,6 +218,7 @@ static void arm_encode_section_info (tre static void arm_file_end (void); static void arm_file_start (void); +static void arm_insert_attributes (tree, tree *); static void arm_setup_incoming_varargs (cumulative_args_t, machine_mode, tree, int *, int); @@ -370,6 +371,9 @@ static const struct attribute_spec arm_a #undef TARGET_ATTRIBUTE_TABLE #define TARGET_ATTRIBUTE_TABLE arm_attribute_table +#undef TARGET_INSERT_ATTRIBUTES +#define TARGET_INSERT_ATTRIBUTES arm_insert_attributes + #undef TARGET_ASM_FILE_START #define TARGET_ASM_FILE_START arm_file_start #undef TARGET_ASM_FILE_END @@ -29403,6 +29407,56 @@ arm_valid_target_attribute_tree (tree ar return t; } +/* True if -mflip-thumb should next add an attribute for the default + mode, false if it should next add an attribute for the opposite mode. */ +static GTY(()) bool thumb_flipper = TARGET_THUMB; + +static void +add_attribute (const char * mode, tree *attributes) +{ + size_t len = strlen (mode); + tree value = build_string (len, mode); + + TREE_TYPE (value) = build_array_type (char_type_node, + build_index_type (size_int (len))); + + *attributes = tree_cons (get_identifier ("target"), + build_tree_list (NULL_TREE, value), + *attributes); +} + +/* For testing. Insert thumb or arm modes alternatively on functions. */ + +static void +arm_insert_attributes (tree fndecl, tree * attributes) +{ + const char *mode; + + if (TREE_CODE (fndecl) != FUNCTION_DECL || DECL_EXTERNAL(fndecl) + || DECL_BUILT_IN (fndecl) || DECL_ARTIFICIAL (fndecl)) + return; + + /* Nested definitions must inherit mode. */ + if (current_function_decl) + { + mode = TARGET_THUMB ? "thumb" : "arm"; + add_attribute (mode, attributes); + return; + } + + if (! TARGET_FLIP_THUMB) +return; + + /* If there is already a setting don't change it. */ + if (lookup_attribute ("target", *attributes) != NULL) +return; + + mode = thumb_flipper ? "thumb" : "arm"; + add_attribute (mode, attributes); + + thumb_flipper = !thumb_flipper; +} + /* Hook to validate attribute((target("string"))). */ static bool diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN f/gcc/gcc/config/arm/arm.opt g/gcc/gcc/config/arm/arm.opt --- f/gcc/gcc/config/arm/arm.opt 2014-11-13 14:05:34.0 +0100 +++ g/gcc/gcc/config/arm/arm.opt 2014-11-19 13:59:46.0 +0100 @@ -122,6 +122,10 @@ Enum(float_abi_type) String(softfp) Valu EnumValue Enum(float_abi_type) String(hard) Value(ARM_FLOAT_ABI_HARD) +mflip-thumb +Target Report Var(TARGET_FLIP_THUMB) +Switch ARM/Thumb modes on alternating functions for compiler testing + mfp16-format= Target RejectNegative Joined Enum(arm_fp16_format_type) Var(arm_fp16_format) Init(ARM_FP16_FORMAT_NONE) Specify the __fp16 floating-point format
Re: [PATCH] Fix sanitizer/63690
On Wed, Nov 19, 2014 at 02:51:11PM +0100, Marek Polacek wrote: > As the testcase shows, we should really check first that we > have a MEM_REF, otherwise subsequent TREE_OPERAND might ICE. > On an invalid testcase we might get e.g. STRING_CST. Well, addr_object_size should handle STRING_CSTs just fine, so why not handle STRING_CSTs in there too (similarly to decl_p = true case). Supposedly you want decl_p = true for them and need to tweak if (!POINTER_TYPE_P (TREE_TYPE (base)) && !DECL_P (base)) but otherwise it should work. On the other side, int foo (int x) { return "foobar"[x]; } is already handled by -fsanitize=bounds and for some reason gimple_assign_load_p doesn't consider handled components with STRING_CST as base as loads so for -fsanitize=object-size we aren't called for that. Jakub
Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library
On 13 Nov 20:56, Joseph Myers wrote: > On Thu, 13 Nov 2014, Ilya Enkovich wrote: > > > > You can leave it as a single library - it's just that imposes libgcc-like > > > constraints on what the library does and how it does things, so as to be > > > usable for arbitrary programs built with MPX (e.g. using > > > reserved-namespace names such as __write when available - direct syscalls > > > may also be a possibility in some cases). > > > > I really don't want to make it libgcc-like and complicate its further > > development. What is the reason for that? Is it because library is > > linked by -mmpx? Can it be avoided if library is linked only at > > '-fcheck-pointer-bounds -mmpx' combination? If single -mmpx is used > > What's -fcheck-pointer-bounds? I don't see any documentation of it in > invoke.texi. Any options intended for users to use need documenting there > - did a documentation patch get missed out from the series of patches > posted / committed? > > Documentation might help understand the intent of the feature. I think > there are two natural possibilities: > > * Use as a lightweight hardening feature enabled by default across a whole > distribution (or for a subset of the distribution considered > security-critical), by people who don't know the detailed requirements of > individual programs built with the option. In that case there are > libgcc-like requirements (as another issue there, it might be problematic > to cause arbitrary previously unthreaded programs to link with libpthread; > at least, performance issues seem possible). Outputting reports to files > determined by environment variables is probably not a sensible approach in > this case. You might want reports to syslog, or might want MPX-detected > issues to be intercepted by crash reporters such as apport so they can > collect relevant information and offer to report it to the distributor. > > * Use as a debugging tool, by people who understand the requirements of > their application / library and can tell whether the things the MPX > library is documented as doing might be a problem in their case. For > this, libgcc-like requirements don't apply and output to files seems more > sensible. I think if we compare runtime libraries for these two usage models then we don't find much in common. Thus it should be two independent runtime libraries. This posted library corresponds to the second scenario and is to be used mainly for debugging. > > If it's intended only as the latter - and this is documented - then you > don't have the libgcc-like requirements, and there's no point in having > multiple libraries. If it's intended for both, that points the way to > separate libraries (where the debugging-tool library would be used in that > use case, but the crash-reporter-interception case would expect programs > to have been linked with only the minimal library). Would it be enough to mention implicit link and resulting pthread dependency in -fcheck-pointer-bounds description in invoke.texi? > > -- > Joseph S. Myers > jos...@codesourcery.com Here is an updated version. I removed printf usage in a signal handler, added symbol versioning, added thread safety comments for static vars, added static link support and -static-libmpx option (will update corresponding documentation patch). Does it look OK? Thanks, Ilya -- 2014-11-19 Ilya Enkovich * Makefile.def: Add libmpx. * configure.ac: Add libmpx. * Makefile.in: Regenerate. * configure: Regenerate. gcc/ 2014-11-19 Ilya Enkovich * gcc.c (LIBMPX_LIBS): New. (LIBMPX_SPEC): New. (MPX_SPEC): New. (LINK_COMMAND_SPEC): Add MPX_SPEC. * c-family/c.opt (static-libmpx): New. libmpx/ 2014-11-19 Ilya Enkovich Initial commit. diff --git a/Makefile.def b/Makefile.def index 40bbca9..4a535d2 100644 --- a/Makefile.def +++ b/Makefile.def @@ -128,6 +128,9 @@ target_modules = { module= libsanitizer; bootstrap=true; lib_path=.libs; raw_cxx=true; }; +target_modules = { module= libmpx; + bootstrap=true; + lib_path=.libs; }; target_modules = { module= libvtv; bootstrap=true; lib_path=.libs; diff --git a/configure.ac b/configure.ac index b27fb1d..ccb119b 100644 --- a/configure.ac +++ b/configure.ac @@ -162,6 +162,7 @@ target_libraries="target-libgcc \ target-libstdc++-v3 \ target-libsanitizer \ target-libvtv \ + target-libmpx \ target-libssp \ target-libquadmath \ target-libgfortran \ @@ -642,6 +643,25 @@ if test -d ${srcdir}/libvtv; then fi fi + +# Disable libmpx on unsupported systems. +if test -d ${srcdir}/libmpx; then +if test x$enable_libmpx = x; then + AC_MSG_CHECKING([for libmpx support]) + if (srcdir=${
Re: [PATCH] Fix sanitizer/63690
On Wed, Nov 19, 2014 at 03:12:05PM +0100, Jakub Jelinek wrote: > On Wed, Nov 19, 2014 at 02:51:11PM +0100, Marek Polacek wrote: > > As the testcase shows, we should really check first that we > > have a MEM_REF, otherwise subsequent TREE_OPERAND might ICE. > > On an invalid testcase we might get e.g. STRING_CST. > > Well, addr_object_size should handle STRING_CSTs just fine, so > why not handle STRING_CSTs in there too (similarly to decl_p = true > case). > Supposedly you want decl_p = true for them and need to tweak > if (!POINTER_TYPE_P (TREE_TYPE (base)) && !DECL_P (base)) > but otherwise it should work. > On the other side, > int > foo (int x) > { > return "foobar"[x]; > } > is already handled by -fsanitize=bounds and for some reason > gimple_assign_load_p doesn't consider handled components with > STRING_CST as base as loads so for -fsanitize=object-size > we aren't called for that. The patch is ok as is. Jakub
RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop
> Yes, removing the second NOP is worth the additional effort. The updated patch is below. Ok to commit? Regards, Andrew diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 02268f3..368c6f0 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands) This will sometimes be a delayed branch; see the write code below for details. */ mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at, mem, NULL); - mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL); + + /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the delay slot + of the branch if it is a branch likely because they will not execute when + the atomic operation is successful, so place a NOP in there instead. */ + + mips_multi_add_insn ("beq%?\t%0,%.,1b%~", at, NULL); /* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot]. */ if (insn1 != SYNC_INSN1_MOVE && insn1 != SYNC_INSN1_LI && tmp3 != newval) @@ -13005,7 +13010,7 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands) mips_multi_copy_insn (tmp3_insn); mips_multi_set_operand (mips_multi_last_index (), 0, newval); } - else if (!(required_oldval && cmp)) + else if (!(required_oldval && cmp) && !mips_branch_likely) mips_multi_add_insn ("nop", NULL); /* CMP = 1 -- either standalone or in a delay slot. */ @@ -13029,12 +13034,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands) const char * mips_output_sync_loop (rtx_insn *insn, rtx *operands) { - mips_process_sync_loop (insn, operands); - /* Use branch-likely instructions to work around the LL/SC R1 errata. */ mips_branch_likely = TARGET_FIX_R1; + mips_process_sync_loop (insn, operands); + mips_push_asm_switch (&mips_noreorder); mips_push_asm_switch (&mips_nomacro); mips_push_asm_switch (&mips_noat); @@ -13056,6 +13061,9 @@ mips_output_sync_loop (rtx_insn *insn, rtx *operands) unsigned int mips_sync_loop_insns (rtx_insn *insn, rtx *operands) { + /* Use branch-likely instructions to work around the LL/SC R1 + errata. */ + mips_branch_likely = TARGET_FIX_R1; mips_process_sync_loop (insn, operands); return mips_multi_num_insns; }
Re: [PATCH, ARM] attribute target (thumb,arm) [0/6]
On Wed, Nov 19, 2014 at 1:24 PM, Christian Bruel wrote: > I think I missed the stage3, Anyway would it be OK for stage1 when it > reopens ? Since you submitted this well during stage1 and given that these patches address comments from earlier in the review process we should aim to get these in for 5.0. If at some point during the review process it looks risky and there needs to be significant rework we can always stop. It's on the list of patches to be reviewed and I will find some dedicated time later this week to set down and review / play with the patches in an attempt to move this forward as it is a reasonably large chunk of work. Thanks for continuing to work on these patches and addressing the earlier review comments. regards Ramana > > Christian >
[PATCH] gimple_{build_assign,assign_set_rhs}_with_ops* cleanup
Hi! This patch: 1) adds unary op overload to gimple_build_assign_with_ops so that many callers don't need to pass NULL_TREE as the last argument explicitly 2) adds unary op overload to gimple_assign_set_rhs_with_ops for similar reasons 3) renames gimple_assign_set_rhs_with_ops_1 to gimple_assign_set_rhs_with_ops so it becomes ternary op overload to the existing binary op one and 2) added unary op and adjusts all users that I found. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-11-19 Jakub Jelinek * gimple.h (gimple_build_assign_with_ops): Add unary arg overload. (gimple_assign_set_rhs_with_ops_1): Renamed to ... (gimple_assign_set_rhs_with_ops): ... this. Adjust binary arg inline overload to use it. Add unary arg overload. * gimple.c (gimple_build_assign_with_ops): New unary arg overload. (gimple_assign_set_rhs_from_tree): Use gimple_assign_set_rhs_with_ops instead of gimple_assign_set_rhs_with_ops_1. (gimple_assign_set_rhs_with_ops_1): Renamed to ... (gimple_assign_set_rhs_with_ops): ... this. * ipa-split.c (split_function): Remove last NULL argument from gimple_build_assign_with_ops call. * tree-ssa-loop-im.c (move_computations_dom_walker::before_dom_children): Likewise. * tsan.c (instrument_builtin_call): Likewise. * tree-vect-stmts.c (vect_init_vector, vectorizable_mask_load_store, vectorizable_conversion, vectorizable_load): Likewise. * tree-vect-loop.c (vect_is_simple_reduction_1, get_initial_def_for_induction): Likewise. * tree-loop-distribution.c (generate_memset_builtin): Likewise. * tree-vect-patterns.c (vect_handle_widen_op_by_const, vect_recog_widen_mult_pattern, vect_operation_fits_smaller_type, vect_recog_over_widening_pattern, vect_recog_rotate_pattern, vect_recog_vector_vector_shift_pattern, vect_recog_divmod_pattern, vect_recog_mixed_size_cond_pattern, adjust_bool_pattern_cast, adjust_bool_pattern, vect_recog_bool_pattern): Likewise. * tree-ssa-phiopt.c (conditional_replacement, abs_replacement, neg_replacement): Likewise. * asan.c (build_shadow_mem_access, maybe_create_ssa_name, maybe_cast_to_ptrmode, asan_expand_check_ifn): Likewise. * tree-vect-slp.c (vect_get_constant_vectors): Likewise. * omp-low.c (lower_rec_input_clauses, expand_omp_for_generic, expand_omp_for_static_nochunk, expand_omp_for_static_chunk, simd_clone_adjust): Likewise. * tree-vect-loop-manip.c (vect_create_cond_for_align_checks): Likewise. * gimple-ssa-strength-reduction.c (introduce_cast_before_cand, replace_one_candidate): Likewise. * gimple-builder.c (build_type_cast): Likewise. * tree-ssa-forwprop.c (simplify_rotate): Likewise. (forward_propagate_addr_expr_1): Remove last NULL argument from gimple_assign_set_rhs_with_ops call. (simplify_vector_constructor): Use gimple_assign_set_rhs_with_ops instead of gimple_assign_set_rhs_with_ops_1. * tree-ssa-reassoc.c (maybe_optimize_range_tests): Remove last NULL argument from gimple_build_assign_with_ops call. (repropagate_negates): Remove last NULL argument from gimple_assign_set_rhs_with_ops call. * ubsan.c (ubsan_expand_null_ifn, ubsan_expand_objsize_ifn): Remove last NULL argument from gimple_build_assign_with_ops call. (instrument_bool_enum_load): Likewise. Remove last NULL argument from gimple_assign_set_rhs_with_ops call. * tree-ssa-math-opts.c (build_and_insert_cast, convert_mult_to_fma): Remove last NULL argument from gimple_build_assign_with_ops call. (bswap_replace): Likewise. Use gimple_assign_set_rhs_with_ops instead of gimple_assign_set_rhs_with_ops_1. (convert_plusminus_to_widen): Use gimple_assign_set_rhs_with_ops instead of gimple_assign_set_rhs_with_ops_1. * gimple-fold.c (replace_stmt_with_simplification): Likewise. (rewrite_to_defined_overflow, gimple_build): Remove last NULL argument from gimple_build_assign_with_ops call. * tree-ssa-strlen.c (handle_pointer_plus): Remove last NULL argument from gimple_assign_set_rhs_with_ops call. * tree-vrp.c (simplify_truth_ops_using_ranges, simplify_bit_ops_using_ranges): Remove last NULL argument from gimple_assign_set_rhs_with_ops call. (simplify_float_conversion_using_ranges, simplify_internal_call_using_ranges): Remove last NULL argument from gimple_build_assign_with_ops call. --- gcc/gimple.h.jj 2014-11-19 10:45:17.780769013 +0100 +++ gcc/gimple.h2014-11-19 10:52:52.118574835 +0100 @@ -1180,6 +1180,8 @@ gimple gimple_build_assign_with_ops (enu tree, tree, tree CXX_MEM_STAT_IN
Re: [PATCH] Fix sanitizer/63690
On Wed, Nov 19, 2014 at 03:12:05PM +0100, Jakub Jelinek wrote: > On Wed, Nov 19, 2014 at 02:51:11PM +0100, Marek Polacek wrote: > > As the testcase shows, we should really check first that we > > have a MEM_REF, otherwise subsequent TREE_OPERAND might ICE. > > On an invalid testcase we might get e.g. STRING_CST. > > Well, addr_object_size should handle STRING_CSTs just fine, so > why not handle STRING_CSTs in there too (similarly to decl_p = true > case). > Supposedly you want decl_p = true for them and need to tweak > if (!POINTER_TYPE_P (TREE_TYPE (base)) && !DECL_P (base)) > but otherwise it should work. > On the other side, > int > foo (int x) > { > return "foobar"[x]; > } > is already handled by -fsanitize=bounds and for some reason Right - I've checked that this is the case. I could mention that in the original submission. > gimple_assign_load_p doesn't consider handled components with > STRING_CST as base as loads so for -fsanitize=object-size > we aren't called for that. Thanks for review, Marek
RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index > 02268f3..368c6f0 100644 > --- a/gcc/config/mips/mips.c > +++ b/gcc/config/mips/mips.c > @@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx > *operands) > This will sometimes be a delayed branch; see the write code below > for details. */ >mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at, > mem, NULL); > - mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL); > + > + /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the > delay slot > + of the branch if it is a branch likely because they will not > execute when > + the atomic operation is successful, so place a NOP in there > + instead. */ Please rephrase the comment along the lines of my previous suggestion. This wording is too complex IMO. Matthew
RE: [PATCH][AArch64] Adjust generic move costs
Hi Jiong, Can you commit this please? 2014-11-19 Wilco Dijkstra * gcc/config/aarch64/aarch64.c (generic_regmove_cost): Increase FP move cost (PR61915). --- gcc/config/aarch64/aarch64.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index de53c94..cd30724 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -218,8 +218,10 @@ __extension__ static const struct cpu_regmove_cost generic_regmove_cost = { NAMED_PARAM (GP2GP, 1), - NAMED_PARAM (GP2FP, 2), - NAMED_PARAM (FP2GP, 2), + /* Avoid the use of slow int<->fp moves for spilling by setting + their cost higher than memmov_cost. */ + NAMED_PARAM (GP2FP, 5), + NAMED_PARAM (FP2GP, 5), NAMED_PARAM (FP2FP, 2) }; -- 1.9.1
[patch, committed]
Committed as pre-approved by Jakub in PR62132. It took me three commits (revisions ) to get it right. I apologize for that, and I’ll stop coding for the day :( 2014-11-19 Francois-Xavier Coudert PR sanitizer/62132 * c-c++-common/asan/misalign-1.c: Pass -fno-omit-frame-pointer on darwin, adjust dg-output. * c-c++-common/asan/misalign-2.c: Likewise. Index: c-c++-common/asan/misalign-1.c === --- c-c++-common/asan/misalign-1.c (revision 217694) +++ c-c++-common/asan/misalign-1.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run { target { ilp32 || lp64 } } } */ /* { dg-options "-O2" } */ +/* { dg-additional-options "-fno-omit-frame-pointer" { target *-*-darwin* } } */ /* { dg-shouldfail "asan" } */ struct S { int i; } __attribute__ ((packed)); @@ -38,5 +39,5 @@ main () /* { dg-output "ERROR: AddressSanitizer:\[^\n\r]*on address\[^\n\r]*" } */ /* { dg-output "0x\[0-9a-f\]+ at pc 0x\[0-9a-f\]+ bp 0x\[0-9a-f\]+ sp 0x\[0-9a-f\]+\[^\n\r]*(\n|\r\n|\r)" } */ /* { dg-output "\[^\n\r]*READ of size 4 at 0x\[0-9a-f\]+ thread T0\[^\n\r]*(\n|\r\n|\r)" } */ -/* { dg-output "#0 0x\[0-9a-f\]+ (in _*foo(\[^\n\r]*misalign-1.c:10|\[^\n\r]*:0)|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */ -/* { dg-output "#1 0x\[0-9a-f\]+ (in _*main (\[^\n\r]*misalign-1.c:34|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } */ +/* { dg-output "#0 0x\[0-9a-f\]+ (in _*foo(\[^\n\r]*misalign-1.c:1\[01]|\[^\n\r]*:0)|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */ +/* { dg-output "#1 0x\[0-9a-f\]+ (in _*main (\[^\n\r]*misalign-1.c:3\[45]|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } */ Index: c-c++-common/asan/misalign-2.c === --- c-c++-common/asan/misalign-2.c (revision 217694) +++ c-c++-common/asan/misalign-2.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run { target { ilp32 || lp64 } } } */ /* { dg-options "-O2" } */ +/* { dg-additional-options "-fno-omit-frame-pointer" { target *-*-darwin* } } */ /* { dg-shouldfail "asan" } */ struct S { int i; } __attribute__ ((packed)); @@ -38,5 +39,5 @@ main () /* { dg-output "ERROR: AddressSanitizer:\[^\n\r]*on address\[^\n\r]*" } */ /* { dg-output "0x\[0-9a-f\]+ at pc 0x\[0-9a-f\]+ bp 0x\[0-9a-f\]+ sp 0x\[0-9a-f\]+\[^\n\r]*(\n|\r\n|\r)" } */ /* { dg-output "\[^\n\r]*READ of size 4 at 0x\[0-9a-f\]+ thread T0\[^\n\r]*(\n|\r\n|\r)" } */ -/* { dg-output "#0 0x\[0-9a-f\]+ (in _*baz(\[^\n\r]*misalign-2.c:22|\[^\n\r]*:0)|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */ -/* { dg-output "#1 0x\[0-9a-f\]+ (in _*main (\[^\n\r]*misalign-2.c:34|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } */ +/* { dg-output "#0 0x\[0-9a-f\]+ (in _*baz(\[^\n\r]*misalign-2.c:2\[23]|\[^\n\r]*:0)|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */ +/* { dg-output "#1 0x\[0-9a-f\]+ (in _*main (\[^\n\r]*misalign-2.c:3\[45]|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } */
Re: [PATCH] gimple_{build_assign,assign_set_rhs}_with_ops* cleanup
On Wed, 19 Nov 2014, Jakub Jelinek wrote: > Hi! > > This patch: > 1) adds unary op overload to gimple_build_assign_with_ops >so that many callers don't need to pass NULL_TREE as the last >argument explicitly > 2) adds unary op overload to gimple_assign_set_rhs_with_ops for similar >reasons > 3) renames gimple_assign_set_rhs_with_ops_1 to >gimple_assign_set_rhs_with_ops so it becomes ternary op overload >to the existing binary op one and 2) added unary op > and adjusts all users that I found. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. > 2014-11-19 Jakub Jelinek > > * gimple.h (gimple_build_assign_with_ops): Add unary arg overload. > (gimple_assign_set_rhs_with_ops_1): Renamed to ... > (gimple_assign_set_rhs_with_ops): ... this. Adjust binary arg > inline overload to use it. Add unary arg overload. > * gimple.c (gimple_build_assign_with_ops): New unary arg overload. > (gimple_assign_set_rhs_from_tree): Use > gimple_assign_set_rhs_with_ops instead of > gimple_assign_set_rhs_with_ops_1. > (gimple_assign_set_rhs_with_ops_1): Renamed to ... > (gimple_assign_set_rhs_with_ops): ... this. > * ipa-split.c (split_function): Remove last NULL argument > from gimple_build_assign_with_ops call. > * tree-ssa-loop-im.c > (move_computations_dom_walker::before_dom_children): Likewise. > * tsan.c (instrument_builtin_call): Likewise. > * tree-vect-stmts.c (vect_init_vector, vectorizable_mask_load_store, > vectorizable_conversion, vectorizable_load): Likewise. > * tree-vect-loop.c (vect_is_simple_reduction_1, > get_initial_def_for_induction): Likewise. > * tree-loop-distribution.c (generate_memset_builtin): Likewise. > * tree-vect-patterns.c (vect_handle_widen_op_by_const, > vect_recog_widen_mult_pattern, vect_operation_fits_smaller_type, > vect_recog_over_widening_pattern, vect_recog_rotate_pattern, > vect_recog_vector_vector_shift_pattern, vect_recog_divmod_pattern, > vect_recog_mixed_size_cond_pattern, adjust_bool_pattern_cast, > adjust_bool_pattern, vect_recog_bool_pattern): Likewise. > * tree-ssa-phiopt.c (conditional_replacement, abs_replacement, > neg_replacement): Likewise. > * asan.c (build_shadow_mem_access, maybe_create_ssa_name, > maybe_cast_to_ptrmode, asan_expand_check_ifn): Likewise. > * tree-vect-slp.c (vect_get_constant_vectors): Likewise. > * omp-low.c (lower_rec_input_clauses, expand_omp_for_generic, > expand_omp_for_static_nochunk, expand_omp_for_static_chunk, > simd_clone_adjust): Likewise. > * tree-vect-loop-manip.c (vect_create_cond_for_align_checks): Likewise. > * gimple-ssa-strength-reduction.c (introduce_cast_before_cand, > replace_one_candidate): Likewise. > * gimple-builder.c (build_type_cast): Likewise. > * tree-ssa-forwprop.c (simplify_rotate): Likewise. > (forward_propagate_addr_expr_1): Remove last NULL argument > from gimple_assign_set_rhs_with_ops call. > (simplify_vector_constructor): Use gimple_assign_set_rhs_with_ops > instead of gimple_assign_set_rhs_with_ops_1. > * tree-ssa-reassoc.c (maybe_optimize_range_tests): Remove last NULL > argument from gimple_build_assign_with_ops call. > (repropagate_negates): Remove last NULL argument from > gimple_assign_set_rhs_with_ops call. > * ubsan.c (ubsan_expand_null_ifn, ubsan_expand_objsize_ifn): Remove > last NULL argument from gimple_build_assign_with_ops call. > (instrument_bool_enum_load): Likewise. Remove last NULL argument > from gimple_assign_set_rhs_with_ops call. > * tree-ssa-math-opts.c (build_and_insert_cast, convert_mult_to_fma): > Remove last NULL argument from gimple_build_assign_with_ops call. > (bswap_replace): Likewise. Use gimple_assign_set_rhs_with_ops instead > of gimple_assign_set_rhs_with_ops_1. > (convert_plusminus_to_widen): Use gimple_assign_set_rhs_with_ops > instead of gimple_assign_set_rhs_with_ops_1. > * gimple-fold.c (replace_stmt_with_simplification): Likewise. > (rewrite_to_defined_overflow, gimple_build): Remove last NULL argument > from gimple_build_assign_with_ops call. > * tree-ssa-strlen.c (handle_pointer_plus): Remove last NULL argument > from gimple_assign_set_rhs_with_ops call. > * tree-vrp.c (simplify_truth_ops_using_ranges, > simplify_bit_ops_using_ranges): Remove last NULL argument from > gimple_assign_set_rhs_with_ops call. > (simplify_float_conversion_using_ranges, > simplify_internal_call_using_ranges): Remove last NULL argument from > gimple_build_assign_with_ops call. > > --- gcc/gimple.h.jj 2014-11-19 10:45:17.780769013 +0100 > +++ gcc/gimple.h 2014-11-19 10:52:52.118574835 +0100 > @@ -1180,6 +1180,8 @@ gimple g
RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop
On Wed, 19 Nov 2014, Matthew Fortune wrote: > > @@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx > > *operands) > > This will sometimes be a delayed branch; see the write code below > > for details. */ > >mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at, > > mem, NULL); > > - mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL); > > + > > + /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the > > delay slot > > + of the branch if it is a branch likely because they will not > > execute when > > + the atomic operation is successful, so place a NOP in there > > + instead. */ > > Please rephrase the comment along the lines of my previous suggestion. > This wording is too complex IMO. Also I'm not sure if it's an enforced rule for GCC sources (it is for GDB for example; someone please chime in if I'm missing something here), but can you take the opportunity and limit the span of these comment lines a bit, like to 74 or maybe even 72 columns, similarly to the rule for ChangeLog entries, to make them more readable. Thanks. Maciej
Re: [PATCH, ifcvt] Allow CC mode if HAVE_cbranchcc4 (fix s390 build)
On Thu, Nov 13, 2014 at 7:57 PM, Ulrich Weigand wrote: > Richard Henderson wrote: >> On 11/12/2014 09:41 PM, Ulrich Weigand wrote: >> > * optabs.c (prepare_operand): Gracefully fail if the mode of X >> > does not match the operand mode expected by the insn pattern. >> >> This is ok. > > I've checked this in now, thanks. > >> I wondered whether s390 benefit from following i386 in allowing >> mov<>cc to accept immediate operands pre-reload. This could then >> be re-used by cstorecc4 to allow any CC mode to expand to LOCR >> when available. > > Well, we already compile > > int test (long x, long y) > { > return x < y; > } > > to > > cgr %r2,%r3 # 44*cmpdi_ccs/1[length = 4] > lhi %r1,0 # 43*movsi_zarch/1 [length = 4] > lhi %r2,1 # 7 *movsi_zarch/1 [length = 4] > locrnl %r2,%r1 # 45*movsicc/2 [length = 4] > lgfr%r2,%r2 # 17*extendsidi2/1 [length = 4] > br %r14# 50return [length = 2] > > with -march=z196 or higher. The LOCR is not introduced via > cstorecc4, but instead directly by noce_emit_cmove via > emit_conditional_move. It doesn't seem to matter that the > patterns don't accept immediate operands; those are loaded > into registers by maybe_legitimize_operand, which already > tries a copy_to_mode_reg. This probably caused bootstrap on s390x-linux to fail as in PR63952 (last checked with rev. 217714). Richard. > Bye, > Ulrich > > -- > Dr. Ulrich Weigand > GNU/Linux compilers and toolchain > ulrich.weig...@de.ibm.com >
Re: [PATCH, ARM] attribute target (thumb,arm) [0/6]
On 11/19/2014 03:18 PM, Ramana Radhakrishnan wrote: On Wed, Nov 19, 2014 at 1:24 PM, Christian Bruel wrote: I think I missed the stage3, Anyway would it be OK for stage1 when it reopens ? Since you submitted this well during stage1 and given that these patches address comments from earlier in the review process we should aim to get these in for 5.0. If at some point during the review process it looks risky and there needs to be significant rework we can always stop. It's on the list of patches to be reviewed and I will find some dedicated time later this week to set down and review / play with the patches in an attempt to move this forward as it is a reasonably large chunk of work. Thanks, also I forgot to mention that you need https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01231.html to play with the attribute. Will be part of the same shot. Christian Thanks for continuing to work on these patches and addressing the earlier review comments. regards Ramana Christian
[COMMITTED 1/3] Make TARGET_STATIC_CHAIN allow a function type
As opposed to always being a decl. This is a prerequisite to allowing the static chain to be loaded for indirect calls. * targhooks.c (default_static_chain): Remove check for DECL_STATIC_CHAIN. * config/moxie/moxie.c (moxie_static_chain): Likewise. * config/i386/i386.c (ix86_static_chain): Allow decl or type as the first argument. * config/xtensa/xtensa.c (xtensa_static_chain): Change the name of the unused first parameter. * doc/tm.texi (TARGET_STATIC_CHAIN): Document the first parameter may be a type. * target.def (static_chain): Likewise. --- gcc/ChangeLog | 13 + gcc/config/i386/i386.c | 19 +-- gcc/config/moxie/moxie.c | 5 + gcc/config/xtensa/xtensa.c | 2 +- gcc/doc/tm.texi| 2 +- gcc/target.def | 6 +++--- gcc/targhooks.c| 5 + 7 files changed, 33 insertions(+), 19 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 3166e03..3b41de2 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -27356,13 +27356,10 @@ ix86_minimum_alignment (tree exp, machine_mode mode, This is a register, unless all free registers are used by arguments. */ static rtx -ix86_static_chain (const_tree fndecl, bool incoming_p) +ix86_static_chain (const_tree fndecl_or_type, bool incoming_p) { unsigned regno; - if (!DECL_STATIC_CHAIN (fndecl)) -return NULL; - if (TARGET_64BIT) { /* We always use R10 in 64-bit mode. */ @@ -27370,13 +27367,23 @@ ix86_static_chain (const_tree fndecl, bool incoming_p) } else { - tree fntype; + const_tree fntype, fndecl; unsigned int ccvt; /* By default in 32-bit mode we use ECX to pass the static chain. */ regno = CX_REG; - fntype = TREE_TYPE (fndecl); + if (TREE_CODE (fndecl_or_type) == FUNCTION_DECL) + { + fntype = TREE_TYPE (fndecl_or_type); + fndecl = fndecl_or_type; + } + else + { + fntype = fndecl_or_type; + fndecl = NULL; + } + ccvt = ix86_get_callcvt (fntype); if ((ccvt & IX86_CALLCVT_FASTCALL) != 0) { diff --git a/gcc/config/moxie/moxie.c b/gcc/config/moxie/moxie.c index d4688d9..148d26b 100644 --- a/gcc/config/moxie/moxie.c +++ b/gcc/config/moxie/moxie.c @@ -528,13 +528,10 @@ moxie_arg_partial_bytes (cumulative_args_t cum_v, /* Worker function for TARGET_STATIC_CHAIN. */ static rtx -moxie_static_chain (const_tree fndecl, bool incoming_p) +moxie_static_chain (const_tree ARG_UNUSED (fndecl_or_type), bool incoming_p) { rtx addr, mem; - if (!DECL_STATIC_CHAIN (fndecl)) -return NULL; - if (incoming_p) addr = plus_constant (Pmode, arg_pointer_rtx, 2 * UNITS_PER_WORD); else diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c index 159a1a7..a0025a5 100644 --- a/gcc/config/xtensa/xtensa.c +++ b/gcc/config/xtensa/xtensa.c @@ -3626,7 +3626,7 @@ xtensa_function_value_regno_p (const unsigned int regno) expressions that denote where they are stored. */ static rtx -xtensa_static_chain (const_tree ARG_UNUSED (fndecl), bool incoming_p) +xtensa_static_chain (const_tree ARG_UNUSED (fndecl_or_type), bool incoming_p) { rtx base = incoming_p ? arg_pointer_rtx : stack_pointer_rtx; return gen_frame_mem (Pmode, plus_constant (Pmode, base, diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 0d3a9fd..5b9da47 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -3462,7 +3462,7 @@ If the static chain is passed in memory, these macros should not be defined; instead, the @code{TARGET_STATIC_CHAIN} hook should be used. @end defmac -@deftypefn {Target Hook} rtx TARGET_STATIC_CHAIN (const_tree @var{fndecl}, bool @var{incoming_p}) +@deftypefn {Target Hook} rtx TARGET_STATIC_CHAIN (const_tree @var{fndecl_or_type}, bool @var{incoming_p}) This hook replaces the use of @code{STATIC_CHAIN_REGNUM} et al for targets that may use different static chain locations for different nested functions. This may be required if the target has function diff --git a/gcc/target.def b/gcc/target.def index bc5160d..dc48ae6 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -4516,8 +4516,8 @@ false for naked functions. The default implementation always returns true.", bool, (void), hook_bool_void_true) -/* Return an rtx for the static chain for FNDECL. If INCOMING_P is true, - then it should be for the callee; otherwise for the caller. */ +/* Return an rtx for the static chain for FNDECL_OR_TYPE. If INCOMING_P + is true, then it should be for the callee; otherwise for the caller. */ DEFHOOK (static_chain, "This hook replaces the use of @code{STATIC_CHAIN_REGNUM} et al for\n\ @@ -4539,7 +4539,7 @@ will be at an offset from the frame pointer.\n\ The variables @code{stack_pointer_rtx}, @code{frame_pointer_rtx}, and\n\ @code{arg_pointer_rtx} will have been initiali
[COMMITTED 0/3] Allow front end use of the static chain
This is the middle-end and C front end parts of the gccgo related patch set first posted here. https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01009.html I've re-tested these patches in isolation and have committed them to mainline. I'm currently submitting the Go portions for the upstream review, and hope to have to libffi merge complete soon. I think it's important that the change to Go using the static chain happen before gcc5, as it represents an ABI change. Our next opportunity to fix the bulk of the non-x86 targets would be gcc6 in a year's time. r~ Richard Henderson (3): Make TARGET_STATIC_CHAIN allow a function type Allow the front-end to create calls with a static chain Allow the static chain to be set from C gcc/ChangeLog| 27 ++ gcc/c-family/c-common.c | 2 ++ gcc/c-family/c-common.h | 2 +- gcc/c/c-parser.c | 40 ++ gcc/calls.c | 14 -- gcc/config/i386/i386.c | 19 -- gcc/config/moxie/moxie.c | 5 + gcc/config/xtensa/xtensa.c | 2 +- gcc/doc/extend.texi | 13 + gcc/doc/tm.texi | 2 +- gcc/gimple-fold.c| 21 gcc/gimplify.c | 17 +++- gcc/target.def | 6 +++--- gcc/targhooks.c | 5 + gcc/testsuite/ChangeLog | 5 + gcc/testsuite/gcc.dg/cwsc0.c | 16 +++ gcc/testsuite/gcc.dg/cwsc1.c | 46 gcc/tree-cfg.c | 22 +++-- 18 files changed, 222 insertions(+), 42 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/cwsc0.c create mode 100644 gcc/testsuite/gcc.dg/cwsc1.c -- 1.9.3
[COMMITTED 3/3] Allow the static chain to be set from C
We need to be able to set the static chain on a few calls within the Go runtime, so expose this with __builtin_call_with_static_chain. * c-family/c-common.c (c_common_reswords): Add __builtin_call_with_static_chain. * c-family/c-common.h (RID_BUILTIN_CALL_WITH_STATIC_CHAIN): New. * c/c-parser.c (c_parser_postfix_expression): Handle it. * doc/extend.texi (__builtin_call_with_static_chain): Document it. * gcc.dg/cwsc0.c: New test. * gcc.dg/cwsc1.c: New test. --- gcc/ChangeLog| 6 ++ gcc/c-family/c-common.c | 2 ++ gcc/c-family/c-common.h | 2 +- gcc/c/c-parser.c | 40 ++ gcc/doc/extend.texi | 13 + gcc/testsuite/ChangeLog | 5 + gcc/testsuite/gcc.dg/cwsc0.c | 16 +++ gcc/testsuite/gcc.dg/cwsc1.c | 46 8 files changed, 129 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/cwsc0.c create mode 100644 gcc/testsuite/gcc.dg/cwsc1.c diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 839111a..95b6b1b 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -453,6 +453,8 @@ const struct c_common_resword c_common_reswords[] = { "__attribute__", RID_ATTRIBUTE, 0 }, { "__auto_type", RID_AUTO_TYPE, D_CONLY }, { "__bases", RID_BASES, D_CXXONLY }, + { "__builtin_call_with_static_chain", +RID_BUILTIN_CALL_WITH_STATIC_CHAIN, D_CONLY }, { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY }, { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY }, { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 }, diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index ca6fc8b..7e53923 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -101,7 +101,7 @@ enum rid RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL, RID_CHOOSE_EXPR, RID_TYPES_COMPATIBLE_P, RID_BUILTIN_COMPLEX, RID_BUILTIN_SHUFFLE, RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128, - RID_FRACT, RID_ACCUM, RID_AUTO_TYPE, + RID_FRACT, RID_ACCUM, RID_AUTO_TYPE, RID_BUILTIN_CALL_WITH_STATIC_CHAIN, /* C11 */ RID_ALIGNAS, RID_GENERIC, diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index f90f6af..8a4fd39 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -7414,6 +7414,46 @@ c_parser_postfix_expression (c_parser *parser) = comptypes (e1, e2) ? integer_one_node : integer_zero_node; } break; + case RID_BUILTIN_CALL_WITH_STATIC_CHAIN: + { + vec *cexpr_list; + c_expr_t *e2_p; + tree chain_value; + + c_parser_consume_token (parser); + if (!c_parser_get_builtin_args (parser, + "__builtin_call_with_static_chain", + &cexpr_list, false)) + { + expr.value = error_mark_node; + break; + } + if (vec_safe_length (cexpr_list) != 2) + { + error_at (loc, "wrong number of arguments to " + "%<__builtin_call_with_static_chain%>"); + expr.value = error_mark_node; + break; + } + + expr = (*cexpr_list)[0]; + e2_p = &(*cexpr_list)[1]; + *e2_p = convert_lvalue_to_rvalue (loc, *e2_p, true, true); + chain_value = e2_p->value; + mark_exp_read (chain_value); + + if (TREE_CODE (expr.value) != CALL_EXPR) + error_at (loc, "first argument to " + "%<__builtin_call_with_static_chain%> " + "must be a call expression"); + else if (TREE_CODE (TREE_TYPE (chain_value)) != POINTER_TYPE) + error_at (loc, "second argument to " + "%<__builtin_call_with_static_chain%> " + "must be a pointer type"); + else + CALL_EXPR_STATIC_CHAIN (expr.value) = chain_value; + break; + } case RID_BUILTIN_COMPLEX: { vec *cexpr_list; diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index d10a815..7178c9a 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -8913,6 +8913,7 @@ in the Cilk Plus language manual which can be found at @node Other Builtins @section Other Built-in Functions Provided by GCC @cindex built-in functions +@findex __builtin_call_with_static_chain @findex __builtin_fpclassify @findex __builtin_isfinite @findex __builtin_isnormal @@ -9501,6 +9502,18 @@ depending on the arguments' types. For example: @end deftypefn +@deftypefn {Built-in Function} @var{type} __builtin_call_with_static_chain (@var{call_exp}, @var{pointer_exp}) + +The @var{call_exp} expression must be a function call, and the +@var{pointer_exp} expression must be a pointer. The @var{p
[COMMITTED 2/3] Allow the front-end to create calls with a static chain
And, at the same time, allow indirect calls to have a static chain. We'll always eliminate the static chain if we can prove it's unused. * calls.c (prepare_call_address): Allow decl or type for first arg. (expand_call): Pass type to prepare_call_address if no decl. * gimple-fold.c (gimple_fold_call): Eliminate the static chain if the function doesn't use it; fold it otherwise. * gimplify.c (gimplify_call_expr): Gimplify the static chain. * tree-cfg.c (verify_gimple_call): Allow a static chain on indirect function calls. --- gcc/ChangeLog | 8 gcc/calls.c | 14 -- gcc/gimple-fold.c | 21 + gcc/gimplify.c| 17 - gcc/tree-cfg.c| 22 +++--- 5 files changed, 60 insertions(+), 22 deletions(-) diff --git a/gcc/calls.c b/gcc/calls.c index 7f55aaf..c64c0eb 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -197,7 +197,7 @@ static void restore_fixed_argument_area (rtx, rtx, int, int); CALL_INSN_FUNCTION_USAGE information. */ rtx -prepare_call_address (tree fndecl, rtx funexp, rtx static_chain_value, +prepare_call_address (tree fndecl_or_type, rtx funexp, rtx static_chain_value, rtx *call_fusage, int reg_parm_seen, int sibcallp) { /* Make a valid memory address and copy constants through pseudo-regs, @@ -217,12 +217,13 @@ prepare_call_address (tree fndecl, rtx funexp, rtx static_chain_value, #endif } - if (static_chain_value != 0) + if (static_chain_value != 0 + && (TREE_CODE (fndecl_or_type) != FUNCTION_DECL + || DECL_STATIC_CHAIN (fndecl_or_type))) { rtx chain; - gcc_assert (fndecl); - chain = targetm.calls.static_chain (fndecl, false); + chain = targetm.calls.static_chain (fndecl_or_type, false); static_chain_value = convert_memory_address (Pmode, static_chain_value); emit_move_insn (chain, static_chain_value); @@ -3278,8 +3279,9 @@ expand_call (tree exp, rtx target, int ignore) } after_args = get_last_insn (); - funexp = prepare_call_address (fndecl, funexp, static_chain_value, -&call_fusage, reg_parm_seen, pass == 0); + funexp = prepare_call_address (fndecl ? fndecl : fntype, funexp, +static_chain_value, &call_fusage, +reg_parm_seen, pass == 0); load_register_parameters (args, num_actuals, &call_fusage, flags, pass == 0, &sibcall_failure); diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index acdadcd..4f716b2 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -2723,6 +2723,27 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace) } } + /* Check for indirect calls that became direct calls, and then + no longer require a static chain. */ + if (gimple_call_chain (stmt)) +{ + tree fn = gimple_call_fndecl (stmt); + if (fn && !DECL_STATIC_CHAIN (fn)) + { + gimple_call_set_chain (stmt, NULL); + changed = true; + } + else + { + tree tmp = maybe_fold_reference (gimple_call_chain (stmt), false); + if (tmp) + { + gimple_call_set_chain (stmt, tmp); + changed = true; + } + } +} + if (inplace) return changed; diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 93c06de..c46fb66 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -2432,7 +2432,7 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) } } - /* Finally, gimplify the function arguments. */ + /* Gimplify the function arguments. */ if (nargs > 0) { for (i = (PUSH_ARGS_REVERSED ? nargs - 1 : 0); @@ -2454,6 +2454,21 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) } } + /* Gimplify the static chain. */ + if (CALL_EXPR_STATIC_CHAIN (*expr_p)) +{ + if (fndecl && !DECL_STATIC_CHAIN (fndecl)) + CALL_EXPR_STATIC_CHAIN (*expr_p) = NULL; + else + { + enum gimplify_status t; + t = gimplify_arg (&CALL_EXPR_STATIC_CHAIN (*expr_p), pre_p, + EXPR_LOCATION (*expr_p)); + if (t == GS_ERROR) + ret = GS_ERROR; + } +} + /* Verify the function result. */ if (want_value && fndecl && VOID_TYPE_P (TREE_TYPE (TREE_TYPE (fnptrtype diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index 9dd8961..71a828c 100644 --- a/gcc/tree-cfg.c +++ b/gcc/tree-cfg.c @@ -,22 +,14 @@ verify_gimple_call (gimple stmt) return true; } - /* If there is a static chain argument, this should not be an indirect - call, and the decl should have DECL_STATIC_CHAIN set. */ - if (gimple_call_chain (stmt)) + /* If there is a static chain argument, the call should either be + indirect, or the decl should
[PATCH] Factor and export a const_unop, export const_binop
The following patch is to avoid needlessly recursing to generic_simplify from fold_unary when we ask to perform constant folding. Similar fold_binary is currently entered to constant fold from gimple/generic_simplify and will do extra work in case it isn't able to fold the constant operation. To mitigate that (and still missing the fold_ternary case - to be done as followup in case this gets accepted at this point) - I factor and export a const_unop and provide an exported const_binop which also knows to dispatch to fold_relational_const. These API are intended to provide pure constant folding (that is, all-constant operands, not only a constant result). Bootstrapped and tested on x86_64-unknown-linux-gnu, ok at this point? (it should fix a tiny compile-time regression) I'll followup with a const_ternop (better name?) if ok. Thanks, Richard. 2014-11-19 Richard Biener * fold-const.h (const_unop): Declare. (const_binop): Likewise. * fold-const.c (const_binop): Export overload that expects a type parameter and dispatches to fold_relational_const as well. Check both operand kinds for guarding the transforms. (const_unop): New function, with constant folding from fold_unary_loc. (fold_unary_loc): Dispatch to const_unop for tcc_constant operand. Remove constant folding done there from the simplifications. (fold_binary_loc): Check for constants using CONSTANT_CLASS_P. (fold_negate_expr): Remove dead code from the REAL_CST case. Avoid building garbage in the COMPLEX_CST case. * gimple-match-head.c (gimple_resimplify1): Dispatch to const_unop. (gimple_resimplify2): Dispatch to const_binop. (gimple_simplify): Likewise. Index: gcc/fold-const.c === *** gcc/fold-const.c.orig 2014-11-18 15:16:04.263514898 +0100 --- gcc/fold-const.c2014-11-19 13:22:24.866032427 +0100 *** static bool negate_expr_p (tree); *** 115,121 static tree negate_expr (tree); static tree split_tree (tree, enum tree_code, tree *, tree *, tree *, int); static tree associate_trees (location_t, tree, tree, enum tree_code, tree); - static tree const_binop (enum tree_code, tree, tree); static enum comparison_code comparison_to_compcode (enum tree_code); static enum tree_code compcode_to_comparison (enum comparison_code); static int operand_equal_for_comparison_p (tree, tree, tree); --- 115,120 *** static tree fold_negate_const (tree, tre *** 156,161 --- 155,163 static tree fold_not_const (const_tree, tree); static tree fold_relational_const (enum tree_code, tree, tree, tree); static tree fold_convert_const (enum tree_code, tree, tree); + static tree fold_view_convert_expr (tree, tree); + static bool vec_cst_ctor_to_array (tree, tree *); + /* Return EXPR_LOCATION of T if it is not UNKNOWN_LOCATION. Otherwise, return LOC. */ *** fold_negate_expr (location_t loc, tree t *** 561,570 case REAL_CST: tem = fold_negate_const (t, type); ! /* Two's complement FP formats, such as c4x, may overflow. */ ! if (!TREE_OVERFLOW (tem) || !flag_trapping_math) ! return tem; ! break; case FIXED_CST: tem = fold_negate_const (t, type); --- 563,569 case REAL_CST: tem = fold_negate_const (t, type); ! return tem; case FIXED_CST: tem = fold_negate_const (t, type); *** fold_negate_expr (location_t loc, tree t *** 572,584 case COMPLEX_CST: { ! tree rpart = negate_expr (TREE_REALPART (t)); ! tree ipart = negate_expr (TREE_IMAGPART (t)); ! ! if ((TREE_CODE (rpart) == REAL_CST !&& TREE_CODE (ipart) == REAL_CST) ! || (TREE_CODE (rpart) == INTEGER_CST ! && TREE_CODE (ipart) == INTEGER_CST)) return build_complex (type, rpart, ipart); } break; --- 571,579 case COMPLEX_CST: { ! tree rpart = fold_negate_expr (loc, TREE_REALPART (t)); ! tree ipart = fold_negate_expr (loc, TREE_IMAGPART (t)); ! if (rpart && ipart) return build_complex (type, rpart, ipart); } break; *** const_binop (enum tree_code code, tree a *** 1135,1144 STRIP_NOPS (arg1); STRIP_NOPS (arg2); ! if (TREE_CODE (arg1) == INTEGER_CST) return int_const_binop (code, arg1, arg2); ! if (TREE_CODE (arg1) == REAL_CST) { machine_mode mode; REAL_VALUE_TYPE d1; --- 1130,1139 STRIP_NOPS (arg1); STRIP_NOPS (arg2); ! if (TREE_CODE (arg1) == INTEGER_CST && TREE_CODE (arg2) == INTEGER_CST) return int_const_binop (code, arg1, arg2); ! if (TREE_CODE (arg1) == REAL_CST && TREE_CODE (arg2) == REAL_CST) { machine_mode mode; REAL_VALUE_TYPE d1; *** const