Re: PR 57779 New debug check
... applied this. Thanks, Paolo. // 2013-08-02 Paolo Carlini PR libstdc++/58049 * include/debug/functions.h: Include ; minor formatting changes. (__foreign_iterator_aux4): Declare __l and __ge constexpr. * include/debug/safe_iterator.h (_Safe_iterator<>::operator->): Use __addressof. * include/debug/safe_local_iterator.h (_Safe_local_iterator<>:: operator->): Likewise. Index: include/debug/functions.h === --- include/debug/functions.h (revision 201423) +++ include/debug/functions.h (working copy) @@ -33,6 +33,7 @@ #include // for iterator_traits, categories and // _Iter_base #include // for __is_integer +#include // for __addressof and addressof #if __cplusplus >= 201103L # include // for less and greater_equal # include // for common_type @@ -126,8 +127,8 @@ inline bool __valid_range_aux(const _InputIterator& __first, const _InputIterator& __last, std::__false_type) - { return __valid_range_aux2(__first, __last, - std::__iterator_category(__first)); } +{ return __valid_range_aux2(__first, __last, + std::__iterator_category(__first)); } /** Don't know what these iterators are, or if they are even * iterators (we may get an integral type for InputIterator), so @@ -182,15 +183,14 @@ { typedef typename std::common_type<_PointerType1, _PointerType2>::type _PointerType; - std::less<_PointerType> __l; - std::greater_equal<_PointerType> __ge; + constexpr std::less<_PointerType> __l; + constexpr std::greater_equal<_PointerType> __ge; - return - __l(std::addressof(*__other), - std::addressof(*(__it._M_get_sequence()->_M_base().begin( - || __ge(std::addressof(*__other), - std::addressof(*(__it._M_get_sequence()->_M_base().end() - 1)) + 1); - + return (__l(std::addressof(*__other), + std::addressof(*(__it._M_get_sequence()->_M_base().begin( + || __ge(std::addressof(*__other), + std::addressof(*(__it._M_get_sequence()->_M_base().end() + - 1)) + 1)); } template @@ -205,12 +205,13 @@ // past-the-end iterator. if (__it._M_get_sequence()->_M_base().begin() != __it._M_get_sequence()->_M_base().end()) - if (std::__addressof(*(__it._M_get_sequence()->_M_base().end() - 1)) - - std::__addressof(*(__it._M_get_sequence()->_M_base().begin())) + if (std::addressof(*(__it._M_get_sequence()->_M_base().end() - 1)) + - std::addressof(*(__it._M_get_sequence()->_M_base().begin())) == __it._M_get_sequence()->size() - 1) - return __foreign_iterator_aux4(__it, __other, - std::addressof(*(__it._M_get_sequence()->_M_base().begin())), - std::addressof(*__other)); + return (__foreign_iterator_aux4 + (__it, __other, + std::addressof(*(__it._M_get_sequence()->_M_base().begin())), + std::addressof(*__other))); return true; } @@ -232,8 +233,8 @@ { return __it._M_get_sequence() != __other._M_get_sequence(); } #if __cplusplus >= 201103L - /* This overload detects when passing pointers to the contained elements rather - than using iterators. + /* This overload detects when passing pointers to the contained elements + rather than using iterators. */ template inline bool @@ -271,10 +272,9 @@ _InputIterator __other, std::__false_type) { - return - _Insert_range_from_self_is_safe<_Sequence>::__value - || __foreign_iterator_aux2(__it, __other, - std::__iterator_category(__it)); + return (_Insert_range_from_self_is_safe<_Sequence>::__value + || __foreign_iterator_aux2(__it, __other, +std::__iterator_category(__it))); } template() const @@ -277,7 +276,7 @@ _GLIBCXX_DEBUG_VERIFY(this->_M_dereferenceable(), _M_message(__msg_bad_deref) ._M_iterator(*this, "this")); - return &*_M_current; + return std::__addressof(*_M_current); } // -- Input iterator requirements -- Index: include/debug/safe_local_iterator.h === --- include/debug/safe_local_iterator.h (revision 201423) +++ include/debug/safe_local_iterator.h (working copy) @@ -17
Re: [PING] [C++ Patch] Remove finish_stmt
On 08/02/2013 01:42 AM, Gabriel Dos Reis wrote: 2013/8/1 Paolo Carlini : Hi, gently pinging this small clean-up: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00905.html Thanks! Paolo. It was supposed to provide symmetry and a good place to put any "cleanup" code we might want to run, but that never materialized, and now that we are using C++, we are better abstraction tools. So, it can go. Ah now I see, thanks. I'm going to apply the patch then. Thanks again, Paolo.
Re: [Patch, Fortran, OOP] PR 57306: ICE on valid with class pointer initialization
ping! 2013/7/30 Janus Weil : >> The attached update fixes it, and thus should hopefully be >> regression-free. It also renames 'gfc_class_null_initializer' to >> 'gfc_class_initializer', since it now also does other initializations >> beside EXPR_NULL. >> >> Will do another regtest to make sure it's clean. > > No failures observed. As a test case I'm using now Tobias' extended > version (attached). New ChangeLog below. > > Ok for trunk? > > Cheers, > Janus > > > 2013-07-30 Janus Weil > > PR fortran/57306 > * class.c (gfc_class_null_initializer): Rename to > 'gfc_class_initializer'. Treat non-NULL init-exprs. > * gfortran.h (gfc_class_null_initializer): Update prototype. > * trans-decl.c (gfc_get_symbol_decl): Treat class pointers. > * trans-expr.c (gfc_conv_initializer): Ditto. > (gfc_trans_subcomponent_assign): Renamed gfc_class_null_initializer. > > 2013-07-30 Janus Weil > > PR fortran/57306 > * gfortran.dg/pointer_init_8.f90: New.
Fix ltrans ICE seen while compiling Firefox
Hi, this patch fixes one extra place where we need to consider partial cgraphs and enables firefox to build again. Comitted as obvious, Honza Index: ChangeLog === --- ChangeLog (revision 201430) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2013-08-01 Jan Hubicka + + * ipa.c (symtab_remove_unreachable_nodes): Nodes in other partitions are + not needed. + 2013-08-01 Uros Bizjak * config/i386/i386.h (MAYBE_NON_Q_CLASS_P): New. Index: ipa.c === --- ipa.c (revision 201430) +++ ipa.c (working copy) @@ -239,6 +239,7 @@ symtab_remove_unreachable_nodes (bool be node->used_as_abstract_origin = false; if (node->symbol.definition && !node->global.inlined_to + && !node->symbol.in_other_partition && (!cgraph_can_remove_if_no_direct_calls_and_refs_p (node) /* Keep around virtual functions for possible devirtualization. */ || (before_inlining_p
Re: Do not use PARM_DECLs in ipa-cp and ipa-prop
Hi, On Thu, Aug 01, 2013 at 05:48:20PM +0200, Jan Hubicka wrote: > > On Thu, Aug 01, 2013 at 03:11:36PM +0200, Jan Hubicka wrote: > > > Hi, > > > this is preparation work to move DECL_ARGUMENTS and DECL_RESULT into > > > function > > > sections during WPA. Even with some work to release unused ones, there > > > are 4M > > > of PARM_DECLs and 2M of RESULT_DECLs streamed during LTO (for 6M of > > > function_decls) making them one of the most common nodes. > > > > > > This patch makes ipa-cp and ipa-prop to not use DECL_ARGUMENTS during WPA > > > stage. this only needed to tamn debug info, move logic doing casts from > > > get_replacement_map to tree_function_versioning and stream move_cost that > > > is > > > computed form parm type. > > > > > > Martin, does this patch look OK? > > > > > > > Generally yes, except that I think you did not convert some dumping in > > ipa-cp.c, at least in functions estimate_local_effects, > > find_more_scalar_values_for_callers_subset and decide_about_value. I > > believe that after your change there should not be a single call to > > ipa_get_param in file ipa-cp.c. > > Hmm, I was aware of that while making the patch but then forgot. > OK, I will convert these too (and the testsuite) and commit. > (alternative I was thining of having ipa_dump_param helper that will > dump "parm 4" in WPA mode and "parm 4 (parmname)" in non-WPA). Do > you think it would be useful? Yes, I think it would be very convenient. > We probably should comonnize syntax of dumps in between ipa-prop > and ipa-inline-analysis predicates (that currently use op4 syntax and call the > parameters operands). Something to track incrementally anyway. > Indeed. Thanks, Martin
[PATCH, Fortran, PR 57987] Do not call cgraph_finalize_function multiple times on finalizers
Hi, when looking at another PR, I found out that inliner refused to even consider __final_test2_T/0 because, according to the dump, "redefined extern inline functions are not considered for inlining." When I looked at why, I realized that the function is "finalized" (by cgraph_finalize_function) twice. Once directly from front-end and second time from un-nesting nested functions. This patch makes sure that the Fortran front-end does not do that. Some details about the patch development over the time is in bugzilla. Bootstrapped and tested on x86_64-linux without any problems, OK for trunk? Thanks, Martin 2013-08-01 Martin Jambor * cgraphunit.c (cgraph_finalize_function): Assert that nested function is not re-finalized. Rename second parameter to no_collect. fortran/ * trans-decl.c (gfc_generate_function_code): Never call cgraph_finalize_function on nested functions. testsuite/ * gfortran.dg/pr57987.f90: New test. Index: src/gcc/cgraphunit.c === --- src.orig/gcc/cgraphunit.c +++ src/gcc/cgraphunit.c @@ -405,17 +405,20 @@ referred_to_p (symtab_node node) } /* DECL has been parsed. Take it, queue it, compile it at the whim of the - logic in effect. If NESTED is true, then our caller cannot stand to have + logic in effect. If NO_COLLECT is true, then our caller cannot stand to have the garbage collector run at the moment. We would need to either create a new GC context, or just not compile right now. */ void -cgraph_finalize_function (tree decl, bool nested) +cgraph_finalize_function (tree decl, bool no_collect) { struct cgraph_node *node = cgraph_get_create_node (decl); if (node->symbol.definition) { + /* Nested functions should only be defined once. */ + gcc_assert (!DECL_CONTEXT (decl) + || TREE_CODE (DECL_CONTEXT (decl)) != FUNCTION_DECL); cgraph_reset_node (node); node->local.redefined_extern_inline = true; } @@ -454,7 +457,7 @@ cgraph_finalize_function (tree decl, boo if (warn_unused_parameter) do_warn_unused_parameter (decl); - if (!nested) + if (!no_collect) ggc_collect (); if (cgraph_state == CGRAPH_STATE_CONSTRUCTION Index: src/gcc/fortran/trans-decl.c === --- src.orig/gcc/fortran/trans-decl.c +++ src/gcc/fortran/trans-decl.c @@ -5640,14 +5640,16 @@ gfc_generate_function_code (gfc_namespac } current_function_decl = old_context; - if (decl_function_context (fndecl) && gfc_option.coarray != GFC_FCOARRAY_LIB - && has_coarray_vars) -/* Register this function with cgraph just far enough to get it - added to our parent's nested function list. - If there are static coarrays in this function, the nested _caf_init - function has already called cgraph_create_node, which also created - the cgraph node for this function. */ -(void) cgraph_create_node (fndecl); + if (decl_function_context (fndecl)) +{ + /* Register this function with cgraph just far enough to get it +added to our parent's nested function list. +If there are static coarrays in this function, the nested _caf_init +function has already called cgraph_create_node, which also created +the cgraph node for this function. */ + if (!has_coarray_vars || gfc_option.coarray != GFC_FCOARRAY_LIB) + (void) cgraph_create_node (fndecl); +} else cgraph_finalize_function (fndecl, true); Index: src/gcc/testsuite/gfortran.dg/pr57987.f90 === --- /dev/null +++ src/gcc/testsuite/gfortran.dg/pr57987.f90 @@ -0,0 +1,24 @@ +! { dg-do compile } +! { dg-options "-O3 -fno-ipa-cp -fdump-ipa-inline" } + +program test + call test2 () +contains + subroutine test2 () +type t + integer, allocatable :: x +end type t + +type t2 + class(t), allocatable :: a +end type t2 + +type(t2) :: one, two + +allocate (two%a) +one = two + end subroutine test2 +end program test + +! { dg-final { scan-ipa-dump-not "redefined extern inline functions are not considered for inlining" "inline" } } +! { dg-final { cleanup-ipa-dump "inline" } }
Re: [PATCH, i386, MPX 1/X] Support of Intel MPX ISA
Hi All, I've updated MPX Wiki page (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler). I added instrumentation description, programming model description, differences with other checkers, implementation details. What about the first patch? Should I post next patches in the series? Thanks, Ilya 2013/7/29 Ilya Enkovich : > Hi, > > Here is updated version of the patch. I removed redundant > mode_for_bound, added comments to BOUND_TYPE and added -mmpx option. > I also fixed bndmk/bndldx/bndstx constraints to avoid incorrect > register allocation (created two new constraints for that). > > Thanks, > Ilya > > --- > 2013-07-29 Ilya Enkovich > > * mode-classes.def (MODE_BOUND): New. > * tree.def (BOUND_TYPE): New. > * genmodes.c (complete_mode): Support MODE_BOUND. > (BOUND_MODE): New. > (make_bound_mode): New. > * machmode.h (BOUND_MODE_P): New. > * stor-layout.c (int_mode_for_mode): Support MODE_BOUND. > (layout_type): Support BOUND_TYPE. > * tree-pretty-print.c (dump_generic_node): Support BOUND_TYPE. > * tree.c (build_int_cst_wide): Support BOUND_TYPE. > (type_contains_placeholder_1): Likewise. > * tree.h (BOUND_TYPE_P): New. > * varasm.c (output_constant): Support BOUND_TYPE. > * config/i386/constraints.md (B): New. > (Ti): New. > (Tb): New. > * config/i386/i386-modes.def (BND32): New. > (BND64): New. > * config/i386/i386-protos.h (ix86_bnd_prefixed_insn_p): New. > * config/i386/i386.c (isa_opts): Add mmpx. > (regclass_map): Add bound registers. > (dbx_register_map): Likewise. > (dbx64_register_map): Likewise. > (svr4_dbx_register_map): Likewise. > (PTA_MPX): New. > (ix86_option_override_internal) Support MPX ISA. > (ix86_code_end): Add MPX bnd prefix. > (output_set_got): Likewise. > (ix86_output_call_insn): Likewise. > (get_some_local_dynamic_name): Add '!' (MPX bnd) print prefix support. > (ix86_print_operand_punct_valid_p): Likewise. > (ix86_print_operand_address): Support UNSPEC_BNDMK_ADDR and > UNSPEC_BNDMK_ADDR. > (ix86_class_likely_spilled_p): Add bound regs support. > (ix86_hard_regno_mode_ok): Likewise. > (x86_order_regs_for_local_alloc): Likewise. > (ix86_bnd_prefixed_insn_p): New. > * config/i386/i386.h (FIRST_PSEUDO_REGISTER): Fix to new value. > (FIXED_REGISTERS): Add bound registers. > (CALL_USED_REGISTERS): Likewise. > (REG_ALLOC_ORDER): Likewise. > (HARD_REGNO_NREGS): Likewise. > (TARGET_MPX): New. > (VALID_BND_REG_MODE): New. > (FIRST_BND_REG): New. > (LAST_BND_REG): New. > (reg_class): Add BND_REGS. > (REG_CLASS_NAMES): Likewise. > (REG_CLASS_CONTENTS): Likewise. > (BND_REGNO_P): New. > (ANY_BND_REG_P): New. > (BNDmode): New. > (HI_REGISTER_NAMES): Add bound registers. > * config/i386/i386.md (UNSPEC_BNDMK): New. > (UNSPEC_BNDMK_ADDR): New. > (UNSPEC_BNDSTX): New. > (UNSPEC_BNDLDX): New. > (UNSPEC_BNDLDX_ADDR): New. > (UNSPEC_BNDCL): New. > (UNSPEC_BNDCU): New. > (UNSPEC_BNDCN): New. > (UNSPEC_MPX_FENCE): New. > (BND0_REG): New. > (BND1_REG): New. > (type): Add mpxmov, mpxmk, mpxchk, mpxld, mpxst. > (length_immediate): Likewise. > (prefix_0f): Likewise. > (memory): Likewise. > (prefix_rep): Check for bnd prefix. > (BND): New. > (bnd_ptr): New. > (BNDCHECK): New. > (bndcheck): New. > (*jcc_1): Add MPX bnd prefix and fix length. > (*jcc_2): Likewise. > (jump): Likewise. > (simple_return_internal): Likewise. > (simple_return_pop_internal): Likewise. > (*indirect_jump): Add MPX bnd prefix. > (*tablejump_1): Likewise. > (simple_return_internal_long): Likewise. > (simple_return_indirect_internal): Likewise. > (_mk): New. > (*_mk): New. > (mov): New. > (*mov_internal_mpx): New. > (_): New. > (*_): New. > (_ldx): New. > (*_ldx): New. > (_stx): New. > (*_stx): New. > * config/i386/predicates.md (lea_address_operand) Rename to... > (address_no_seg_operand): ... this. > (address_mpx_no_base_operand): New. > (address_mpx_no_index_operand): New. > (bnd_mem_operator): New. > * config/i386/i386.opt (mmpx): New.
Re: PR 57779 New debug check
On Fri, Aug 2, 2013 at 2:36 AM, Paolo Carlini wrote: > ... applied this. thanks.
[PATCH][ARM] Delete gcc.target/arm/neon-for-64bits-2.c test
Hi all, The gcc.target/arm/neon-for-64bits-2.c test is supposed to test that -mneon-for-64bits gets the compiler to use NEON instructions for 64 bit bitwise operations. However, the -mneon-for-64bits flag was only really added as a complement of -mno-neon-for-64bits that explicitly prevents generation of 64 bit. The -mneon-for-64bits option is not supposed to force generation of NEON instructions and therefore the compiler might choose to not use them, making this testcase invalid. This patch removes it. Ok for trunk? Thanks, Kyrill 2013-08-02 Kyrylo Tkachov * gcc.target/arm/neon-for-64bits-2.c: Delete. remove_neon_64bits_test.patch Description: Binary data
Re: [PATCH, Fortran, PR 57987] Do not call cgraph_finalize_function multiple times on finalizers
> Hi, > > when looking at another PR, I found out that inliner refused to even > consider __final_test2_T/0 because, according to the dump, "redefined > extern inline functions are not considered for inlining." When I > looked at why, I realized that the function is "finalized" (by > cgraph_finalize_function) twice. Once directly from front-end and > second time from un-nesting nested functions. > > This patch makes sure that the Fortran front-end does not do that. > Some details about the patch development over the time is in bugzilla. > Bootstrapped and tested on x86_64-linux without any problems, OK for > trunk? OK (I hope that the fortran part is cgraph centric so I can approve it), thanks. Thanks for noticing it - those missed inlines seems quite unforutnate. Honza
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On 1 August 2013 18:32, Teresa Johnson wrote: > Patch 3 of 3 split out from the patch I sent in May that fixes problems with > -freorder-blocks-and-partition, with changes/fixes discussed in that thread. > > See http://gcc.gnu.org/ml/gcc-patches/2013-05/threads.html#00388 for context. > > This patch sanitizes the partitioning to address issues such as edge > weight insanities that sometimes occur due to upstream optimizations, > and ensures that hot blocks are not dominated by cold blocks. This > needs to be resanitized after certain cfg optimizations that may > cause hot blocks previously reached via both hot and cold paths to > only be reached by cold paths. > > The verification code in sanitize_dominator_hotness was contributed by > Steven Bosscher. > > Bootstrapped and tested on x86-64-unknown-linux-gnu. Also ensured that > a profiledbootstrap passed with -freorder-blocks-and-partition enabled > (and with the dwarf version changed to 2 to work around PR57451). > > Ok for trunk? > > (I also included the patch as an attachment since my mailer invariably > messes up the formatting in the pasted version.) > > Thanks, > Teresa > > 2013-08-01 Teresa Johnson > Steven Bosscher > > * cfgrtl.c (fixup_bb_partition): New routine. > (commit_edge_insertions): Invoke fixup_partitions. > (find_partition_fixes): New routine. > (fixup_partitions): Ditto. > (verify_hot_cold_block_grouping): Update comments. > (rtl_verify_edges): Invoke find_partition_fixes. > (rtl_verify_bb_pointers): Update comments. > (rtl_verify_bb_layout): Ditto. > * basic-block.h (fixup_partitions): Declare. > * cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions. > * bb-reorder.c (sanitize_dominator_hotness): New function. > (find_rarely_executed_basic_blocks_and_crossing_edges): Invoke > sanitize_dominator_hotness. > > Index: cfgrtl.c > === > --- cfgrtl.c (revision 201281) > +++ cfgrtl.c (working copy) > @@ -1341,6 +1341,34 @@ fixup_partition_crossing (edge e) > } > } > > +/* Called when block BB has been reassigned to a different partition, > + to ensure that the region crossing attributes are updated. */ > + > +static void > +fixup_bb_partition (basic_block bb) > +{ > + edge e; > + edge_iterator ei; > + > + /* Now need to make bb's pred edges non-region crossing. */ > + FOR_EACH_EDGE (e, ei, bb->preds) > +{ > + fixup_partition_crossing (e); > +} > + > + /* Possibly need to make bb's successor edges region crossing, > + or remove stale region crossing. */ > + FOR_EACH_EDGE (e, ei, bb->succs) > +{ > + if ((e->flags & EDGE_FALLTHRU) > + && BB_PARTITION (bb) != BB_PARTITION (e->dest) > + && e->dest != EXIT_BLOCK_PTR) > +force_nonfallthru (e); > + else > +fixup_partition_crossing (e); > +} > +} > + > /* Attempt to change code to redirect edge E to TARGET. Don't do that on > expense of adding new instructions or reordering basic blocks. > > @@ -1979,6 +2007,14 @@ commit_edge_insertions (void) > { >basic_block bb; > > + /* Optimization passes that invoke this routine can cause hot blocks > + previously reached by both hot and cold blocks to become dominated only > + by cold blocks. This will cause the verification below to fail, > + and lead to now cold code in the hot section. In some cases this > + may only be visible after newly unreachable blocks are deleted, > + which will be done by fixup_partitions. */ > + fixup_partitions (); > + > #ifdef ENABLE_CHECKING >verify_flow_info (); > #endif > @@ -2173,6 +2209,101 @@ get_last_bb_insn (basic_block bb) >return end; > } > > +/* Sanity check partition hotness to ensure that basic blocks in > + the cold partition don't dominate basic blocks in the hot partition. > + If FLAG_ONLY is true, report violations as errors. Otherwise > + re-mark the dominated blocks as cold, since this is run after > + cfg optimizations that may make hot blocks previously reached > + by both hot and cold blocks now only reachable along cold paths. */ > + > +vec > +find_partition_fixes (bool flag_only) > +{ > + basic_block bb; > + vec bbs_in_cold_partition = vNULL; > + vec bbs_to_fix = vNULL; > + > + if (!crtl->has_bb_partition) > +return vNULL; I'd push this early return into the callers instead, at most turn it into a gcc_checking_assert to be safe. Both callers, fixup_partitions and rtl_verify_edges, look at ctrl->has_bb_partition already before calling this, so the above should be dead already. Did my mailer somehow swallow the static from find_partition_fixes? > + > + FOR_EACH_BB (bb) > +if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > + bbs_in_cold_partition.safe_push (bb); > + > + if (bbs_in_cold_partition.is_empty ()) > +return vNULL; > + > + bool dom_calculated_here = !
Re: [PATCH] S/390: Hardware transactional memory support
On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote: > Index: libitm/config/s390/target.h > === > *** libitm/config/s390/target.h.orig > --- libitm/config/s390/target.h > *** [...] > + static inline uint32_t > + htm_begin () > + { > + return __builtin_tbegin_nofloat (NULL); > + } I'm not quite sure I understand why the nofloat variant is sufficient; is this because we don't use FPRs in the abort/restart code of libitm (and the FPRs are known to be clobbered after starting a txn)? There is a bit in the properties passed to _ITM_beginTransaction which states whether the txn has floating point update code (pr_hasNoFloatUpdate). Would this be useful? (BTW, sorry for the late response. I don't read gcc-patches frequently, so please CC me on libitm-related things.)
Re: [PATCH][4.8 backport] S/390: Transactional Execution support
On Thu, 2013-08-01 at 10:55 +0200, Andreas Krebbel wrote: > Torvald, could you please consider adding backports of the following > patches to the 4.8 branch? > > 19/06/13 [patch] libitm: Fix handling of reentrancy in the HTM fastpath > http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01132.html > > 20/06/13 Re: [patch] libitm: Fix handling of reentrancy in the HTM fastpath > http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01188.html I believe Jakub already backported these two; at least he told me he was planning to, IIRC. Torvald
[PATCH, PR 57748] Set mode of structures with zero sized arrays to be BLK
Hi, while compute_record_mode in stor-layout.c makes sure it assigns BLK mode to structs with flexible arrays, it has no such provisions for zero length arrays (http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/Zero-Length.html). I think that in order to avoid problems and surprises like PR 57748 (where this triggered code that was intended for small structures that fit into a scalar mode and ICEd), we should assign both variable array possibilities the same mode. Bootstrapped and tested on x86_64-linux without any problems. OK for trunk and the 4.8 branch? (I'm not sure about the 4.7, this PR does not happen there despite the wrong mode so I'd ignore it for now.) Thanks, Martin 2013-08-01 Martin Jambor PR middle-end/57748 * stor-layout.c (compute_record_mode): Treat zero-sized array fields like incomplete types. testsuite/ * gcc.dg/torture/pr57748.c: New test. *** /tmp/lV6Ba8_stor-layout.c Thu Aug 1 16:28:25 2013 --- gcc/stor-layout.c Thu Aug 1 15:36:18 2013 *** compute_record_mode (tree type) *** 1604,1610 && integer_zerop (TYPE_SIZE (TREE_TYPE (field) || ! host_integerp (bit_position (field), 1) || DECL_SIZE (field) == 0 ! || ! host_integerp (DECL_SIZE (field), 1)) return; /* If this field is the whole struct, remember its mode so --- 1604,1612 && integer_zerop (TYPE_SIZE (TREE_TYPE (field) || ! host_integerp (bit_position (field), 1) || DECL_SIZE (field) == 0 ! || ! host_integerp (DECL_SIZE (field), 1) ! || (TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE ! && tree_low_cst (DECL_SIZE (field), 1) == 0)) return; /* If this field is the whole struct, remember its mode so *** /dev/null Tue Jun 4 12:34:56 2013 --- gcc/testsuite/gcc.dg/torture/pr57748.c Thu Aug 1 15:42:14 2013 *** *** 0 --- 1,45 + /* PR middle-end/57748 */ + /* { dg-do run } */ + + #include + + extern void abort (void); + + typedef long long V + __attribute__ ((vector_size (2 * sizeof (long long)), may_alias)); + + typedef struct S { V a; V b[0]; } P __attribute__((aligned (1))); + + struct __attribute__((packed)) T { char c; P s; }; + + void __attribute__((noinline, noclone)) + check (struct T *t) + { + if (t->s.b[0][0] != 3 || t->s.b[0][1] != 4) + abort (); + } + + int __attribute__((noinline, noclone)) + get_i (void) + { + return 0; + } + + void __attribute__((noinline, noclone)) + foo (P *p) + { + V a = { 3, 4 }; + int i = get_i(); + p->b[i] = a; + } + + int + main () + { + struct T *t = (struct T *) malloc (128); + + foo (&t->s); + check (t); + + return 0; + }
Re: [PATCH][ARM] Delete gcc.target/arm/neon-for-64bits-2.c test
On 02/08/13 11:56, Kyrylo Tkachov wrote: > Hi all, > > The gcc.target/arm/neon-for-64bits-2.c test is supposed to test that > -mneon-for-64bits gets the compiler to use NEON instructions for 64 bit > bitwise operations. However, the -mneon-for-64bits flag was only really added > as a complement of -mno-neon-for-64bits that explicitly prevents generation of > 64 bit. The -mneon-for-64bits option is not supposed to force generation of > NEON instructions and therefore the compiler might choose to not use them, > making this testcase invalid. > > This patch removes it. > > Ok for trunk? > > Thanks, > Kyrill > > 2013-08-02 Kyrylo Tkachov > > * gcc.target/arm/neon-for-64bits-2.c: Delete. > > OK R.
[AARCH64][Insn classification unification 4/N] load/store types
Hi, This patch is part of the ongoing work to unify instruction classification between the ARM and AARCH64 backends. This patch adds "load_acq" and "store_rel" types for classifying load acquire and store release instructions respectively. It also updates the ARMv8 pipeline descriptions. OK for trunk? Thanks Sofiane - ChangeLog: * config/arm/types.md (define_attr "type"): Add "load_acq" and "store_rel". * config/arm/cortex-a53.md (cortex_a53_load1): Update for attribute changes. (cortex_a53_store1): Likewise. arm-a64-fine-tune-load-store.diff Description: Binary data
Re: [AARCH64][Insn classification unification 4/N] load/store types
On 02/08/13 14:03, Sofiane Naci wrote: > Hi, > > This patch is part of the ongoing work to unify instruction classification > between the ARM and AARCH64 backends. > > This patch adds "load_acq" and "store_rel" types for classifying load > acquire and store release instructions respectively. It also updates the > ARMv8 pipeline descriptions. > > OK for trunk? > > Thanks > Sofiane > > - > > ChangeLog: > > * config/arm/types.md (define_attr "type"): Add "load_acq" and > "store_rel". > * config/arm/cortex-a53.md (cortex_a53_load1): Update for attribute > changes. > (cortex_a53_store1): Likewise. > OK
Re: [PATCH] S/390: Hardware transactional memory support
On 02/08/13 13:31, Torvald Riegel wrote: > On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote: >> Index: libitm/config/s390/target.h >> === >> *** libitm/config/s390/target.h.orig >> --- libitm/config/s390/target.h >> *** > > [...] > >> + static inline uint32_t >> + htm_begin () >> + { >> + return __builtin_tbegin_nofloat (NULL); >> + } > > I'm not quite sure I understand why the nofloat variant is sufficient; > is this because we don't use FPRs in the abort/restart code of libitm > (and the FPRs are known to be clobbered after starting a txn)? Yes. To my understanding there are 2 potential issues when the FPRs are not saved. 1. The abort code will read corrupted FPR content for FPRs: - which are live across tbegin and - are updated in the TX body and - are read in the abort code The abort code will see the updated value from the TX body. Since libitm implements TX begins as function calls only call-saved registers can be live across a tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and get restored when doing the longjmp back into the user code. So this should be no problem. 2. The other problem is that the ABI might get violated when aborting into a callee which already returned and there are FPRs which have been modified afterwards. In that case the compiler would not have generated FPR save/restore instructions in the function prologue/epilogue of the callee. But in fact there are modified FPRs when exiting the second time. But again this should be no problem thanks to the setjmp/longjmp semantics of _ITM_beginTransaction and GTM_longjmp. If the TX body modified a call-saved FPR it will get restored on the abort path back from libitm to the user code. As long as libitm does not use FPRs itself this should be safe without having tbegin clobbering FPRs. > There is a bit in the properties passed to _ITM_beginTransaction which > states whether the txn has floating point update code > (pr_hasNoFloatUpdate). Would this be useful? This probably would be useful in the ITM_beginTransaction / GTM_longjmp implementations which in that case could try to avoid the FPR save/restores. But Richard mentioned that these bits so far are never set by GCC since it is lacking the required infos from the register allocator. > (BTW, sorry for the late response. I don't read gcc-patches frequently, > so please CC me on libitm-related things.) No problem. I'll try to remember this and cc you next time. Bye, -Andreas-
Re: [Patch] regex bracket expression implementaion
Hi, On 08/01/2013 04:59 PM, Tim Shen wrote: Fully tested under x86_64. (make bootstrap && make -k check). Next, I'll try to refactor _Grep_matcher using templates instead of virtual functions, to make implementing back-reference easier. If nobody has further comments over the next day or so, patch is Ok with me. Thanks, Paolo. PS: I think it would be nice if from to time you could provide a rough estimate of the amount of work done vs the work still needed; major milestones; etc. Just to have an idea.
[c++-concepts] class template constraints
Attached is a patch that deals with class template constraints. In particular, it does 3 things: 1. Type constraints are checked on lookup rather than instantiation. 2. Initial support for constrained out-of-class definitions is added (more work needed) 3. Support for constrained partial template specialization is added. Change Log: 2013-08-02 Andrew Sutton * gcc/cp/pt.c (get_class_bindings): Pass the partial specialization for constraint evaluation. Evaluate constraints, resulting in deduction failure on error. (get_specializaing_template_decl), (get_specialization_constraints), (maybe_new_partial_specialization): New. (maybe_process_partial_specialization): Allow the creation of new types for constrained partial specializations. (process_partial_specialization): Modify the canonical type of constrained partial specializations. (lookup_template_class_1): Check constraints on lookup. Do not explicitly check alias constraints. (instantiate_class_template_1): Do not explicitly check constraints during class template instantiation. (more_specialized_class): Pass specializations to get_class_bindings(). Compare specialization constraints if the types are equivalent. (most_specialized_class): Pass specialization to get_class_bndings(). * gcc/cp/constraint.cc (reduce_template_id): Don't crash when omitting ()'s on constraint calls. (check_constraints): Don't try to evaluate when arguments are dependent. (equivalent_constraints): Optimize the case when a and b are the same constraints. * gcc/cp/decl2.c (check_class_fn): Allow overloading of constrained out-of-class member definitions. * gcc/cp/parser.c (cp_parser_init_declarator): Parse requires clauses for out-of-class member definitions. Be sure to restore current constraints before exiting the function. (cp_parser_late_parsing_for_member): Restore constraints after maybe_end_member_template_processing(). Andrew template-1.patch Description: Binary data
Re: [PATCH] S/390: Hardware transactional memory support
On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote: > Since libitm implements TX begins as function calls only call-saved registers > can be live across a > tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and > get restored when doing > the longjmp back into the user code. So this should be no problem. Except that the htm_begin() routines are declared static inline functions, so when they're inlined, you aren't protected by the semantics of a call anymore, are you? Peter
Re: [PATCH] S/390: Hardware transactional memory support
On 02/08/13 16:23, Peter Bergner wrote: > On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote: >> Since libitm implements TX begins as function calls only call-saved >> registers can be live across a >> tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and >> get restored when doing >> the longjmp back into the user code. So this should be no problem. > > Except that the htm_begin() routines are declared static inline functions, > so when they're inlined, you aren't protected by the semantics of a call > anymore, are you? They get inlined into libitm code but not into the user code. As I understand it in the user code will always be a call to _ITM_beginTransaction. -Andreas- > > Peter > > >
patch to fix PR57963 (s390)
The following patch fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57963 The page contains a good analysis of the PR from Andreas Krebbel. The patch was successfully bootstrapped and tested on x86/x86-64 and s390x. Committed as rev. 201438. 2013-08-02 Vladimir Makarov PR rtl-optimization/57963 * lra-constraints.c (reverse_equiv_p, contains_reloaded_insn_p): New. (lra_constraints): Use them. Index: lra-constraints.c === --- lra-constraints.c (revision 201435) +++ lra-constraints.c (working copy) @@ -3600,6 +3600,40 @@ init_insn_rhs_dead_pseudo_p (int regno) return false; } +/* Return TRUE if REGNO has a reverse equivalence. The equivalence is + reverse only if we have one init insn with given REGNO as a + source. */ +static bool +reverse_equiv_p (int regno) +{ + rtx insns, set; + + if ((insns = ira_reg_equiv[regno].init_insns) == NULL_RTX) +return false; + if (! INSN_P (XEXP (insns, 0)) + || XEXP (insns, 1) != NULL_RTX) +return false; + if ((set = single_set (XEXP (insns, 0))) == NULL_RTX) +return false; + return REG_P (SET_SRC (set)) && (int) REGNO (SET_SRC (set)) == regno; +} + +/* Return TRUE if REGNO was reloaded in an equivalence init insn. We + call this function only for non-reverse equivalence. */ +static bool +contains_reloaded_insn_p (int regno) +{ + rtx set; + rtx list = ira_reg_equiv[regno].init_insns; + + for (; list != NULL_RTX; list = XEXP (list, 1)) +if ((set = single_set (XEXP (list, 0))) == NULL_RTX + || ! REG_P (SET_DEST (set)) + || (int) REGNO (SET_DEST (set)) != regno) + return true; + return false; +} + /* Entry function of LRA constraint pass. Return true if the constraint pass did change the code. */ bool @@ -3643,7 +3677,6 @@ lra_constraints (bool first_p) else if ((x = get_equiv_substitution (reg)) != reg) { bool pseudo_p = contains_reg_p (x, false, false); - rtx set, insns; /* After RTL transformation, we can not guarantee that pseudo in the substitution was not reloaded which might @@ -3675,13 +3708,13 @@ lra_constraints (bool first_p) removed the insn. When the equiv can be a constant, the right hand side of the init insn can be a pseudo. */ - || (! ((insns = ira_reg_equiv[i].init_insns) != NULL_RTX - && INSN_P (XEXP (insns, 0)) - && XEXP (insns, 1) == NULL_RTX - && (set = single_set (XEXP (insns, 0))) != NULL_RTX - && REG_P (SET_SRC (set)) - && (int) REGNO (SET_SRC (set)) == i) - && init_insn_rhs_dead_pseudo_p (i)) + || (! reverse_equiv_p (i) + && (init_insn_rhs_dead_pseudo_p (i) + /* If we reloaded the pseudo in an equivalence + init insn, we can not remove the equiv init + insns and the init insns might write into + const memory in this case. */ + || contains_reloaded_insn_p (i))) /* Prevent access beyond equivalent memory for paradoxical subregs. */ || (MEM_P (x)
Introduce local aliases for call targets when possible
Hi, this patch (developed with Martin Liska) adds symbol table function symtab_nonoverwritable_alias that for a given definition returns symtab node that is not overwritable by linking and points to it. This patch makes simple use of it: when inline function is called and the symbol itself can be interposed by dynamic linking (and can not be interposed by static linking), we replace all direct calls to it by calls to the alias. This saves jumps to PLT entries. For Firefox this makes just few replacements, but I guess every relocation counts. Similar replacements are possibly in virtual tables that I am going to change next - more care is needed there since we do not yet have any infrastructure to redirect relocations. Bootstrapped/regtested ppc64-linux, will commit it shortly. PS: I just noticed symtab_nonoverwritable_alias lacks documentation, will add it momentarily. Honza * cgraph.c (cgraph_function_body_availability): Do not check cgrpah flags. * cgraph.h (symtab_for_node_and_aliases, symtab_nonoverwritable_alias, symtab_node_availability): Declare. * ipa.c (can_replace_by_local_alias): New. (function_and_variable_visibility): Use it. * symtab.c (symtab_for_node_and_aliases, symtab_nonoverwritable_alias_1, symtab_nonoverwritable_alias): New. Index: cgraph.c === --- cgraph.c(revision 201412) +++ cgraph.c(working copy) @@ -1697,7 +1697,6 @@ enum availability cgraph_function_body_availability (struct cgraph_node *node) { enum availability avail; - gcc_assert (cgraph_function_flags_ready); if (!node->symbol.analyzed) avail = AVAIL_NOT_AVAILABLE; else if (node->local.local) Index: cgraph.h === --- cgraph.h(revision 201413) +++ cgraph.h(working copy) @@ -597,6 +597,12 @@ symtab_node symtab_alias_ultimate_target enum availability *avail = NULL); bool symtab_resolve_alias (symtab_node node, symtab_node target); void fixup_same_cpp_alias_visibility (symtab_node node, symtab_node target); +bool symtab_for_node_and_aliases (symtab_node, + bool (*) (symtab_node, void *), + void *, + bool); +symtab_node symtab_nonoverwritable_alias (symtab_node); +enum availability symtab_node_availability (symtab_node); /* In cgraph.c */ void dump_cgraph (FILE *); Index: ipa.c === --- ipa.c (revision 201412) +++ ipa.c (working copy) @@ -750,6 +750,21 @@ varpool_externally_visible_p (struct var return false; } +/* Return true if reference to NODE can be replaced by a local alias. + Local aliases save dynamic linking overhead and enable more optimizations. + */ + +bool +can_replace_by_local_alias (symtab_node node) +{ + return (symtab_node_availability (node) > AVAIL_OVERWRITABLE + && !DECL_EXTERNAL (node->symbol.decl) + && (!DECL_ONE_ONLY (node->symbol.decl) + || node->symbol.resolution == LDPR_PREVAILING_DEF + || node->symbol.resolution == LDPR_PREVAILING_DEF_IRONLY + || node->symbol.resolution == LDPR_PREVAILING_DEF_IRONLY_EXP)); +} + /* Mark visibility of all functions. A local function is one whose calls can occur only in the current @@ -871,7 +886,36 @@ function_and_variable_visibility (bool w } } FOR_EACH_DEFINED_FUNCTION (node) -node->local.local = cgraph_local_node_p (node); +{ + node->local.local = cgraph_local_node_p (node); + + /* If we know that function can not be overwritten by a different semantics +and moreover its section can not be discarded, replace all direct calls +by calls to an nonoverwritable alias. This make dynamic linking +cheaper and enable more optimization. + +TODO: We can also update virtual tables. */ + if (node->callers && can_replace_by_local_alias ((symtab_node)node)) + { + struct cgraph_node *alias = cgraph (symtab_nonoverwritable_alias ((symtab_node) node)); + + if (alias != node) + { + while (node->callers) + { + struct cgraph_edge *e = node->callers; + + cgraph_redirect_edge_callee (e, alias); + if (!flag_wpa) + { + push_cfun (DECL_STRUCT_FUNCTION (e->caller->symbol.decl)); + cgraph_redirect_edge_call_stmt_to_callee (e); + pop_cfun (); + } + } + } + } +} FOR_EACH_VARIABLE (vnode) { /* weak flag makes no sense on local variables. */ Index: symtab.c === --- symtab.c(revision 201412) +++ symtab.c(wor
Re: [PATCH] S/390: Hardware transactional memory support
On Fri, 2013-08-02 at 16:26 +0200, Andreas Krebbel wrote: > On 02/08/13 16:23, Peter Bergner wrote: > > On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote: > >> Since libitm implements TX begins as function calls only call-saved > >> registers can be live across a > >> tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and > >> get restored when doing > >> the longjmp back into the user code. So this should be no problem. > > > > Except that the htm_begin() routines are declared static inline functions, > > so when they're inlined, you aren't protected by the semantics of a call > > anymore, are you? > > They get inlined into libitm code but not into the user code. As I understand > it in the user code > will always be a call to _ITM_beginTransaction. Sure, that protects the user code, but what about the libitm code? >From your previous comment: > As long as libitm does not use FPRs itself this should be safe without > having tbegin clobbering FPRs. Is it a given that s390 doesn't use FPRs without explicit use of floating point types? I ask, because on POWER, we can and do generate floating point code without explicit use of double, float, etc. Maybe s390 is safe in that respect. Peter
[RFC PATCH][ARM] Uncombine cbz/cbnz in if-conversion
Hi, On ARM/Thumb-2, if-conversion is sometimes unnecessarily constrained by combine pass (example below). In such cases, it would be profitable to teach if-conversion to apply a reverse transformation, but only if if-converting is possible. The attached patch implements that and improves performance on ARM, but in order to "uncombine" insns it references CC_REGNO directly and thus does not work on other architecture. Could anybody suggest a machine-independent approach? Consider: int main(int argc, char **argv) { int i, x = 42; for (i = 0; i < 1024; i++) { if (i % argc) x *= x; else x += 4; } printf("%d\n", x); } arm-unknown-linux-gnueabi-gcc -mthumb -mfloat-abi=softfp -mcpu=cortex-a8 -mtune=cortex-a8 -O2 thumb-vfp-cond.c -o thumb-vfp-cond.s -S Combine propagates compare into if_then_else pattern that matches with cbz/cbnz Thumb-2 patterns (there's no such problem in ARM mode), resulting in the following rtl: (if_then_else (eq (reg:SI 1 r1 [+4 ]) (const_int 0 [0])) (label_ref:SI 22) (pc)) Then if-conversion fails to match cond_exec instruction, because its predicate contains a comparison with regular reg instead of CC: (insn 19 18 22 4 (cond_exec (ne (reg:SI 1 r1 [+4 ]) (const_int 0 [0])) (set (reg/v:SI 5 r5 [orig:110 x ] [110]) (mult:SI (reg/v:SI 5 r5 [orig:110 x ] [110]) (reg/v:SI 5 r5 [orig:110 x ] [110] thumb-vfp-cond.c:8 -1 (nil)) The attached patch allows if-conversion to "uncombine" such if_then_else patterns into two insn: 1) separate comparison that sets CC, and 2) if_then_else that uses it, which can be converted to cond_exec. This transformation is only applied in case if-conversion was successful. Also, this fix works only for if_then_elses with both then and else blocks (converting only if-then and if-else blocks proved unprofitable). Spec2000 results (ref) testbasepeak % timetime gzip289.07 290.4 -0.46 vpr 255.75 257.18 -0.56 gcc 120.17 119.08 0.91 mcf 427.3 425.81 0.35 crafty 156.56 152.53 2.57 parser 373.05 373.44 -0.10 eon 142.07 142.54 -0.33 perlbmk 217.38 218.2 -0.38 gap 138.77 138.56 0.15 vortex 260.64 260.43 0.08 bzip2 255.3 256.01 -0.28 twolf 433.83 433.98 -0.03 Geomean 234.34 233.96 0.16 The results on train are a bit better than on ref, but in both cases crafty grows by 2-2.5%. It was in total 4221 conditional instructions in Spec2000 before the patch and 5772 after, so if-conversion succeeded about 30% more times. Regtested on ARM, mostly ok except for one test (gcc.target/arm/pr46631.c) that will be investigated. I can't reproduce that by hand. 2013-08-02 Dmitry Plotnikov * ifcvt.c (cond_exec_process_if_block): Fix test_expr for if-then-elses when it has comparison without CC reg. diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index f081ecd..d915170 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -441,6 +448,7 @@ cond_exec_process_if_block (ce_if_block_t * ce_info, rtx else_first_tail = NULL_RTX; /* First match at the tail of ELSE */ int then_n_insns, else_n_insns, n_insns; enum rtx_code false_code; + rtx uncombined_compare = NULL_RTX; /* If test is comprised of && or || elements, and we've failed at handling all of them together, just use the last test if it is the special case of @@ -462,6 +470,26 @@ cond_exec_process_if_block (ce_if_block_t * ce_info, if (! test_expr) return FALSE; + /* Extract predicate for compare from combined if-then-else. */ + + if (then_bb && else_bb + && REGNO (XEXP (test_expr,0)) != CC_REGNUM) + { + /* Test_expr with non CC reg, so we need to emit compare and make + new test_expr with CC reg. */ + rtx tmp_test_expr; + rtx tmp_cc_reg = gen_rtx_REG(CCmode, CC_REGNUM); + + tmp_test_expr = gen_rtx_fmt_ee (GET_CODE (test_expr), + GET_MODE (test_expr), + tmp_cc_reg, const0_rtx); + uncombined_compare = gen_rtx_SET (GET_MODE (test_expr), tmp_cc_reg, +gen_rtx_COMPARE (CCmode, + XEXP (test_expr,0), + XEXP (test_expr,1))); + test_expr = tmp_test_expr; + } + /* If the conditional jump is more than just a conditional jump, then we can not do conditional execution conversion on this block. */ if (! onlyjump_p (BB_END (test_bb))) @@ -683,6 +711,19 @@ cond_exec_process_if_block (ce_if_block_t * ce_info, IFCVT_MODIFY_FINAL (ce_info); #endif + /* Second part of compare extraction. We should emit compare insn here. */ + if (uncombined_compare != NULL_RTX) + { + rtx test_bb_
Re: [PATCH] S/390: Hardware transactional memory support
On 02/08/13 16:36, Peter Bergner wrote: > On Fri, 2013-08-02 at 16:26 +0200, Andreas Krebbel wrote: >> On 02/08/13 16:23, Peter Bergner wrote: >> As long as libitm does not use FPRs itself this should be safe without >> having tbegin clobbering FPRs. > > Is it a given that s390 doesn't use FPRs without explicit use of > floating point types? I ask, because on POWER, we can and do > generate floating point code without explicit use of double, > float, etc. Maybe s390 is safe in that respect. S/390 used to be on the safe side regarding this but with mainline we started using FPRs as GPR save slots. As Richard suggested we should build libitm with -msoft-float in order to prevent this within libitm. Unfortunately I forgot about this in my S/390 libitm patch. So I'll commit the following: Index: libitm/configure.tgt === *** libitm/configure.tgt.orig 2013-07-30 07:42:29.0 + --- libitm/configure.tgt2013-08-02 14:43:18.641206151 + *** case "${target_cpu}" in *** 109,115 ARCH=x86 ;; s390|s390x) ! XCFLAGS="${XCFLAGS} -mzarch -mhtm" ARCH=s390 ;; --- 109,115 ARCH=x86 ;; s390|s390x) ! XCFLAGS="${XCFLAGS} -mzarch -mhtm -msoft-float" ARCH=s390 ;;
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Fri, Aug 2, 2013 at 4:22 AM, Bernhard Reutner-Fischer wrote: > On 1 August 2013 18:32, Teresa Johnson wrote: >> Patch 3 of 3 split out from the patch I sent in May that fixes problems with >> -freorder-blocks-and-partition, with changes/fixes discussed in that thread. >> >> See http://gcc.gnu.org/ml/gcc-patches/2013-05/threads.html#00388 for context. >> >> This patch sanitizes the partitioning to address issues such as edge >> weight insanities that sometimes occur due to upstream optimizations, >> and ensures that hot blocks are not dominated by cold blocks. This >> needs to be resanitized after certain cfg optimizations that may >> cause hot blocks previously reached via both hot and cold paths to >> only be reached by cold paths. >> >> The verification code in sanitize_dominator_hotness was contributed by >> Steven Bosscher. >> >> Bootstrapped and tested on x86-64-unknown-linux-gnu. Also ensured that >> a profiledbootstrap passed with -freorder-blocks-and-partition enabled >> (and with the dwarf version changed to 2 to work around PR57451). >> >> Ok for trunk? >> >> (I also included the patch as an attachment since my mailer invariably >> messes up the formatting in the pasted version.) >> >> Thanks, >> Teresa >> >> 2013-08-01 Teresa Johnson >> Steven Bosscher >> >> * cfgrtl.c (fixup_bb_partition): New routine. >> (commit_edge_insertions): Invoke fixup_partitions. >> (find_partition_fixes): New routine. >> (fixup_partitions): Ditto. >> (verify_hot_cold_block_grouping): Update comments. >> (rtl_verify_edges): Invoke find_partition_fixes. >> (rtl_verify_bb_pointers): Update comments. >> (rtl_verify_bb_layout): Ditto. >> * basic-block.h (fixup_partitions): Declare. >> * cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions. >> * bb-reorder.c (sanitize_dominator_hotness): New function. >> (find_rarely_executed_basic_blocks_and_crossing_edges): Invoke >> sanitize_dominator_hotness. >> >> Index: cfgrtl.c >> === >> --- cfgrtl.c (revision 201281) >> +++ cfgrtl.c (working copy) >> @@ -1341,6 +1341,34 @@ fixup_partition_crossing (edge e) >> } >> } >> >> +/* Called when block BB has been reassigned to a different partition, >> + to ensure that the region crossing attributes are updated. */ >> + >> +static void >> +fixup_bb_partition (basic_block bb) >> +{ >> + edge e; >> + edge_iterator ei; >> + >> + /* Now need to make bb's pred edges non-region crossing. */ >> + FOR_EACH_EDGE (e, ei, bb->preds) >> +{ >> + fixup_partition_crossing (e); >> +} >> + >> + /* Possibly need to make bb's successor edges region crossing, >> + or remove stale region crossing. */ >> + FOR_EACH_EDGE (e, ei, bb->succs) >> +{ >> + if ((e->flags & EDGE_FALLTHRU) >> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >> + && e->dest != EXIT_BLOCK_PTR) >> +force_nonfallthru (e); >> + else >> +fixup_partition_crossing (e); >> +} >> +} >> + >> /* Attempt to change code to redirect edge E to TARGET. Don't do that on >> expense of adding new instructions or reordering basic blocks. >> >> @@ -1979,6 +2007,14 @@ commit_edge_insertions (void) >> { >>basic_block bb; >> >> + /* Optimization passes that invoke this routine can cause hot blocks >> + previously reached by both hot and cold blocks to become dominated only >> + by cold blocks. This will cause the verification below to fail, >> + and lead to now cold code in the hot section. In some cases this >> + may only be visible after newly unreachable blocks are deleted, >> + which will be done by fixup_partitions. */ >> + fixup_partitions (); >> + >> #ifdef ENABLE_CHECKING >>verify_flow_info (); >> #endif >> @@ -2173,6 +2209,101 @@ get_last_bb_insn (basic_block bb) >>return end; >> } >> >> +/* Sanity check partition hotness to ensure that basic blocks in >> + the cold partition don't dominate basic blocks in the hot partition. >> + If FLAG_ONLY is true, report violations as errors. Otherwise >> + re-mark the dominated blocks as cold, since this is run after >> + cfg optimizations that may make hot blocks previously reached >> + by both hot and cold blocks now only reachable along cold paths. */ >> + >> +vec >> +find_partition_fixes (bool flag_only) >> +{ >> + basic_block bb; >> + vec bbs_in_cold_partition = vNULL; >> + vec bbs_to_fix = vNULL; >> + >> + if (!crtl->has_bb_partition) >> +return vNULL; > > I'd push this early return into the callers instead, at most turn it into a > gcc_checking_assert to be safe. > > Both callers, fixup_partitions and rtl_verify_edges, look at > ctrl->has_bb_partition already before calling this, so the above should > be dead already. Right, I was being paranoid - changed to a gcc_checking_assert. > > Did my mailer somehow swallow the s
Re: [PATCH] S/390: Hardware transactional memory support
On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote: > On 02/08/13 13:31, Torvald Riegel wrote: > > On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote: > >> Index: libitm/config/s390/target.h > >> === > >> *** libitm/config/s390/target.h.orig > >> --- libitm/config/s390/target.h > >> *** > > > > [...] > > > >> + static inline uint32_t > >> + htm_begin () > >> + { > >> + return __builtin_tbegin_nofloat (NULL); > >> + } > > > > I'm not quite sure I understand why the nofloat variant is sufficient; > > is this because we don't use FPRs in the abort/restart code of libitm > > (and the FPRs are known to be clobbered after starting a txn)? > > Yes. To my understanding there are 2 potential issues when the FPRs are not > saved. > > 1. The abort code will read corrupted FPR content for FPRs: > - which are live across tbegin and > - are updated in the TX body and > - are read in the abort code > The abort code will see the updated value from the TX body. > > Since libitm implements TX begins as function calls only call-saved registers > can be live across a > tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and > get restored when doing > the longjmp back into the user code. So this should be no problem. > > 2. The other problem is that the ABI might get violated when aborting into a > callee which already > returned and there are FPRs which have been modified afterwards. In that case > the compiler would not > have generated FPR save/restore instructions in the function > prologue/epilogue of the callee. But in > fact there are modified FPRs when exiting the second time. > > But again this should be no problem thanks to the setjmp/longjmp semantics of > _ITM_beginTransaction > and GTM_longjmp. If the TX body modified a call-saved FPR it will get > restored on the abort path > back from libitm to the user code. > > > As long as libitm does not use FPRs itself this should be safe without having > tbegin clobbering FPRs. And I suppose that even if it would use FPRs, it shouldn't ever access memory locations touched in user code. Thanks for the explanation. Some of this was "obvious", sorry, but that's what happens when a mental context switch is incomplete :) Next time, I'd prefer a summary of such reasoning to be added as a comment in the source; it's just too easy to forget about all the subtleties... > > There is a bit in the properties passed to _ITM_beginTransaction which > > states whether the txn has floating point update code > > (pr_hasNoFloatUpdate). Would this be useful? > > This probably would be useful in the ITM_beginTransaction / GTM_longjmp > implementations which in > that case could try to avoid the FPR save/restores. But Richard mentioned > that these bits so far are > never set by GCC since it is lacking the required infos from the register > allocator. Another thing that comes to mind is that it might be useful to put parts of the HTM fast path into the asm pieces of _ITM_beginTransaction. That way, one can use exactly the amount of SW setjmp one needs given what's easy to save/restore for the TM; this might decrease the startup costs of a txn somewhat. Andi Kleen posted a patch for how to do this for RTM a while ago (but it had some issues): http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01189.html Perhaps something along these lines (while avoiding the issues :) ) would be useful for s/390 and Power too. Torvald
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
> > 2013-08-01 Teresa Johnson > Steven Bosscher > > * cfgrtl.c (fixup_bb_partition): New routine. > (commit_edge_insertions): Invoke fixup_partitions. > (find_partition_fixes): New routine. > (fixup_partitions): Ditto. > (verify_hot_cold_block_grouping): Update comments. > (rtl_verify_edges): Invoke find_partition_fixes. > (rtl_verify_bb_pointers): Update comments. > (rtl_verify_bb_layout): Ditto. > * basic-block.h (fixup_partitions): Declare. > * cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions. > * bb-reorder.c (sanitize_dominator_hotness): New function. > (find_rarely_executed_basic_blocks_and_crossing_edges): Invoke > sanitize_dominator_hotness. > > Index: cfgrtl.c > === > --- cfgrtl.c(revision 201281) > +++ cfgrtl.c(working copy) > @@ -1341,6 +1341,34 @@ fixup_partition_crossing (edge e) > } > } > > +/* Called when block BB has been reassigned to a different partition, > + to ensure that the region crossing attributes are updated. */ > + > +static void > +fixup_bb_partition (basic_block bb) > +{ > + edge e; > + edge_iterator ei; > + > + /* Now need to make bb's pred edges non-region crossing. */ > + FOR_EACH_EDGE (e, ei, bb->preds) > +{ > + fixup_partition_crossing (e); > +} > + > + /* Possibly need to make bb's successor edges region crossing, > + or remove stale region crossing. */ > + FOR_EACH_EDGE (e, ei, bb->succs) > +{ > + if ((e->flags & EDGE_FALLTHRU) > + && BB_PARTITION (bb) != BB_PARTITION (e->dest) > + && e->dest != EXIT_BLOCK_PTR) > +force_nonfallthru (e); > + else > +fixup_partition_crossing (e); > +} > +} Is there particular reason why preds can not be fallhtrus and why force_nonfallthru edge does not need partition crossing fixup? (if so, perhpas it could be mentioned in the description, if not, I think force_nonfallthru path has to check if new BB was introduced and do the right thing on the edge. > +/* Sanity check partition hotness to ensure that basic blocks in > + the cold partition don't dominate basic blocks in the hot partition. > + If FLAG_ONLY is true, report violations as errors. Otherwise > + re-mark the dominated blocks as cold, since this is run after > + cfg optimizations that may make hot blocks previously reached > + by both hot and cold blocks now only reachable along cold paths. */ With profile, I suppose we can have cold blocks dominating hot blocks when the hot blocks is in loop whose trip count is high enough. Indeed for partitioning reasons it does not make sense to push those into different section. I also wonder, if we finally get the pass stable, can we enable it by default and offline probably cold blocks w/o profile? Primarily blocks reachable only by EH + blocks leading to a crash or throw(). For C++ those should be common enough to make a difference... Honza
Go patch committed: Always put descriptor wrappers in their own section
This patch to the Go frontend always puts descriptor wrapper functions in their own section. This lets the linker GC them. That is convenient since we must generate a wrapper for every top-level exported function, although most of them will never be used. We can only know whether they are used or not at link time. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch. Ian diff -r bc1ad842d946 go/gogo.cc --- a/go/gogo.cc Wed Jul 24 15:24:12 2013 -0700 +++ b/go/gogo.cc Fri Aug 02 09:58:44 2013 -0700 @@ -3582,6 +3582,12 @@ Named_object* dno = gogo->start_function(name, new_fntype, false, loc); dno->func_value()->is_descriptor_wrapper_ = true; + // Put the wrapper in a unique section so that it can be discarded + // by the linker if it is not needed. Every top-level function will + // get a wrapper, in case there is a reference other than a call + // from some other package, but most will not need one. + dno->func_value()->set_in_unique_section(); + gogo->start_block(loc); Expression* fn = Expression::make_func_reference(no, NULL, loc);
Merge from 4.8 branch to gccgo branch
I've merged revision 201443 from the GCC 4.8 branch to the gccgo branch. Ian
Re: [PATCH/Merge Request] Vtable Verification feature.
On Thu, Aug 1, 2013 at 1:19 PM, Benjamin De Kosnik wrote: > You'll need a GWP to do the merge (Maybe Diego, Jason, Richard > Henderson?) and then add yourself to MAINTAINERS for libvtv. Eh, no. We need a GWP to approve the final patch, but Caroline can and should do the merge herself. The easiest approach is to generate a patch on an svn checkout of trunk. Caroline and I discussed this earlier today. It goes something like this (on a git client): $ git checkout -b vtv vtv $ git merge origin/trunk ... fix merge conflicts ... $ git commit $ git diff trunk >patch ... Get rid of all the ChangeLog.vtv and configure diffs ... ... Use git log to find the svn revision REV for the local trunk git tree ... $ git co svn://gcc.gnu.org/gcc/trunk@REV $ patch -p1
Re: [GOOGLE] Refactor AutoFDO
More to follow. David >>static void >> read_profile (void) >> { >> if (gcov_open (auto_profile_file, 1) == 0) >> { >> inform (0, "Cannot open profile file %s.", auto_profile_file); Should be at least warning instead -- I think error is probably more appropriate -- this is different from regular FDO, where one missing gcda might be ok. >> flag_auto_profile = 0; >> return; >> } >> >> if (gcov_read_unsigned () != GCOV_DATA_MAGIC) >> { >> inform (0, "Magic number does not mathch."); Should be an error. >> flag_auto_profile = 0; >> return; >> } >> >> >> /* function_name_map. */ >> afdo_function_name_map = function_name_map::create (); >> if (afdo_function_name_map == NULL) >> { >> inform (0, "Cannot read string table."); Should be an error unless there is a way to recover. Falling back to non-fdo is not the solution. >> flag_auto_profile = 0; >> return; >> } >> >> /* autofdo_source_profile. */ >> afdo_source_profile = autofdo_source_profile::create (); >> if (afdo_source_profile == NULL) >> { >> inform (0, "Cannot read function profile."); An error. >> flag_auto_profile = 0; >> return; >> } >> >> /* autofdo_module_profile. */ >> afdo_module_profile = autofdo_module_profile::create (); >> if (afdo_module_profile == NULL) >> { >> inform (0, "Cannot read module profile."); Should be an error. >> flag_auto_profile = 0; >> return; >> } >> >> /* Read in the working set. */ >> if (gcov_read_unsigned () != GCOV_TAG_AFDO_WORKING_SET) >> { >> inform (0, "Not expected TAG."); Error. >> unsigned curr_module = 1, max_group = PARAM_VALUE (PARAM_MAX_LIPO_GROUP); >> for (string_vector::const_iterator iter = aux_modules->begin(); >>iter != aux_modules->end(); ++iter) >> { >> gcov_module_info *aux_module = afdo_module_profile->get_module (*iter); >> if (aux_module == module) >> continue; >> if (aux_module == NULL) >> { >> inform (0, "aux module %s cannot be found.", *iter); >> continue; >> } >> if ((aux_module->lang & GCOV_MODULE_LANG_MASK) != >> (module->lang & GCOV_MODULE_LANG_MASK)) >> { >> inform (0, "Not importing %s: source language" >> " different from primary module's source language", *iter); Should be guarded with -fopt-info -- see similar code in coverage.c. >> continue; >> } >> if ((aux_module->lang & GCOV_MODULE_ASM_STMTS) >> && flag_ripa_disallow_asm_modules) >> { >> if (flag_opt_info) >>inform (0, "Not importing %s: contains " >>"assembler statements", *iter); Use -fopt-info >> continue; >> } >> if (max_group != 0 && curr_module == max_group) >> { >> if (flag_opt_info) >>inform (0, "Not importing %s: maximum group size reached", *iter); >> } >> if (incompatible_cl_args (module, aux_module)) >> { >> if (flag_opt_info) >>inform (0, "Not importing %s: command-line" >>" arguments not compatible with primary module", *iter); >> continue; Use -fopt-info. >> } >> module_infos[curr_module++] = aux_module; >> add_input_filename (*iter); >> } >> } >> >> /* From AutoFDO profiles, find values inside STMT for that we want to measure >>histograms for indirect-call optimization. */ >> >> { >> hist->hvalue.counters[3] = 0; >> hist->hvalue.counters[4] = 0; >> } It might be better to create a commmon API to create/update histogram entry instead of peeking into the details of the data structure -- to avoid future breaks. Such a change can be done as a follow up and needs to be in trunk. >> } >> >> /* From AutoFDO profiles, find values inside STMT for that we want to measure >>histograms and adds them to list VALUES. */ >> >> static void >> afdo_vpt (gimple stmt, const icall_target_map &map) >> { >> afdo_indirect_call (stmt, map); >> } >> >> /* For a given BB, return its execution count, and annotate value profile >>on statements. */ >> >> static gcov_type >> afdo_get_bb_count (basic_block bb) >> { >> gimple_stmt_iterator gsi; >> gcov_type max_count = 0; >> bool has_annotated = false; >> >> for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >> { >> count_info info; >> gimple stmt = gsi_stmt (gsi); >> if (afdo_source_profile->get_count_info (stmt, &info)) >> { indentation. >> if (info.first > max_count) >>max_count = info.first; >> has_annotated = true; >> if (info.second.size() > 0) >>afdo_vpt (stmt, info.second); >> } >> } >> if (has_annotated) >> { >> bb->flags |= BB_ANNOTATED; >> return max_count; >> } >> else >> return 0; >> } >> >> >> static void >> afdo_annotate_cfg (void) >> { >> basic_block bb; >> const function_instance *s = >> afdo_source_profile->get_function_instance_by_decl ( >> current_function_decl); need a new line after decls. On Thu, Aug 1, 2013 at 2:49 PM, Dehao Chen wrote: >
Go patch committed: Always put immutable structs in unique section
The Go backend interfaces uses immutable structs for things like type descriptors, map descriptors, and function descriptors. These objects must always be created for exported names in case they are referenced by a different package, but this often does not happen. This patch moves them to unique sections so that linker GC can discard them a link time. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch. Ian 2013-08-02 Ian Lance Taylor * go-gcc.cc (immutable_struct_set_init): Always call resolve_unique_section. Index: go-gcc.cc === --- go-gcc.cc (revision 201222) +++ go-gcc.cc (working copy) @@ -1521,10 +1521,11 @@ Gcc_backend::immutable_struct_set_init(B TREE_PUBLIC(decl) = 1; } else -{ - make_decl_one_only(decl, DECL_ASSEMBLER_NAME(decl)); - resolve_unique_section(decl, 1, 0); -} +make_decl_one_only(decl, DECL_ASSEMBLER_NAME(decl)); + + // These variables are often unneeded in the final program, so put + // them in their own section so that linker GC can discard them. + resolve_unique_section(decl, 1, 1); rest_of_decl_compilation(decl, 1, 0); }
Re: [PATCH 10/11] Make gcc::context be GC-managed
On Thu, 2013-08-01 at 11:28 -1000, Richard Henderson wrote: > On 07/26/2013 05:04 AM, David Malcolm wrote: > > +/* Functions relating to the garbage collector. */ > > +void > > +gcc::context::gt_ggc_mx () > > +{ > > + /* Currently a no-op. */ > > +} > > + > > +void > > +gcc::context::gt_pch_nx () > > +{ > > + /* Currently a no-op. */ > > +} > > + > > +void > > +gcc::context::gt_pch_nx (gt_pointer_operator op ATTRIBUTE_UNUSED, > > +void *cookie ATTRIBUTE_UNUSED) > > +{ > > + /* Currently a no-op. */ > > +} > > I suppose these are members because you'll want to access private members > later, and that's easier than playing with "friend"? Exactly. The patch adds a comment about this to gcc/context.h. > > +void gt_ggc_mx (gcc::context *ctxt) > > +{ > > + ctxt->gt_ggc_mx (); > > +} > > + > > +void gt_pch_nx (gcc::context *ctxt) > > +{ > > + ctxt->gt_pch_nx (); > > +} > > + > > +void gt_pch_nx (gcc::context *ctxt, gt_pointer_operator op, void *cookie) > > +{ > > + ctxt->gt_pch_nx (op, cookie); > > +} > > Should these be inline functions in context.h, so that the call does direct? > Or do we take their address, making that sorta pointless? > > It seems like there's one level of indirection here that is avoidable... It looks like they can be inlined - their addresses are not taken, and they're only used by gtype-desc.c, inside the autogenerated functions gt_ggc_mx_context, gt_pch_nx_context, and gt_pch_p_7context. I'll make them inline in context.h
Merge from 4.8 branch to gccgo branch
I merged 4.8 branch revision 201447 to the gccgo branch. Ian
Re: [PATCH 11/11] Make opt_pass and gcc::pipeline be GC-managed
On Thu, 2013-08-01 at 11:45 -1000, Richard Henderson wrote: > On 07/26/2013 05:04 AM, David Malcolm wrote: > > (opt_pass::gt_ggc_mx): New. > > (opt_pass::gt_pch_nx): New. > > (opt_pass::gt_pch_nx_with_op): New. > > (gt_ggc_mx (opt_pass *)): New. > > (gt_pch_nx (opt_pass *)): New. > > (gt_pch_nx_opt_pass): New. > > (pipeline::operator new): New. > > (pipeline::gt_ggc_mx): New. > > (pipeline::gt_pch_nx): New. > > (pipeline::gt_pch_nx_with_op): New. > > (gt_ggc_mx (pipeline *)): New. > > (gt_pch_nx (pipeline *)): New. > > (gt_pch_nx_pipeline): New. > > I guess my previous comments about ::gt_ggc_mx vs class::gt_ggc_mx wrt > the context structure apply as well to this patch. For context.h, the global functions are used by autogenerated code in gtype-desc.c, which calls them the first time the context is visited in a gc or pch traversal; they merely call into the class, to avoid needing friend decls. For opt_pass and pass_manager, something different is going on. For some reason gengtype doesn't generate the triad of gt_ggc_mx_FOO, gt_pch_nx_FOO, gt_pch_p_NFOO functions in gtype-desc.c, for types FOO=opt_pass and pass_manager. Presumably this is because the types are only visited by code in context.c So the global functions for opt_pass and pass_manager are a hand-written implementation of what gengtype would write; they are called *each time* the entity is reached during a traversal. The member functions are called only the *first time* the entity is visited. Does this need a descriptive comment in the source code? Thanks Dave
[PATCH] - C11 expressions and stdatomic.h - Just for current state
On 07/30/2013 12:49 PM, Andrew MacLeod wrote: I split the original patch into some smaller hunks, and cleaned up a few bit and pieces here and there... following: Not looking for a review, but posting progress to this point since I'm off for 2 weeks, and someone might want to play with this. I re-examined implementing the atomic type as its own canonical type instead of a qualified variant, as well as some variations... and ultimately, I think the current qualified implementation I posted is probably the best one to work with. Its very convenient to be able to get to the non-atomic type by using TYPE_MAIN_VARIANT... Anyway, so the original 5 patches stand, but will require closer auditing . (starting at http://gcc.gnu.org/ml/gcc-patches/2013-07/msg01478.html) I've attached 2 patches anda header file here. One is the changes required for handling the C11 atomic expressions... ie x += 4.2, and the other is a first cut at the content for the stdatomic.h include file, adna required change for it. The atomic expression patch works on many cases, but falls short with any kind of complexity on the RHS of an expression which needs to be mapped to an atomic_load. Whilst parsing, we cant tell, and the rhs tends to become an expression before we really know. so In the comments I have my proposed solution which I will look at when I return in 2 weeks. Basically, rather than have the ATOMIC_TYPE attribute ripple all the way through an expression, as soon as we see an ATOMIC_TYPE, we wrap it in a new node ATOMIC_EXPR, which holds the atomic expression, and sets it own expression type to the non-atomic variant. This means the only expressions which will have the TYPE_ATOMIC flag set will be ones which are wrapped in ATOMIC_EXPR, and those are the ones where we need to go and replace with an __atomic_load or whatever. I have also punted on the floating point exception stuff which footnote 113 requires.. we'll worry about that later. Nevertheless, the patch that is there can do some interest8ing things... it can compile and execute this test file successfully as proof of concept... with all the right conmversions and use of lockfree routines when available. (use -mcx16 on an x86_64 box, btw, or the __atomic_*_16 routines will be unresolved ) complex types also work. long double complex is 32 bytes in size, and will make calls into libatomic to resolve those operations. double __attribute__((atomic)) a; float __attribute__((atomic)) t; long double __attribute__((atomic)) ld; char __attribute__((atomic)) c; int __attribute__((atomic)) i; long long __attribute__((atomic)) l; int g; main () { g = 40; ld = t = a = 4.567; c = l = i = g + 2; printf ("%f == %f == %Lf == 4.567\n", t, a, ld); printf ("%d == %d == %ld == 42\n", c , i, l); } I have also attach a mockup of stdatomic.h... I haven't even tried compiling or testing it yet, so it may have a syntax error or something, but I wanted to see if the functionality was possible. THe only thing missing is the ability to find the non-atomic type of an atomic type. Ive included an additional 2 line patch which should change the meaning of __typeof__ (again untested, the joys of imminently leaving for 2 weeks :-). Im not sure the normal practical uses of __typeof__ have much meaning for an atomic type, it seems far more useful to have __typeof__ for an atomic qualified type to return the non-atomic variant. If there is resistance to that definition, then I'll need to create a __nonatomic_typeof__ variant... thats just a bit more work. In any case, that patch is required for stdatomic.h to be implemented. You can see where I used it to call the generic atomic operations with all the proper types for the non-atomic fields which need a temp vazr. The header file is actually quite short. I don't know how to implement it other than with macro wrappers around the builtins. And thats it. I'll be back in 2 weeks to get back at wrapping this, and adding some testcases and error reporting. unless *YOU* beat me to it :-) Andrew * c/c-parser.c (c_parser_expr_no_commas): Call build_atomic_load on RHS. * c/c-typeck.c (build_modify_expr): Call build_atomic_assign for atomic expressions. (build_atomic_assign): New. Construct appropriate sequences for atomic assignemnt operations. (build_atomic_load): New. Construct an atomic load. * c-family/c-common.h (build_atomic_load): Prototype. Index: gcc/c/c-parser.c === *** gcc/c/c-parser.c (revision 201248) --- gcc/c/c-parser.c (working copy) *** c_parser_expr_no_commas (c_parser *parse *** 5440,5445 --- 5440,5472 code = BIT_IOR_EXPR; break; default: + /* TODO + This doesnt work. Expressions like ~a where a is atomic do not function + properly. since the rhs is parsed/consumed as an entire expression. + and at the t
Re: [PATCH 11/11] Make opt_pass and gcc::pipeline be GC-managed
On 08/02/2013 09:08 AM, David Malcolm wrote: > For opt_pass and pass_manager, something different is going on. I'd wondered about that. > For some reason gengtype doesn't generate the triad of gt_ggc_mx_FOO, > gt_pch_nx_FOO, gt_pch_p_NFOO functions in gtype-desc.c, for types > FOO=opt_pass and pass_manager. Presumably this is because the types are > only visited by code in context.c > > So the global functions for opt_pass and pass_manager are a hand-written > implementation of what gengtype would write; they are called *each time* > the entity is reached during a traversal. The member functions are > called only the *first time* the entity is visited. I wonder if we can reduce the amount of boiler-plate for this. Perhaps, --- // // These templates assume the presence of several member functions. // template inline void gt_ggc_mx (T *p) { if (ggc_test_and_set_mark (p)) p->gt_ggc_mx (); } template void gt_pch_nx_with_obj(void *this_obj, void *p, gt_pointer_operator op, void *cookie) { if (p == this_obj) { T *t = static_cast(p); t->gt_pch_nx_with_obj (op, cookie); } } template inline void gt_pch_nx (T *p) { if (gt_pch_note_object (p, p, gt_pch_nx_with_obj)) p->gt_pch_nx (); } - I had thought about an abstract base class instead of the templates, but that would unnecessarily force the use of vtables in places that don't need them. In some cases this would be relatively harmless (e.g. the very few pass_manager objects), but in others it might be a no-go. The use of the template obviates that. It ought only be instantiated when necessary for gty user objects that don't have their own specialization. Thoughts? r~
Re: [PATCH] S/390: Hardware transactional memory support
On 08/02/2013 04:45 AM, Andreas Krebbel wrote: > ! XCFLAGS="${XCFLAGS} -mzarch -mhtm -msoft-float" Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in floating point registers; similarly for the write accessors. r~
Re: [PATCH, rs6000] Add builtin support for power8 32-bit Altivec multiply insns
On Thu, Aug 1, 2013 at 2:21 PM, Peter Bergner wrote: > This patch adds builtin support for the new 32-bit Altivec multiply > instructions that were added in ISA 2.07 (ie, POWER8). > > This passed bootstrap and regtesting with no errors. Ok for mainline? Peter, The builtins and patterns seem to be in a random order. Sometimes signed is first and sometimes unsigned is first. Please make the ordering of the names consistent. If these operations are multiplies, why are they described as UNSPEC instead of MULT with SELECT? Thanks, David
Re: [PATCH, rs6000] Add builtin support for power8 32-bit Altivec multiply insns
On Fri, 2013-08-02 at 17:13 -0400, David Edelsohn wrote: > The builtins and patterns seem to be in a random order. Sometimes > signed is first and sometimes unsigned is first. Please make the > ordering of the names consistent. Ok, I can clean that up. > If these operations are multiplies, why are they described as UNSPEC > instead of MULT with SELECT? That's a good question. I was just following how the vmul[eo][su][bh] define_insns are defined and assumed they were that way for a reason. Looking at the history of altivec.md, they seem to have been that way since day 1 when Aldy created the file. Peter
Re: [PATCH 11/11] Make opt_pass and gcc::pipeline be GC-managed
On Fri, 2013-08-02 at 10:01 -1000, Richard Henderson wrote: > On 08/02/2013 09:08 AM, David Malcolm wrote: > > For opt_pass and pass_manager, something different is going on. > > I'd wondered about that. > > > For some reason gengtype doesn't generate the triad of gt_ggc_mx_FOO, > > gt_pch_nx_FOO, gt_pch_p_NFOO functions in gtype-desc.c, for types > > FOO=opt_pass and pass_manager. Presumably this is because the types are > > only visited by code in context.c > > > > So the global functions for opt_pass and pass_manager are a hand-written > > implementation of what gengtype would write; they are called *each time* > > the entity is reached during a traversal. The member functions are > > called only the *first time* the entity is visited. > > I wonder if we can reduce the amount of boiler-plate for this. Perhaps, > > --- > // > // These templates assume the presence of several member functions. > // > > template > inline void gt_ggc_mx (T *p) > { > if (ggc_test_and_set_mark (p)) > p->gt_ggc_mx (); > } > > template > void gt_pch_nx_with_obj(void *this_obj, void *p, > gt_pointer_operator op, void *cookie) > { > if (p == this_obj) > { > T *t = static_cast(p); > t->gt_pch_nx_with_obj (op, cookie); > } > } > > template > inline void gt_pch_nx (T *p) > { > if (gt_pch_note_object (p, p, gt_pch_nx_with_obj)) > p->gt_pch_nx (); > } > > - > > I had thought about an abstract base class instead of the templates, but > that would unnecessarily force the use of vtables in places that don't > need them. In some cases this would be relatively harmless (e.g. the > very few pass_manager objects), but in others it might be a no-go. > > The use of the template obviates that. It ought only be instantiated when > necessary for gty user objects that don't have their own specialization. I like this approach, and I'm trying a bootstrap now with the templates. I changed the "with_obj" to "with_op", on the grounds that you always have an obj, but you don't always have an op. FWIW I had a go at avoiding templates by attempting to tell gengtype to write out functions for all GTY((user)) types, regardless of whether it thinks they're referenced, with this: @@ -3697,7 +3697,8 @@ write_types (outf_p output_header, type_p structures, type_p param_structs, /* At last we emit the functions code. */ oprintf (output_header, "\n/* functions code */\n"); for (s = structures; s; s = s->next) -if (s->gc_used == GC_POINTED_TO || s->gc_used == GC_MAYBE_POINTED_TO) +if (s->gc_used == GC_POINTED_TO || s->gc_used == GC_MAYBE_POINTED_TO +|| s->kind == TYPE_USER_STRUCT) { options_p opt; but I ran into various "incomplete structure" errors due to gengtype's lack of a full C/C++ parser. Hence templates seem the sanest option.
Re: [PATCH, rs6000] Add builtin support for power8 32-bit Altivec multiply insns
On Fri, Aug 02, 2013 at 04:31:37PM -0500, Peter Bergner wrote: > On Fri, 2013-08-02 at 17:13 -0400, David Edelsohn wrote: > > The builtins and patterns seem to be in a random order. Sometimes > > signed is first and sometimes unsigned is first. Please make the > > ordering of the names consistent. > > Ok, I can clean that up. > > > > If these operations are multiplies, why are they described as UNSPEC > > instead of MULT with SELECT? > > That's a good question. I was just following how the vmul[eo][su][bh] > define_insns are defined and assumed they were that way for a reason. > Looking at the history of altivec.md, they seem to have been that > way since day 1 when Aldy created the file. Given the age of the original mods, either vec_select didn't exist back then, or Geoff just liked unspec. I've cleaned up a few in altivec.md over the years. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
patch to fix PR58048
The following patch fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58048 LRA has a correct behaviour (as reload) on the test with -O0. But with -O2 a constant propagation into asm operands resulted in LRA cycling instead of error reporting. The patch was successfully bootstrapped and tested on x86/x86-64. Committed as rev. 201454. 2013-08-02 Vladimir Makarov PR rtl-optimization/58048 * lra-constraints.c (process_alt_operands): Don't check asm operand on register. 2013-08-02 Vladimir Makarov PR rtl-optimization/58048 * gcc.target/i386/pr58048.c: New. Index: lra-constraints.c === --- lra-constraints.c (revision 201438) +++ lra-constraints.c (working copy) @@ -1892,7 +1892,7 @@ process_alt_operands (int only_alternati /* For asms, verify that the class for this alternative is possible for the mode that is specified. */ - if (!no_regs_p && REG_P (op) && INSN_CODE (curr_insn) < 0) + if (!no_regs_p && INSN_CODE (curr_insn) < 0) { int i; for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) Index: testsuite/gcc.target/i386/pr58048.c === --- testsuite/gcc.target/i386/pr58048.c (revision 0) +++ testsuite/gcc.target/i386/pr58048.c (working copy) @@ -0,0 +1,11 @@ +/* PR target/58048 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +void +div3 (void) +{ +double tmp1; + +asm volatile ("fscale":"=t" (tmp1):"0" (0), "u" (0)); /* { dg-error "inconsistent operand constraints in an 'asm'" } */ +}
Re: [libgo PATCH 0/1] mksyscalls.awk: make split ERE more portable
On Thu, Aug 1, 2013 at 5:36 AM, Bernhard Reutner-Fischer wrote: > > When using busybox' awk to bootstrap, libgo's syscalls are generated > incorrectly. > > I'm attaching the split() used by busybox' awk including output before > and after the patch for reference. > > Please install / ok to install? I installed a slightly different version, that retains the desired semantics (skip arbitrary amounts of whitespace). This should avoid confusing the busybox awk by matching the null string. Committed to mainline and 4.8 branch. Thanks for reporting the issue. Ian foo.patch Description: Binary data
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Fri, Aug 2, 2013 at 5:05 PM, Jan Hubicka wrote: >> +/* Called when block BB has been reassigned to a different partition, >> + to ensure that the region crossing attributes are updated. */ >> + >> +static void >> +fixup_bb_partition (basic_block bb) >> +{ >> + edge e; >> + edge_iterator ei; >> + >> + /* Now need to make bb's pred edges non-region crossing. */ >> + FOR_EACH_EDGE (e, ei, bb->preds) >> +{ >> + fixup_partition_crossing (e); >> +} >> + >> + /* Possibly need to make bb's successor edges region crossing, >> + or remove stale region crossing. */ >> + FOR_EACH_EDGE (e, ei, bb->succs) >> +{ >> + if ((e->flags & EDGE_FALLTHRU) >> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >> + && e->dest != EXIT_BLOCK_PTR) >> +force_nonfallthru (e); >> + else >> +fixup_partition_crossing (e); >> +} >> +} > > Is there particular reason why preds can not be fallhtrus Yes, by definition a crossing edge cannot fall through. There is always a control transfer from one section to another. >> +/* Sanity check partition hotness to ensure that basic blocks in >> + the cold partition don't dominate basic blocks in the hot partition. >> + If FLAG_ONLY is true, report violations as errors. Otherwise >> + re-mark the dominated blocks as cold, since this is run after >> + cfg optimizations that may make hot blocks previously reached >> + by both hot and cold blocks now only reachable along cold paths. */ > > With profile, I suppose we can have cold blocks dominating hot blocks when the > hot blocks is in loop whose trip count is high enough. That is the common case, actually. > Indeed for partitioning > reasons it does not make sense to push those into different section. The partitioning algrorithm makes sure this doesn't happen. The hottest path from the entry block to a hot basic block is always part of the hot partition. > I also wonder, if we finally get the pass stable, can we enable it by default > and offline probably cold blocks w/o profile? That is the general idea behind all this work, obviously ;-) > Primarily blocks reachable only > by EH + blocks leading to a crash or throw(). For C++ those should be common > enough to make a difference... Yup, and IIRC Theresa posted some numbers that showed this. Ciao! Steven
Re: [PATCH 11/11] Make opt_pass and gcc::pipeline be GC-managed
On 08/02/2013 11:53 AM, David Malcolm wrote: > FWIW I had a go at avoiding templates by attempting to tell gengtype to > write out functions for all GTY((user)) types, regardless of whether it > thinks they're referenced, with this: > @@ -3697,7 +3697,8 @@ write_types (outf_p output_header, type_p structures, > type_p param_structs, >/* At last we emit the functions code. */ >oprintf (output_header, "\n/* functions code */\n"); >for (s = structures; s; s = s->next) > -if (s->gc_used == GC_POINTED_TO || s->gc_used == GC_MAYBE_POINTED_TO) > +if (s->gc_used == GC_POINTED_TO || s->gc_used == GC_MAYBE_POINTED_TO > +|| s->kind == TYPE_USER_STRUCT) >{ > options_p opt; > > but I ran into various "incomplete structure" errors due to gengtype's > lack of a full C/C++ parser. > > Hence templates seem the sanest option. I was actually contemplating going the other direction -- have gengtype write out _less_, relying on the templates more. I hadn't gone very far down that road yet... r~
Re: [PATCH 3.1/11] Explicitly initialize the macro-generated pass fields (was Re: [PATCH 03/11] Handwritten part of conversion of passes to C++ classes)
On Thu, 2013-08-01 at 13:13 -0400, David Malcolm wrote: > On Mon, 2013-07-29 at 15:41 -0600, Jeff Law wrote: > > On 07/26/2013 09:04 AM, David Malcolm wrote: > > > This patch is the hand-written part of the conversion of passes from > > > C structs to C++ classes. It does not work without the subsequent > > > autogenerated part, which is huge. > > [ ... ] > > With the changes from pipeline -> pass_manager this is fine. As is the > > follow-up autogenerated patch. > > Thanks. > > The current status is that Jeff has approved patches 1-5 for the trunk, > and patches 6-11 haven't yet been reviewed by a global reviewer. I have > committed patches 1 and 2, having bootstrapped+tested each in turn, but > I can't commit any more of these without further review - details > follow. > > I had only bootstrapped and tested the combination of all 11 patches > together, so I've been attempting to test the approved patches. For > reference: > * Patch 3 is the handwritten part of the conversion of passes to C++ > classes > * Patch 4 is the autogenerated part > * Patch 5 adds -fno-rtti to the testsuite when building plugins > * Patch 6 is the not-yet-approved patch currently named "Rewrite how > instances of passes are cloned" (but that's not all it does, see below). > > I had thought that the combination of patch 3 + 4 + 5 would work as a > unit, and hoped to commit each of these after testing the combination of > (3, 4, 5), but upon attempting a bootstrap I found two bugs: > > (a) patch 3 adds an gcc_assert to the pass_manager constructor to > ensure that pass instance fields are NULL when created, to ensure that > the macro-driven logic is correct. However, the instance fields > haven't been explicitly initialized at that point, leading to sporadic > assertion failures. This wasn't an issue for the full patch series > since patch 11 adds an operator new for pass_manager, allocating it from > the GC heap, using the *cleared* allocator: >void* >pass_manager::operator new (size_t sz) >{ > return ggc_internal_cleared_alloc_stat (sz MEM_STAT_INFO); >} > hence ensuring that the fields are zeroed. So patch 3 works with patch > 11, but not without. In the meantime, I'm attaching a new patch "3.1", > which explicitly zeroes these fields early on in the pass_manager ctor. > > (b) with just these patches, although static_pass_number for every > pass instance is correct, the dumpfile *switch* numbering is wrong, > leading to numerous testsuite failures (e.g. "unrecognized command line > option '-fdump-tree-dce2'") . Patch 6 is needed: it does the necessary > fixups at the right time to ensure that the per-pass-instance dump files > get the correct switch names (I'll add some more notes on this to that > email). > > With the attached patch, the combination of patches (03, 03.1, 04, 05 > *and* 06) successfully bootstraps on x86_64-unknown-linux-gnu, and all > testcases show the same results as an unpatched build (relative to > r201397). > > So I'm looking for review of the attached patch, and of at least patch 6 > before I can proceed with this. AIUI, Jeff is away at the moment, so > another global reviewer would need to do it. Here's a better version of this patch: instead of explicitly initializing the fields (which would become redundant when patch 11 goes in), instead this patch adds an operator new, using xcalloc. This operator new gets reimplemented in a later version of patch 11 to use ggc_internal_cleared_alloc_stat when the class becomes GC-managed, hence at each stage of committing, the initialization is sane. I've successfully bootstrapped the *end result* of the revised patch series 3-11 on x86_64-unknown-linux-gnu: all testcases show the same results as an unpatched build (relative to r201397). OK for trunk? (as part of patches 3-6) >From 3198835b85b9a35906bf93ba8f510762e8e017b9 Mon Sep 17 00:00:00 2001 From: David Malcolm Date: Wed, 31 Jul 2013 14:20:07 -0400 Subject: [PATCH 04/15] Zero-initialize pass_manager Ensure that pass_manager instances are fully zero-initialized, by providing an operator new, implemented using xcalloc. gcc/ * passes.c (pass_manager::operator new): New. --- gcc/pass_manager.h | 2 ++ gcc/passes.c | 7 +++ 2 files changed, 9 insertions(+) diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h index ea078a5..00f0b1c 100644 --- a/gcc/pass_manager.h +++ b/gcc/pass_manager.h @@ -47,6 +47,8 @@ class context; class pass_manager { public: + void *operator new (size_t sz); + pass_manager(context *ctxt); void register_pass (struct register_pass_info *pass_info); diff --git a/gcc/passes.c b/gcc/passes.c index fcbd630..8efce30 100644 --- a/gcc/passes.c +++ b/gcc/passes.c @@ -1339,6 +1339,13 @@ pass_manager::register_pass (struct register_pass_info *pass_info) -> all_passes */ +void * +pass_manager::operator new (size_t sz) +{ + /* Ensure that all fields of the pass manager are zero-initialized. */ + retu
Updated patch 10 (was Re: [PATCH 10/11] Make gcc::context be GC-managed)
On Fri, 2013-08-02 at 14:31 -0400, David Malcolm wrote: > On Thu, 2013-08-01 at 11:28 -1000, Richard Henderson wrote: > > On 07/26/2013 05:04 AM, David Malcolm wrote: > > > +/* Functions relating to the garbage collector. */ > > > +void > > > +gcc::context::gt_ggc_mx () > > > +{ > > > + /* Currently a no-op. */ > > > +} > > > + > > > +void > > > +gcc::context::gt_pch_nx () > > > +{ > > > + /* Currently a no-op. */ > > > +} > > > + > > > +void > > > +gcc::context::gt_pch_nx (gt_pointer_operator op ATTRIBUTE_UNUSED, > > > + void *cookie ATTRIBUTE_UNUSED) > > > +{ > > > + /* Currently a no-op. */ > > > +} > > > > I suppose these are members because you'll want to access private members > > later, and that's easier than playing with "friend"? > > Exactly. The patch adds a comment about this to gcc/context.h. > > > > +void gt_ggc_mx (gcc::context *ctxt) > > > +{ > > > + ctxt->gt_ggc_mx (); > > > +} > > > + > > > +void gt_pch_nx (gcc::context *ctxt) > > > +{ > > > + ctxt->gt_pch_nx (); > > > +} > > > + > > > +void gt_pch_nx (gcc::context *ctxt, gt_pointer_operator op, void *cookie) > > > +{ > > > + ctxt->gt_pch_nx (op, cookie); > > > +} > > > > Should these be inline functions in context.h, so that the call does direct? > > Or do we take their address, making that sorta pointless? > > > > It seems like there's one level of indirection here that is avoidable... > > It looks like they can be inlined - their addresses are not taken, and > they're only used by gtype-desc.c, inside the autogenerated functions > gt_ggc_mx_context, gt_pch_nx_context, and gt_pch_p_7context. > > I'll make them inline in context.h Here's an updated version of the patch which makes them inline. I've successfully bootstrapped the *end result* of the revised patch series 3-11 (plus the patch "3.1" from [1]) on x86_64-unknown-linux-gnu: all testcases show the same results as an unpatched build (relative to r201397). OK for trunk? [1] http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00118.html >From f5fba88b187c68f92e3c797b83a3ff4b6e6c2e48 Mon Sep 17 00:00:00 2001 From: David Malcolm Date: Tue, 30 Jul 2013 12:30:15 -0400 Subject: [PATCH 11/15] Make gcc::context be GC-managed This patch makes gcc::context instances be allocated within the GC-heap, and adds traversal hooks for GC/PCH so that a gcc::context can own refs to other GC-allocated objects. gcc/ * Makefile.in (GTFILES): Add context.h. * context.c (gcc::context::operator new): New. (gcc::context::gt_ggc_mx): New. (gcc::context::gt_pch_nx): New. (gcc::context::gt_pch_nx): New. * context.h (gcc::context): Add GTY((user)) marking. (gcc::context::operator new): New. (gcc::context::gt_ggc_mx): New. (gcc::context::gt_pch_nx): New. (gcc::context::gt_pch_nx): New. (g): Add GTY marking. (gt_ggc_mx (gcc::context *)): New. (gt_pch_nx (gcc::context *)): New. (gt_pch_nx (gcc::context *ctxt, gt_pointer_operator op, void *cookie)): New. * gengtype.c (open_base_files) : Add context.h. --- gcc/Makefile.in | 1 + gcc/context.c | 26 + gcc/context.h | 59 +++-- gcc/gengtype.c | 2 +- 4 files changed, 85 insertions(+), 3 deletions(-) diff --git a/gcc/Makefile.in b/gcc/Makefile.in index afce540..61a4d7c 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3819,6 +3819,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \ $(srcdir)/ipa-inline.h \ $(srcdir)/asan.c \ $(srcdir)/tsan.c \ + $(srcdir)/context.h \ @all_gtfiles@ # Compute the list of GT header files from the corresponding C sources, diff --git a/gcc/context.c b/gcc/context.c index b515241..ba6f335 100644 --- a/gcc/context.c +++ b/gcc/context.c @@ -27,7 +27,33 @@ along with GCC; see the file COPYING3. If not see /* The singleton holder of global state: */ gcc::context *g; +void * +gcc::context::operator new (std::size_t size) +{ + return ggc_internal_cleared_alloc_stat (size MEM_STAT_INFO); +} + gcc::context::context() { passes_ = new gcc::pass_manager (this); } + +/* Functions relating to the garbage collector. */ +void +gcc::context::gt_ggc_mx () +{ + /* Currently a no-op. */ +} + +void +gcc::context::gt_pch_nx () +{ + /* Currently a no-op. */ +} + +void +gcc::context::gt_pch_nx (gt_pointer_operator op ATTRIBUTE_UNUSED, + void *cookie ATTRIBUTE_UNUSED) +{ + /* Currently a no-op. */ +} diff --git a/gcc/context.h b/gcc/context.h index 66260cd..2211dc4 100644 --- a/gcc/context.h +++ b/gcc/context.h @@ -27,11 +27,30 @@ class pass_manager; /* GCC's internal state can be divided into zero or more "parallel universe" of state; an instance of this class is one such context of state. */ -class context +class GTY((user)) context { public: + /* Ensure that instances are allocated within the GC-heap. */ + void *operator new (std::size_t size); + context(); + /* Garbage-collector integration. + + Each context assumes it has full control of th
Updated patch (was Re: [PATCH 11/11] Make opt_pass and gcc::pipeline be GC-managed)
On Fri, 2013-08-02 at 10:01 -1000, Richard Henderson wrote: > On 08/02/2013 09:08 AM, David Malcolm wrote: > > For opt_pass and pass_manager, something different is going on. > > I'd wondered about that. > > > For some reason gengtype doesn't generate the triad of gt_ggc_mx_FOO, > > gt_pch_nx_FOO, gt_pch_p_NFOO functions in gtype-desc.c, for types > > FOO=opt_pass and pass_manager. Presumably this is because the types are > > only visited by code in context.c > > > > So the global functions for opt_pass and pass_manager are a hand-written > > implementation of what gengtype would write; they are called *each time* > > the entity is reached during a traversal. The member functions are > > called only the *first time* the entity is visited. > > I wonder if we can reduce the amount of boiler-plate for this. Perhaps, > > --- > // > // These templates assume the presence of several member functions. > // > > template > inline void gt_ggc_mx (T *p) > { > if (ggc_test_and_set_mark (p)) > p->gt_ggc_mx (); > } > > template > void gt_pch_nx_with_obj(void *this_obj, void *p, > gt_pointer_operator op, void *cookie) > { > if (p == this_obj) > { > T *t = static_cast(p); > t->gt_pch_nx_with_obj (op, cookie); > } > } > > template > inline void gt_pch_nx (T *p) > { > if (gt_pch_note_object (p, p, gt_pch_nx_with_obj)) > p->gt_pch_nx (); > } > > - > > I had thought about an abstract base class instead of the templates, but > that would unnecessarily force the use of vtables in places that don't > need them. In some cases this would be relatively harmless (e.g. the > very few pass_manager objects), but in others it might be a no-go. > > The use of the template obviates that. It ought only be instantiated when > necessary for gty user objects that don't have their own specialization. Thanks. Here's an updated version of the patch which uses your template idea, removing the handcoded functions. I put the templates in ggc.h, with a big comment. I changed the "with_obj" to "with_op", on the grounds that you always have an obj, but you don't always have an op. I stepped through a collection, and observed gcc::context::gt_ggc_mx calling gt_ggc_mx, which then called gt_ggc_mx. I've successfully bootstrapped the *end result* of the revised patch series 3-11 (plus the patch "3.1" from [1], which upsets the numbering) on x86_64-unknown-linux-gnu, and ran the testsuite: all testcases show the same results as an unpatched build (relative to r201397). OK for trunk? [1] http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00118.html >From 2e3002e597eb7cfdbfb274ede7c02c0510e35e70 Mon Sep 17 00:00:00 2001 From: David Malcolm Date: Fri, 2 Aug 2013 15:58:46 -0400 Subject: [PATCH 12/15] Make opt_pass and gcc::pass_manager be GC-managed This patch makes gcc::pass_manager and opt_pass instances be allocated within the GC-heap, and adds traversal hooks for GC/PCH, so that passes can own refs to other GC-allocated objects. It also adds templates to ggc.h, to create boilerplate for GTY((user)) types that gengtype can't handle. gcc/ Make opt_pass and gcc::pass_manager be GC-managed, so that pass instances can own GC refs. * Makefile.in (GTFILES): Add pass_manager.h and tree-pass.h. * context.c (gcc::context::gt_ggc_mx): Traverse passes_. (gcc::context::gt_pch_nx): Likewise. (gcc::context::gt_pch_nx): Likewise. * ggc.h (gt_ggc_mx ): New. (gt_pch_nx_with_op ): New. (gt_pch_nx ): New. * passes.c (opt_pass::gt_ggc_mx): New. (opt_pass::gt_pch_nx): New. (opt_pass::gt_pch_nx_with_op): New. (pass_manager::gt_ggc_mx): New. (pass_manager::gt_pch_nx): New. (pass_manager::gt_pch_nx_with_op): New. (pass_manager::operator new): Use ggc_internal_cleared_alloc_stat rather than xcalloc. * pass_manager.h (class pass_manager): Add GTY((user)) marking. (pass_manager::gt_ggc_mx): New. (pass_manager::gt_pch_nx): New. (pass_manager::gt_pch_nx_with_op): New. * tree-pass.h (class opt_pass): Add GTY((user)) marking. (opt_pass::operator new): New. (opt_pass::gt_ggc_mx): New. (opt_pass::gt_pch_nx): New. (opt_pass::gt_pch_nx_with_op): New. --- gcc/Makefile.in| 2 + gcc/context.c | 6 +-- gcc/ggc.h | 46 + gcc/pass_manager.h | 9 - gcc/passes.c | 116 - gcc/tree-pass.h| 13 +- 6 files changed, 184 insertions(+), 8 deletions(-) diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 61a4d7c..df24bdc 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3820,6 +3820,8 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \ $(srcdir)/asan.c \ $(srcdir)/tsan.c \ $(srcdir)/context.h \ + $(srcdir)/pass_manager.h \ + $(srcdir)/tree-pass.h \ @all_gtfiles@ # Compute the list of GT header files from the corre
GDB hooks for debugging GCC
GDB 7.0 onwards supports hooks written in Python to improve the quality-of-life within the debugger. The best known are the pretty-printing hooks [1], which we already use within libstdc++ for printing better representations of STL containers. I've written debug hooks for when the inferior process is *GCC*, which I'm attaching. Here's a debug session of cc1 using them, with some comments added by hand: [david@surprise gcc]$ gdb \ -iex "add-auto-load-safe-path ." \ --args ./cc1 foo.c -O3 (..snip gdb startup...) Successfully loaded GDB hooks for GCC (gdb) break expand_gimple_stmt Breakpoint 5 at 0x68e94a: file ../../src/gcc/cfgexpand.c, line 2357. (gdb) run Breakpoint 5, expand_gimple_stmt (stmt= ) at ../../src/gcc/cfgexpand.c:2357 2357 location_t saved_location = input_location; (gdb) bt #0 expand_gimple_stmt (stmt=) at ../../src/gcc/cfgexpand.c:2357 (note how stmt isn't just an address, it tells you the type of stmt) #1 0x00694dc6 in expand_gimple_basic_block (bb= , disable_tail_calls=false) at ../../src/gcc/cfgexpand.c:4204 (note how the index of the basic block is given) #2 0x00696855 in gimple_expand_cfg () at ../../src/gcc/cfgexpand.c:4723 #3 0x009b59ad in execute_one_pass (pass= ) at ../../src/gcc/passes.c:1995 #4 0x009b5bb7 in execute_pass_list (pass= ) at ../../src/gcc/passes.c:2047 (note how the name and static_pass_number of the pass is given) #5 0x006c288a in expand_function (node= ) at ../../src/gcc/cgraphunit.c:1594 (note how the cgraph_node* is printed with the decl name, so you can easily figure out which function you're in) (gdb) fin Run till exit from #0 expand_gimple_stmt (stmt= ) at ../../src/gcc/cfgexpand.c:2357 0x00694dc6 in expand_gimple_basic_block (bb= , disable_tail_calls=false) at ../../src/gcc/cfgexpand.c:4204 4204 last = expand_gimple_stmt (stmt); Value returned is $3 = (note 9 8 10 [bb 3] NOTE_INSN_BASIC_BLOCK) (note how the rtx_def* is printed inline. This last one is actually a kludge; all the other pretty-printers work inside gdb, whereas this one injects a call into print-rtl.c into the inferior). Edges are printed giving the endpoints: (gdb) p e $1 = 6)> Note that this doesn't eliminate the .gdbinit script that some people use, but it's arguably far superior, in that it happens automatically, and for all values that are printed e.g. in backtraces, viewing struct fields, etc. I'd like to get this into trunk. Some remaining issues: * installation; currently one has to manually copy it into place, as noted above; it would be nice to automate things so that during a build it gets copied to cc1-gdb.py, cc1plus-gdb.py etc * it hardcoded values in a few places rather than correctly looking up enums * the kludge in the RTX printer mentioned above. Hope this is helpful Dave [1] http://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html """ Installation gdb automatically looks for python files with a -gdb.py suffix relative to executables and DSOs. Currently you need to copy up this file to e.g. "cc1-gdb.py": cp ../../src/gcc/gdb-hooks.py cc1-gdb.py You may see a message from gdb of the form: cc1-gdb.py auto-loading has been declined by your `auto-load safe-path' as a protection against untrustworthy python scripts. See http://sourceware.org/gdb/onlinedocs/gdb/Auto_002dloading-safe-path.html A workaround is to add: -iex "add-auto-load-safe-path ." to the gdb invocation, to mark the current directory as trustworthy. During development, I've been manually invoking the code in this way: cp ../../src/gcc/gdb-hooks.py ./cc1-gdb.py \ && gdb \ -iex "add-auto-load-safe-path ." \ -ex "break expand_gimple_stmt" \ -ex "run" \ -ex "bt" \ --args \ ./cc1 foo.c -O3 Examples of output using the pretty-printers Pointer values are generally shown in the form: For example, an opt_pass* might appear as: (gdb) p pass $2 = The name of the pass is given ("expand"), together with the static_pass_number. Note that you can suppress pretty-printers using /r (for "raw"): (gdb) p /r pass $3 = (opt_pass *) 0x188b600 and dereference the pointer in the normal way: (gdb) p *pass $4 = {type = RTL_PASS, name = 0x120a312 "expand", [etc, ...snipped...] Basic blocks are shown with their index in parentheses, apart from the CFG's entry and exit blocks, which are given as "ENTRY" and "EXIT": (gdb) p bb $9 = (gdb) p cfun->cfg->x_entry_block_ptr $10 = (gdb) p cfun->cfg->x_exit_block_ptr $11 = CFG edges are shown with the src and dest blocks given in parentheses: (gdb) p e $1 = 6)> Tree nodes are printed using Python code that emulates print_node_brief, running in gdb, rather than in the inferior: (gdb) p cfun->decl $1 = For usability, the type is printed
New parameters to control stringop expansion libcall strategy
On x86_64, when the expected size of memcpy/memset is known (e.g, with FDO), libcall strategy is used with the size is > 8192. This value is hard coded, which makes it hard to do performance tuning. This patch adds two new parameters to do that. Potential usage includes per-application libcall strategy min-size tuning based on summary data with FDO (e.g, instruction workset size). Bootstrap and tested on x86_64/linux. Ok for trunk? thanks, David 2013-08-02 Xinliang David Li * params.def: New parameters. * config/i386/i386.c (ix86_option_override_internal): Override default libcall size limit with parameters. Index: config/i386/i386.c === --- config/i386/i386.c (revision 201458) +++ config/i386/i386.c (working copy) @@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost = }; /* Processor costs (relative to an add) */ -static const +static struct processor_costs i386_cost = { /* 386 specific costs */ COSTS_N_INSNS (1), /* cost of an add instruction */ COSTS_N_INSNS (1), /* cost of a lea instruction */ @@ -226,7 +226,7 @@ struct processor_costs i386_cost = {/* 1, /* cond_not_taken_branch_cost. */ }; -static const +static struct processor_costs i486_cost = { /* 486 specific costs */ COSTS_N_INSNS (1), /* cost of an add instruction */ COSTS_N_INSNS (1), /* cost of a lea instruction */ @@ -298,7 +298,7 @@ struct processor_costs i486_cost = {/* 1, /* cond_not_taken_branch_cost. */ }; -static const +static struct processor_costs pentium_cost = { COSTS_N_INSNS (1), /* cost of an add instruction */ COSTS_N_INSNS (1), /* cost of a lea instruction */ @@ -368,7 +368,7 @@ struct processor_costs pentium_cost = { 1, /* cond_not_taken_branch_cost. */ }; -static const +static struct processor_costs pentiumpro_cost = { COSTS_N_INSNS (1), /* cost of an add instruction */ COSTS_N_INSNS (1), /* cost of a lea instruction */ @@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost = 1, /* cond_not_taken_branch_cost. */ }; -static const +static struct processor_costs geode_cost = { COSTS_N_INSNS (1), /* cost of an add instruction */ COSTS_N_INSNS (1), /* cost of a lea instruction */ @@ -518,7 +518,7 @@ struct processor_costs geode_cost = { 1, /* cond_not_taken_branch_cost. */ }; -static const +static struct processor_costs k6_cost = { COSTS_N_INSNS (1), /* cost of an add instruction */ COSTS_N_INSNS (2), /* cost of a lea instruction */ @@ -591,7 +591,7 @@ struct processor_costs k6_cost = { 1, /* cond_not_taken_branch_cost. */ }; -static const +static struct processor_costs athlon_cost = { COSTS_N_INSNS (1), /* cost of an add instruction */ COSTS_N_INSNS (2), /* cost of a lea instruction */ @@ -664,7 +664,7 @@ struct processor_costs athlon_cost = { 1, /* cond_not_taken_branch_cost. */ }; -static const +static struct processor_costs k8_cost = { COSTS_N_INSNS (1), /* cost of an add instruction */ COSTS_N_INSNS (2), /* cost of a lea instruction */ @@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = { 1, /* cond_not_taken_branch_cost. */ }; -static const +static struct processor_costs pentium4_cost = { COSTS_N_INSNS (1), /* cost of an add instruction */ COSTS_N_INSNS (3), /* cost of a lea instruction */ @@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = { 1, /* cond_not_taken_branch_cost. */ }; -static const +static struct processor_costs nocona_cost = { COSTS_N_INSNS (1), /* cost of an add instruction */ COSTS_N_INSNS (1), /* cost of a lea instruction */ @@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = { 1, /* cond_not_taken_branch_cost. */ }; -static const +static struct processor_costs atom_cost = { COSTS_N_INSNS (1), /* cost of an add instruction */ COSTS_N_INSNS (1) + 1, /* cost of a lea instruction */ @@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = { }; /* Generic64 should produce code tuned for Nocona and K8. */ -static const +static struct processor_costs generic64_cost = { COSTS_N_INSNS (1), /* cost of an add instruction */ /* On all chips taken into consideration lea is 2 cy
Re: C++ coding conventions: namespaces, references and getters (was Re: [PATCH 2/2] Introduce beginnings of a pipeline class.)
[ Adding Benjamin, Diego, Lawrence ] General remarks first: When we designed the coding standards for GCC, an overriding philosophy was that we did not want to be prescriptive. Rather, we explicitly wanted to encourage common sense and trust the judgment of maintainers to make sound and appropriate judgments. More than a year later, I still believe that was the right thing to do and I would not want us to start being prescriptive. On Mon, Jul 29, 2013 at 1:20 PM, David Malcolm wrote: > On Thu, 2013-07-25 at 15:08 +0200, Martin Jambor wrote: >> Hi, >> >> I don't know why it's me again but again I do have a few comments. > > Thanks for looking over the patch. > >> One global remark first: If we are going to start using the gcc >> namespace (I understand it you need for isolation of symbols once you >> use gcc as library, right?), I'm wondering whether mixing "using >> namespace gcc" and explicit identifier qualifications is a good idea. >> We have never had a discussion about namespace conventions but I'd >> suggest that you add the using directive at the top of all gcc source >> files that need it and never use explicit gcc:: qualification (unless >> there is a reason for it, of course, like in symbol definitions). >> >> But perhaps someone else with more C++ experience disagrees? > > http://gcc.gnu.org/codingconventions.html#Namespace_Use says: > >> "Namespaces are encouraged. All separable libraries should have a > unique global namespace." > [...snip...] >> "Header files should have neither using directives nor namespace-scope > using declarations." > > and the rationale doc says: >> "Using them within an implementation file can help conciseness." > > However, there doesn't seem to be a discussion on the merits of the > various forms of "using" directives. It is a "common wisdom" among C++ programmers that using directives in header files generally negate the benefits of namespaces, and can lead to surprises. This is because most of the time, people don't go scrutinize header files; they include them (based on documentation of declared signatures) in many translation units. There is one notable exception to this, which is known as "namespace composition": http://stackoverflow.com/questions/16871390/why-is-namespace-composition-so-rarely-used We should probably amend the coding standards and include that hint, but I don't feel strongly about it. > > These aren't the GCC coding conventions, but the Google conventions: > http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Namespaces > forbid using directives of the form: > using NAMESPACE; > but suggest using directives of the form: > using NAMESPACE::NAME; > to pick out individual names from a namespace, though *not* in global > scope in a header (to avoid polluting the global namespace). The using declarations are fine, as indicated. > > I like this approach - how about using it for frequently used names in > a .c/.cc file, keeping the names alphabetizing by "fully qualified > path", so e.g.: > > #include "foo.h" > ... > #include "bar.h" > > using gcc::context; > using gcc::pass_manager; > ...etc, individually grabbing the names we'll be needing > > // code follows > > and thus avoiding grabbing whole namespaces, whilst keeping the code > concise. Agreed. […] > Note that as per > http://gcc.gnu.org/ml/gcc-patches/2013-07/msg01365.html > we'll use "pass_manager" rather than "pipeline", so this would look > like: > pass_manager &get_passes () { gcc_assert (passes_); return *passes_; } > > We were chatting about C++ references on IRC on Friday, and IIRC there > was a strong objection to passing *arguments* that are non-const > references, I hope nobody is objecting to std::swap for example. > rather than return values - mutation of *arguments* being > surprising and a place for bugs to arise (though that may have been > Diego who was arguing that, citing > http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Reference_Arguments > ). > > I prefer to have get_passes return a reference rather a pointer since > which thing is being referred to isn't going to change, and is non-NULL. > One could write it as a "gcc::pipeline const * passes", but that doesn't > capture the non-NULL-ness of it. > [within the class data, "passes_" needs to be a *pointer* so that the > PCH deserialization can work, in case the object has been relocated]. > Having the get_passes method do the assertion of non-NULLness and > dereference means that there's a single place where the non-NULLness is > asserted. > > I guess this is a bigger point though: how do GCC maintainers feel about > C++ references in general? I think we should use them where appropriate, and trust maintainers of specific areas of the compiler to exercise sound judgments instead of us trying to cast ideology into stones. > > Looking at the GCC Coding Conventions: > http://gcc.gnu.org/codingconventions.html > and > http://gcc.gnu.org/codingrationale.h
Re: C++ coding conventions: namespaces, references and getters (was Re: [PATCH 2/2] Introduce beginnings of a pipeline class.)
On Tue, Jul 30, 2013 at 4:11 AM, Martin Jambor wrote: > Moreover, I think we should be extra careful here because GCC is > transitioning from C and many developers are used to expect C > semantics, we will be mixing new C++ code with C code a lot in the > future and even though we hope to reduce that, people will be cutting > and pasting all this stuff around. Yes, we can't reject writing idiomatic C++ just because there would be a period of transition where old codes and new codes co-exist. It wouldn't make much sense. -- Gaby
Re: C++ coding conventions: namespaces, references and getters (was Re: [PATCH 2/2] Introduce beginnings of a pipeline class.)
On Tue, Jul 30, 2013 at 4:30 AM, Martin Jambor wrote: >> I'm voting for references. References can be seen as yet another >> software structuring tool that instantly communicate some properties >> such as you mentioned above. In addition to that it's also a hint of >> ownership, i.e. if I get an object& from somewhere I know that I better >> not even think about whether to delete it or not. >> > > well, let me stress again that we should think about this in the > context of GCC. In GCC, we are used to C semantics of the dot > operator and have a lot of existing code that we will continue to use > and mix with new code with the same assumption. Putting a reference > where none has been before might result in silent and hard to spot > erroneous modifications. I cannot make sense of this. > At the same time, we are used to lookup whether the pointer we got can > be NULL or not if necessary. Moreover, in GCC, NULL-dereferences are > not particularly dangerous and are easy to debug. That should not be reason to avoid idiomatic constructions. > In the context of GCC, there will be no ownership hints of this kind > simply because new code will co-exist with tons of old code which uses > pointers and will not give any such hints and so you can't assume you > could free its pointers. Let me not even mention GC. References are not just about ownership (though they might imply it). -- Gaby
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Fri, Aug 2, 2013 at 8:05 AM, Jan Hubicka wrote: >> >> 2013-08-01 Teresa Johnson >> Steven Bosscher >> >> * cfgrtl.c (fixup_bb_partition): New routine. >> (commit_edge_insertions): Invoke fixup_partitions. >> (find_partition_fixes): New routine. >> (fixup_partitions): Ditto. >> (verify_hot_cold_block_grouping): Update comments. >> (rtl_verify_edges): Invoke find_partition_fixes. >> (rtl_verify_bb_pointers): Update comments. >> (rtl_verify_bb_layout): Ditto. >> * basic-block.h (fixup_partitions): Declare. >> * cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions. >> * bb-reorder.c (sanitize_dominator_hotness): New function. >> (find_rarely_executed_basic_blocks_and_crossing_edges): Invoke >> sanitize_dominator_hotness. >> >> Index: cfgrtl.c >> === >> --- cfgrtl.c(revision 201281) >> +++ cfgrtl.c(working copy) >> @@ -1341,6 +1341,34 @@ fixup_partition_crossing (edge e) >> } >> } >> >> +/* Called when block BB has been reassigned to a different partition, >> + to ensure that the region crossing attributes are updated. */ >> + >> +static void >> +fixup_bb_partition (basic_block bb) >> +{ >> + edge e; >> + edge_iterator ei; >> + >> + /* Now need to make bb's pred edges non-region crossing. */ >> + FOR_EACH_EDGE (e, ei, bb->preds) >> +{ >> + fixup_partition_crossing (e); >> +} >> + >> + /* Possibly need to make bb's successor edges region crossing, >> + or remove stale region crossing. */ >> + FOR_EACH_EDGE (e, ei, bb->succs) >> +{ >> + if ((e->flags & EDGE_FALLTHRU) >> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >> + && e->dest != EXIT_BLOCK_PTR) >> +force_nonfallthru (e); >> + else >> +fixup_partition_crossing (e); >> +} >> +} > > Is there particular reason why preds can not be fallhtrus and why > force_nonfallthru edge does not need partition crossing fixup? > (if so, perhpas it could be mentioned in the description, if not, > I think force_nonfallthru path has to check if new BB was introduced > and do the right thing on the edge. I need to clarify the comments in this routine, because without the context of how this is called it isn't clear. This routine is only called when we detect a hot bb that is now dominated by a cold bb and needs to become cold. Therefore, its preds will no longer be region crossing (any non-dominating blocks that were previously hot would have been marked cold in the caller for the same reason, so we will not end up adjusting the region crossing-ness or fallthrough-ness of those pred edges). Any that were region crossing before but aren't any longer could not have been fall through (as Steven noted, you can't have a fall through across a partition boundary). I will add some better comments here. Regarding the call to force_nonfallthru, that routine calls fixup_partition_crossing as needed, and I will update the comment to reflect that too. > >> +/* Sanity check partition hotness to ensure that basic blocks in >> + the cold partition don't dominate basic blocks in the hot partition. >> + If FLAG_ONLY is true, report violations as errors. Otherwise >> + re-mark the dominated blocks as cold, since this is run after >> + cfg optimizations that may make hot blocks previously reached >> + by both hot and cold blocks now only reachable along cold paths. */ > > With profile, I suppose we can have cold blocks dominating hot blocks when the > hot blocks is in loop whose trip count is high enough. Indeed for > partitioning > reasons it does not make sense to push those into different section. > > I also wonder, if we finally get the pass stable, can we enable it by default > and offline probably cold blocks w/o profile? Primarily blocks reachable only > by EH + blocks leading to a crash or throw(). For C++ those should be common > enough to make a difference... Yep, as soon as PR57451 is fixed, which I hope to get to next week, then I am going to send a patch to turn this on by default, at least with profile feedback, which is where I've been doing performance tuning. But you are right that there are some cases where it should be beneficial without profile data as well. Thanks, Teresa > > Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Fri, Aug 2, 2013 at 4:04 PM, Steven Bosscher wrote: > On Fri, Aug 2, 2013 at 5:05 PM, Jan Hubicka wrote: >>> +/* Called when block BB has been reassigned to a different partition, >>> + to ensure that the region crossing attributes are updated. */ >>> + >>> +static void >>> +fixup_bb_partition (basic_block bb) >>> +{ >>> + edge e; >>> + edge_iterator ei; >>> + >>> + /* Now need to make bb's pred edges non-region crossing. */ >>> + FOR_EACH_EDGE (e, ei, bb->preds) >>> +{ >>> + fixup_partition_crossing (e); >>> +} >>> + >>> + /* Possibly need to make bb's successor edges region crossing, >>> + or remove stale region crossing. */ >>> + FOR_EACH_EDGE (e, ei, bb->succs) >>> +{ >>> + if ((e->flags & EDGE_FALLTHRU) >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >>> + && e->dest != EXIT_BLOCK_PTR) >>> +force_nonfallthru (e); >>> + else >>> +fixup_partition_crossing (e); >>> +} >>> +} >> >> Is there particular reason why preds can not be fallhtrus > > Yes, by definition a crossing edge cannot fall through. There is > always a control transfer from one section to another. > > >>> +/* Sanity check partition hotness to ensure that basic blocks in >>> + the cold partition don't dominate basic blocks in the hot partition. >>> + If FLAG_ONLY is true, report violations as errors. Otherwise >>> + re-mark the dominated blocks as cold, since this is run after >>> + cfg optimizations that may make hot blocks previously reached >>> + by both hot and cold blocks now only reachable along cold paths. */ >> >> With profile, I suppose we can have cold blocks dominating hot blocks when >> the >> hot blocks is in loop whose trip count is high enough. > > That is the common case, actually. > >> Indeed for partitioning >> reasons it does not make sense to push those into different section. > > The partitioning algrorithm makes sure this doesn't happen. The > hottest path from the entry block to a hot basic block is always part > of the hot partition. Well, at least with this patch that will be true. The trunk version just partitions based on the bb's count without regard to paths. Thanks, Teresa > > >> I also wonder, if we finally get the pass stable, can we enable it by default >> and offline probably cold blocks w/o profile? > > That is the general idea behind all this work, obviously ;-) > >> Primarily blocks reachable only >> by EH + blocks leading to a crash or throw(). For C++ those should be common >> enough to make a difference... > > Yup, and IIRC Theresa posted some numbers that showed this. > > Ciao! > Steven -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413