Re: PR 57779 New debug check

2013-08-02 Thread Paolo Carlini

... applied this.

Thanks,
Paolo.

//
2013-08-02  Paolo Carlini  

PR libstdc++/58049
* include/debug/functions.h: Include ; minor formatting
changes.
(__foreign_iterator_aux4): Declare __l and __ge constexpr.
* include/debug/safe_iterator.h (_Safe_iterator<>::operator->):
Use __addressof.
* include/debug/safe_local_iterator.h (_Safe_local_iterator<>::
operator->): Likewise.
Index: include/debug/functions.h
===
--- include/debug/functions.h   (revision 201423)
+++ include/debug/functions.h   (working copy)
@@ -33,6 +33,7 @@
 #include  // for iterator_traits, categories 
and
  // _Iter_base
 #include // for __is_integer
+#include // for __addressof and addressof
 #if __cplusplus >= 201103L
 # include   // for less and greater_equal
 # include   // for common_type
@@ -126,8 +127,8 @@
 inline bool
 __valid_range_aux(const _InputIterator& __first,
  const _InputIterator& __last, std::__false_type)
-  { return __valid_range_aux2(__first, __last,
- std::__iterator_category(__first)); }
+{ return __valid_range_aux2(__first, __last,
+   std::__iterator_category(__first)); }
 
   /** Don't know what these iterators are, or if they are even
*  iterators (we may get an integral type for InputIterator), so
@@ -182,15 +183,14 @@
 {
   typedef typename std::common_type<_PointerType1,
_PointerType2>::type _PointerType;
-  std::less<_PointerType> __l;
-  std::greater_equal<_PointerType> __ge;
+  constexpr std::less<_PointerType> __l;
+  constexpr std::greater_equal<_PointerType> __ge;
 
-  return
-   __l(std::addressof(*__other),
-   std::addressof(*(__it._M_get_sequence()->_M_base().begin(
-   || __ge(std::addressof(*__other),
-   std::addressof(*(__it._M_get_sequence()->_M_base().end() - 1)) 
+ 1);
-  
+  return (__l(std::addressof(*__other),
+ std::addressof(*(__it._M_get_sequence()->_M_base().begin(
+ || __ge(std::addressof(*__other),
+ std::addressof(*(__it._M_get_sequence()->_M_base().end()
+  - 1)) + 1));
 }
  
   template
@@ -205,12 +205,13 @@
   // past-the-end iterator.
   if (__it._M_get_sequence()->_M_base().begin()
  != __it._M_get_sequence()->_M_base().end())
-   if (std::__addressof(*(__it._M_get_sequence()->_M_base().end() - 1))
-   - std::__addressof(*(__it._M_get_sequence()->_M_base().begin()))
+   if (std::addressof(*(__it._M_get_sequence()->_M_base().end() - 1))
+   - std::addressof(*(__it._M_get_sequence()->_M_base().begin()))
== __it._M_get_sequence()->size() - 1)
- return __foreign_iterator_aux4(__it, __other,
-   
std::addressof(*(__it._M_get_sequence()->_M_base().begin())),
-   std::addressof(*__other));
+ return (__foreign_iterator_aux4
+ (__it, __other,
+  std::addressof(*(__it._M_get_sequence()->_M_base().begin())),
+  std::addressof(*__other)));
   return true;
 }
   
@@ -232,8 +233,8 @@
 { return __it._M_get_sequence() != __other._M_get_sequence(); }
   
 #if __cplusplus >= 201103L
-  /* This overload detects when passing pointers to the contained elements 
rather
- than using iterators.
+  /* This overload detects when passing pointers to the contained elements
+ rather than using iterators.
*/
   template
 inline bool
@@ -271,10 +272,9 @@
   _InputIterator __other,
   std::__false_type)
 {
-  return
-   _Insert_range_from_self_is_safe<_Sequence>::__value
-   || __foreign_iterator_aux2(__it, __other,
-  std::__iterator_category(__it));
+  return (_Insert_range_from_self_is_safe<_Sequence>::__value
+ || __foreign_iterator_aux2(__it, __other,
+std::__iterator_category(__it)));
 }
 
   template() const
@@ -277,7 +276,7 @@
_GLIBCXX_DEBUG_VERIFY(this->_M_dereferenceable(),
  _M_message(__msg_bad_deref)
  ._M_iterator(*this, "this"));
-   return &*_M_current;
+   return std::__addressof(*_M_current);
   }
 
   // -- Input iterator requirements --
Index: include/debug/safe_local_iterator.h
===
--- include/debug/safe_local_iterator.h (revision 201423)
+++ include/debug/safe_local_iterator.h (working copy)
@@ -17

Re: [PING] [C++ Patch] Remove finish_stmt

2013-08-02 Thread Paolo Carlini

On 08/02/2013 01:42 AM, Gabriel Dos Reis wrote:

2013/8/1 Paolo Carlini :

Hi,

gently pinging this small clean-up:

 http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00905.html

Thanks!
Paolo.

It was supposed to provide symmetry and a good place to
put any "cleanup" code we might want to run, but that
never materialized, and now that we are using C++, we
are better abstraction tools.  So, it can go.

Ah now I see, thanks. I'm going to apply the patch then.

Thanks again,
Paolo.



Re: [Patch, Fortran, OOP] PR 57306: ICE on valid with class pointer initialization

2013-08-02 Thread Janus Weil
ping!


2013/7/30 Janus Weil :
>> The attached update fixes it, and thus should hopefully be
>> regression-free. It also renames 'gfc_class_null_initializer' to
>> 'gfc_class_initializer', since it now also does other initializations
>> beside EXPR_NULL.
>>
>> Will do another regtest to make sure it's clean.
>
> No failures observed. As a test case I'm using now Tobias' extended
> version (attached). New ChangeLog below.
>
> Ok for trunk?
>
> Cheers,
> Janus
>
>
> 2013-07-30  Janus Weil  
>
> PR fortran/57306
> * class.c (gfc_class_null_initializer): Rename to
> 'gfc_class_initializer'. Treat non-NULL init-exprs.
> * gfortran.h (gfc_class_null_initializer): Update prototype.
> * trans-decl.c (gfc_get_symbol_decl): Treat class pointers.
> * trans-expr.c (gfc_conv_initializer): Ditto.
> (gfc_trans_subcomponent_assign): Renamed gfc_class_null_initializer.
>
> 2013-07-30  Janus Weil  
>
> PR fortran/57306
> * gfortran.dg/pointer_init_8.f90: New.


Fix ltrans ICE seen while compiling Firefox

2013-08-02 Thread Jan Hubicka
Hi,
this patch fixes one extra place where we need to consider partial cgraphs
and enables firefox to build again.

Comitted as obvious,
Honza
Index: ChangeLog
===
--- ChangeLog   (revision 201430)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2013-08-01  Jan Hubicka  
+
+   * ipa.c (symtab_remove_unreachable_nodes): Nodes in other partitions are
+   not needed.
+
 2013-08-01  Uros Bizjak  
 
* config/i386/i386.h (MAYBE_NON_Q_CLASS_P): New.
Index: ipa.c
===
--- ipa.c   (revision 201430)
+++ ipa.c   (working copy)
@@ -239,6 +239,7 @@ symtab_remove_unreachable_nodes (bool be
   node->used_as_abstract_origin = false;
   if (node->symbol.definition
  && !node->global.inlined_to
+ && !node->symbol.in_other_partition
  && (!cgraph_can_remove_if_no_direct_calls_and_refs_p (node)
  /* Keep around virtual functions for possible devirtualization.  
*/
  || (before_inlining_p


Re: Do not use PARM_DECLs in ipa-cp and ipa-prop

2013-08-02 Thread Martin Jambor
Hi,

On Thu, Aug 01, 2013 at 05:48:20PM +0200, Jan Hubicka wrote:
> > On Thu, Aug 01, 2013 at 03:11:36PM +0200, Jan Hubicka wrote:
> > > Hi,
> > > this is preparation work to move DECL_ARGUMENTS and DECL_RESULT into 
> > > function
> > > sections during WPA.  Even with some work to release unused ones, there 
> > > are 4M
> > > of PARM_DECLs and 2M of RESULT_DECLs streamed during LTO (for 6M of
> > > function_decls) making them one of the most common nodes.
> > > 
> > > This patch makes ipa-cp and ipa-prop to not use DECL_ARGUMENTS during WPA
> > > stage.  this only needed to tamn debug info, move logic doing casts from
> > > get_replacement_map to tree_function_versioning and stream move_cost that 
> > > is
> > > computed form parm type.
> > > 
> > > Martin, does this patch look OK?
> > > 
> > 
> > Generally yes, except that I think you did not convert some dumping in
> > ipa-cp.c, at least in functions estimate_local_effects,
> > find_more_scalar_values_for_callers_subset and decide_about_value.  I
> > believe that after your change there should not be a single call to
> > ipa_get_param in file ipa-cp.c.
> 
> Hmm, I was aware of that while making the patch but then forgot.
> OK, I will convert these too (and the testsuite) and commit.
> (alternative I was thining of having ipa_dump_param helper that will
> dump "parm 4" in WPA mode and "parm 4 (parmname)" in non-WPA). Do
> you think it would be useful?

Yes, I think it would be very convenient.

> We probably should comonnize syntax of dumps in between ipa-prop
> and ipa-inline-analysis predicates (that currently use op4 syntax and call the
> parameters operands). Something to track incrementally anyway.
> 

Indeed.

Thanks,

Martin


[PATCH, Fortran, PR 57987] Do not call cgraph_finalize_function multiple times on finalizers

2013-08-02 Thread Martin Jambor
Hi,

when looking at another PR, I found out that inliner refused to even
consider __final_test2_T/0 because, according to the dump, "redefined
extern inline functions are not considered for inlining."  When I
looked at why, I realized that the function is "finalized" (by
cgraph_finalize_function) twice.  Once directly from front-end and
second time from un-nesting nested functions.

This patch makes sure that the Fortran front-end does not do that.
Some details about the patch development over the time is in bugzilla.
Bootstrapped and tested on x86_64-linux without any problems, OK for
trunk?

Thanks,

Martin


2013-08-01  Martin Jambor  

* cgraphunit.c (cgraph_finalize_function): Assert that nested function
is not re-finalized.  Rename second parameter to no_collect.

fortran/
* trans-decl.c (gfc_generate_function_code): Never call
cgraph_finalize_function on nested functions.

testsuite/
* gfortran.dg/pr57987.f90: New test.

Index: src/gcc/cgraphunit.c
===
--- src.orig/gcc/cgraphunit.c
+++ src/gcc/cgraphunit.c
@@ -405,17 +405,20 @@ referred_to_p (symtab_node node)
 }
 
 /* DECL has been parsed.  Take it, queue it, compile it at the whim of the
-   logic in effect.  If NESTED is true, then our caller cannot stand to have
+   logic in effect.  If NO_COLLECT is true, then our caller cannot stand to 
have
the garbage collector run at the moment.  We would need to either create
a new GC context, or just not compile right now.  */
 
 void
-cgraph_finalize_function (tree decl, bool nested)
+cgraph_finalize_function (tree decl, bool no_collect)
 {
   struct cgraph_node *node = cgraph_get_create_node (decl);
 
   if (node->symbol.definition)
 {
+  /* Nested functions should only be defined once.  */
+  gcc_assert (!DECL_CONTEXT (decl)
+ || TREE_CODE (DECL_CONTEXT (decl)) != FUNCTION_DECL);
   cgraph_reset_node (node);
   node->local.redefined_extern_inline = true;
 }
@@ -454,7 +457,7 @@ cgraph_finalize_function (tree decl, boo
   if (warn_unused_parameter)
 do_warn_unused_parameter (decl);
 
-  if (!nested)
+  if (!no_collect)
 ggc_collect ();
 
   if (cgraph_state == CGRAPH_STATE_CONSTRUCTION
Index: src/gcc/fortran/trans-decl.c
===
--- src.orig/gcc/fortran/trans-decl.c
+++ src/gcc/fortran/trans-decl.c
@@ -5640,14 +5640,16 @@ gfc_generate_function_code (gfc_namespac
 }
   current_function_decl = old_context;
 
-  if (decl_function_context (fndecl) && gfc_option.coarray != GFC_FCOARRAY_LIB
-  && has_coarray_vars)
-/* Register this function with cgraph just far enough to get it
-   added to our parent's nested function list.
-   If there are static coarrays in this function, the nested _caf_init
-   function has already called cgraph_create_node, which also created
-   the cgraph node for this function.  */
-(void) cgraph_create_node (fndecl);
+  if (decl_function_context (fndecl))
+{
+  /* Register this function with cgraph just far enough to get it
+added to our parent's nested function list.
+If there are static coarrays in this function, the nested _caf_init
+function has already called cgraph_create_node, which also created
+the cgraph node for this function.  */
+  if (!has_coarray_vars || gfc_option.coarray != GFC_FCOARRAY_LIB)
+   (void) cgraph_create_node (fndecl);
+}
   else
 cgraph_finalize_function (fndecl, true);
 
Index: src/gcc/testsuite/gfortran.dg/pr57987.f90
===
--- /dev/null
+++ src/gcc/testsuite/gfortran.dg/pr57987.f90
@@ -0,0 +1,24 @@
+! { dg-do compile }
+! { dg-options "-O3 -fno-ipa-cp -fdump-ipa-inline" }
+
+program test
+  call test2 ()
+contains
+  subroutine test2 ()
+type t
+  integer, allocatable :: x
+end type t
+
+type t2
+  class(t), allocatable :: a
+end type t2
+
+type(t2) :: one, two
+
+allocate (two%a)
+one = two
+  end subroutine test2
+end program test
+
+! { dg-final { scan-ipa-dump-not "redefined extern inline functions are not 
considered for inlining" "inline" } }
+! { dg-final { cleanup-ipa-dump "inline" } }


Re: [PATCH, i386, MPX 1/X] Support of Intel MPX ISA

2013-08-02 Thread Ilya Enkovich
Hi All,

I've updated MPX Wiki page
(http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler).
I added instrumentation description, programming model description,
differences with other checkers, implementation details.

What about the first patch? Should I post next patches in the series?

Thanks,
Ilya

2013/7/29 Ilya Enkovich :
> Hi,
>
> Here is updated version of the patch. I removed redundant
> mode_for_bound, added comments to BOUND_TYPE and added -mmpx option.
> I also fixed bndmk/bndldx/bndstx constraints to avoid incorrect
> register allocation (created two new constraints for that).
>
> Thanks,
> Ilya
>
> ---
> 2013-07-29  Ilya Enkovich  
>
> * mode-classes.def (MODE_BOUND): New.
> * tree.def (BOUND_TYPE): New.
> * genmodes.c (complete_mode): Support MODE_BOUND.
> (BOUND_MODE): New.
> (make_bound_mode): New.
> * machmode.h (BOUND_MODE_P): New.
> * stor-layout.c (int_mode_for_mode): Support MODE_BOUND.
> (layout_type): Support BOUND_TYPE.
> * tree-pretty-print.c (dump_generic_node): Support BOUND_TYPE.
> * tree.c (build_int_cst_wide): Support BOUND_TYPE.
> (type_contains_placeholder_1): Likewise.
> * tree.h (BOUND_TYPE_P): New.
> * varasm.c (output_constant): Support BOUND_TYPE.
> * config/i386/constraints.md (B): New.
> (Ti): New.
> (Tb): New.
> * config/i386/i386-modes.def (BND32): New.
> (BND64): New.
> * config/i386/i386-protos.h (ix86_bnd_prefixed_insn_p): New.
> * config/i386/i386.c (isa_opts): Add mmpx.
> (regclass_map): Add bound registers.
> (dbx_register_map): Likewise.
> (dbx64_register_map): Likewise.
> (svr4_dbx_register_map): Likewise.
> (PTA_MPX): New.
> (ix86_option_override_internal) Support MPX ISA.
> (ix86_code_end): Add MPX bnd prefix.
> (output_set_got): Likewise.
> (ix86_output_call_insn): Likewise.
> (get_some_local_dynamic_name): Add '!' (MPX bnd) print prefix support.
> (ix86_print_operand_punct_valid_p): Likewise.
> (ix86_print_operand_address): Support UNSPEC_BNDMK_ADDR and
> UNSPEC_BNDMK_ADDR.
> (ix86_class_likely_spilled_p): Add bound regs support.
> (ix86_hard_regno_mode_ok): Likewise.
> (x86_order_regs_for_local_alloc): Likewise.
> (ix86_bnd_prefixed_insn_p): New.
> * config/i386/i386.h (FIRST_PSEUDO_REGISTER): Fix to new value.
> (FIXED_REGISTERS): Add bound registers.
> (CALL_USED_REGISTERS): Likewise.
> (REG_ALLOC_ORDER): Likewise.
> (HARD_REGNO_NREGS): Likewise.
> (TARGET_MPX): New.
> (VALID_BND_REG_MODE): New.
> (FIRST_BND_REG): New.
> (LAST_BND_REG): New.
> (reg_class): Add BND_REGS.
> (REG_CLASS_NAMES): Likewise.
> (REG_CLASS_CONTENTS): Likewise.
> (BND_REGNO_P): New.
> (ANY_BND_REG_P): New.
> (BNDmode): New.
> (HI_REGISTER_NAMES): Add bound registers.
> * config/i386/i386.md (UNSPEC_BNDMK): New.
> (UNSPEC_BNDMK_ADDR): New.
> (UNSPEC_BNDSTX): New.
> (UNSPEC_BNDLDX): New.
> (UNSPEC_BNDLDX_ADDR): New.
> (UNSPEC_BNDCL): New.
> (UNSPEC_BNDCU): New.
> (UNSPEC_BNDCN): New.
> (UNSPEC_MPX_FENCE): New.
> (BND0_REG): New.
> (BND1_REG): New.
> (type): Add mpxmov, mpxmk, mpxchk, mpxld, mpxst.
> (length_immediate): Likewise.
> (prefix_0f): Likewise.
> (memory): Likewise.
> (prefix_rep): Check for bnd prefix.
> (BND): New.
> (bnd_ptr): New.
> (BNDCHECK): New.
> (bndcheck): New.
> (*jcc_1): Add MPX bnd prefix and fix length.
> (*jcc_2): Likewise.
> (jump): Likewise.
> (simple_return_internal): Likewise.
> (simple_return_pop_internal): Likewise.
> (*indirect_jump): Add MPX bnd prefix.
> (*tablejump_1): Likewise.
> (simple_return_internal_long): Likewise.
> (simple_return_indirect_internal): Likewise.
> (_mk): New.
> (*_mk): New.
> (mov): New.
> (*mov_internal_mpx): New.
> (_): New.
> (*_): New.
> (_ldx): New.
> (*_ldx): New.
> (_stx): New.
> (*_stx): New.
> * config/i386/predicates.md (lea_address_operand) Rename to...
> (address_no_seg_operand): ... this.
> (address_mpx_no_base_operand): New.
> (address_mpx_no_index_operand): New.
> (bnd_mem_operator): New.
> * config/i386/i386.opt (mmpx): New.


Re: PR 57779 New debug check

2013-08-02 Thread Gabriel Dos Reis
On Fri, Aug 2, 2013 at 2:36 AM, Paolo Carlini  wrote:
> ... applied this.

thanks.


[PATCH][ARM] Delete gcc.target/arm/neon-for-64bits-2.c test

2013-08-02 Thread Kyrylo Tkachov
Hi all,

The gcc.target/arm/neon-for-64bits-2.c test is supposed to test that
-mneon-for-64bits gets the compiler to use NEON instructions for 64 bit
bitwise operations. However, the -mneon-for-64bits flag was only really added
as a complement of -mno-neon-for-64bits that explicitly prevents generation of
64 bit. The -mneon-for-64bits option is not supposed to force generation of
NEON instructions and therefore the compiler might choose to not use them,
making this testcase invalid.

This patch removes it.

Ok for trunk?

Thanks,
Kyrill

2013-08-02  Kyrylo Tkachov  

* gcc.target/arm/neon-for-64bits-2.c: Delete.

remove_neon_64bits_test.patch
Description: Binary data


Re: [PATCH, Fortran, PR 57987] Do not call cgraph_finalize_function multiple times on finalizers

2013-08-02 Thread Jan Hubicka
> Hi,
> 
> when looking at another PR, I found out that inliner refused to even
> consider __final_test2_T/0 because, according to the dump, "redefined
> extern inline functions are not considered for inlining."  When I
> looked at why, I realized that the function is "finalized" (by
> cgraph_finalize_function) twice.  Once directly from front-end and
> second time from un-nesting nested functions.
> 
> This patch makes sure that the Fortran front-end does not do that.
> Some details about the patch development over the time is in bugzilla.
> Bootstrapped and tested on x86_64-linux without any problems, OK for
> trunk?

OK (I hope that the fortran part is cgraph centric so I can approve it), thanks.
Thanks for noticing it - those missed inlines seems quite unforutnate.
Honza


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-02 Thread Bernhard Reutner-Fischer
On 1 August 2013 18:32, Teresa Johnson  wrote:
> Patch 3 of 3 split out from the patch I sent in May that fixes problems with
> -freorder-blocks-and-partition, with changes/fixes discussed in that thread.
>
> See http://gcc.gnu.org/ml/gcc-patches/2013-05/threads.html#00388 for context.
>
> This patch sanitizes the partitioning to address issues such as edge
> weight insanities that sometimes occur due to upstream optimizations,
> and ensures that hot blocks are not dominated by cold blocks. This
> needs to be resanitized after certain cfg optimizations that may
> cause hot blocks previously reached via both hot and cold paths to
> only be reached by cold paths.
>
> The verification code in sanitize_dominator_hotness was contributed by
> Steven Bosscher.
>
> Bootstrapped and tested on x86-64-unknown-linux-gnu. Also ensured that
> a profiledbootstrap passed with -freorder-blocks-and-partition enabled
> (and with the dwarf version changed to 2 to work around PR57451).
>
> Ok for trunk?
>
> (I also included the patch as an attachment since my mailer invariably
> messes up the formatting in the pasted version.)
>
> Thanks,
> Teresa
>
> 2013-08-01  Teresa Johnson  
> Steven Bosscher  
>
> * cfgrtl.c (fixup_bb_partition): New routine.
> (commit_edge_insertions): Invoke fixup_partitions.
> (find_partition_fixes): New routine.
> (fixup_partitions): Ditto.
> (verify_hot_cold_block_grouping): Update comments.
> (rtl_verify_edges): Invoke find_partition_fixes.
> (rtl_verify_bb_pointers): Update comments.
> (rtl_verify_bb_layout): Ditto.
> * basic-block.h (fixup_partitions): Declare.
> * cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions.
> * bb-reorder.c (sanitize_dominator_hotness): New function.
> (find_rarely_executed_basic_blocks_and_crossing_edges): Invoke
> sanitize_dominator_hotness.
>
> Index: cfgrtl.c
> ===
> --- cfgrtl.c (revision 201281)
> +++ cfgrtl.c (working copy)
> @@ -1341,6 +1341,34 @@ fixup_partition_crossing (edge e)
>  }
>  }
>
> +/* Called when block BB has been reassigned to a different partition,
> +   to ensure that the region crossing attributes are updated.  */
> +
> +static void
> +fixup_bb_partition (basic_block bb)
> +{
> +  edge e;
> +  edge_iterator ei;
> +
> +  /* Now need to make bb's pred edges non-region crossing.  */
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +{
> +  fixup_partition_crossing (e);
> +}
> +
> +  /* Possibly need to make bb's successor edges region crossing,
> + or remove stale region crossing.  */
> +  FOR_EACH_EDGE (e, ei, bb->succs)
> +{
> +  if ((e->flags & EDGE_FALLTHRU)
> +  && BB_PARTITION (bb) != BB_PARTITION (e->dest)
> +  && e->dest != EXIT_BLOCK_PTR)
> +force_nonfallthru (e);
> +  else
> +fixup_partition_crossing (e);
> +}
> +}
> +
>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
> expense of adding new instructions or reordering basic blocks.
>
> @@ -1979,6 +2007,14 @@ commit_edge_insertions (void)
>  {
>basic_block bb;
>
> +  /* Optimization passes that invoke this routine can cause hot blocks
> + previously reached by both hot and cold blocks to become dominated only
> + by cold blocks. This will cause the verification below to fail,
> + and lead to now cold code in the hot section. In some cases this
> + may only be visible after newly unreachable blocks are deleted,
> + which will be done by fixup_partitions.  */
> +  fixup_partitions ();
> +
>  #ifdef ENABLE_CHECKING
>verify_flow_info ();
>  #endif
> @@ -2173,6 +2209,101 @@ get_last_bb_insn (basic_block bb)
>return end;
>  }
>
> +/* Sanity check partition hotness to ensure that basic blocks in
> +   the cold partition don't dominate basic blocks in the hot partition.
> +   If FLAG_ONLY is true, report violations as errors. Otherwise
> +   re-mark the dominated blocks as cold, since this is run after
> +   cfg optimizations that may make hot blocks previously reached
> +   by both hot and cold blocks now only reachable along cold paths.  */
> +
> +vec
> +find_partition_fixes (bool flag_only)
> +{
> +  basic_block bb;
> +  vec bbs_in_cold_partition = vNULL;
> +  vec bbs_to_fix = vNULL;
> +
> +  if (!crtl->has_bb_partition)
> +return vNULL;

I'd push this early return into the callers instead, at most turn it into a
gcc_checking_assert to be safe.

Both callers, fixup_partitions and rtl_verify_edges, look at
ctrl->has_bb_partition already before calling this, so the above should
be dead already.

Did my mailer somehow swallow the static from find_partition_fixes?

> +
> +  FOR_EACH_BB (bb)
> +if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
> +  bbs_in_cold_partition.safe_push (bb);
> +
> +  if (bbs_in_cold_partition.is_empty ())
> +return vNULL;
> +
> +  bool dom_calculated_here = !

Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Torvald Riegel
On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote:
> Index: libitm/config/s390/target.h
> ===
> *** libitm/config/s390/target.h.orig
> --- libitm/config/s390/target.h
> ***

[...]

> + static inline uint32_t
> + htm_begin ()
> + {
> +   return __builtin_tbegin_nofloat (NULL);
> + } 

I'm not quite sure I understand why the nofloat variant is sufficient;
is this because we don't use FPRs in the abort/restart code of libitm
(and the FPRs are known to be clobbered after starting a txn)?

There is a bit in the properties passed to _ITM_beginTransaction which
states whether the txn has floating point update code
(pr_hasNoFloatUpdate).  Would this be useful?

(BTW, sorry for the late response.  I don't read gcc-patches frequently,
so please CC me on libitm-related things.)



Re: [PATCH][4.8 backport] S/390: Transactional Execution support

2013-08-02 Thread Torvald Riegel
On Thu, 2013-08-01 at 10:55 +0200, Andreas Krebbel wrote:
> Torvald, could you please consider adding backports of the following
> patches to the 4.8 branch?
> 
> 19/06/13 [patch] libitm: Fix handling of reentrancy in the HTM fastpath
> http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01132.html
> 
> 20/06/13 Re: [patch] libitm: Fix handling of reentrancy in the HTM fastpath
> http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01188.html

I believe Jakub already backported these two; at least he told me he was
planning to, IIRC.

Torvald



[PATCH, PR 57748] Set mode of structures with zero sized arrays to be BLK

2013-08-02 Thread Martin Jambor
Hi,

while compute_record_mode in stor-layout.c makes sure it assigns BLK
mode to structs with flexible arrays, it has no such provisions for
zero length arrays
(http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/Zero-Length.html).  I
think that in order to avoid problems and surprises like PR 57748
(where this triggered code that was intended for small structures that
fit into a scalar mode and ICEd), we should assign both variable array
possibilities the same mode.

Bootstrapped and tested on x86_64-linux without any problems.  OK for
trunk and the 4.8 branch?  (I'm not sure about the 4.7, this PR does
not happen there despite the wrong mode so I'd ignore it for now.)

Thanks,

Martin


2013-08-01  Martin Jambor  

PR middle-end/57748
* stor-layout.c (compute_record_mode): Treat zero-sized array fields
like incomplete types.

testsuite/
* gcc.dg/torture/pr57748.c: New test.


*** /tmp/lV6Ba8_stor-layout.c   Thu Aug  1 16:28:25 2013
--- gcc/stor-layout.c   Thu Aug  1 15:36:18 2013
*** compute_record_mode (tree type)
*** 1604,1610 
   && integer_zerop (TYPE_SIZE (TREE_TYPE (field)
  || ! host_integerp (bit_position (field), 1)
  || DECL_SIZE (field) == 0
! || ! host_integerp (DECL_SIZE (field), 1))
return;
  
/* If this field is the whole struct, remember its mode so
--- 1604,1612 
   && integer_zerop (TYPE_SIZE (TREE_TYPE (field)
  || ! host_integerp (bit_position (field), 1)
  || DECL_SIZE (field) == 0
! || ! host_integerp (DECL_SIZE (field), 1)
! || (TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE
! && tree_low_cst (DECL_SIZE (field), 1) == 0))
return;
  
/* If this field is the whole struct, remember its mode so
*** /dev/null   Tue Jun  4 12:34:56 2013
--- gcc/testsuite/gcc.dg/torture/pr57748.c  Thu Aug  1 15:42:14 2013
***
*** 0 
--- 1,45 
+ /* PR middle-end/57748 */
+ /* { dg-do run } */
+ 
+ #include 
+ 
+ extern void abort (void);
+ 
+ typedef long long V
+   __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
+ 
+ typedef struct S { V a; V b[0]; } P __attribute__((aligned (1)));
+ 
+ struct __attribute__((packed)) T { char c; P s; };
+ 
+ void __attribute__((noinline, noclone))
+ check (struct T *t)
+ {
+   if (t->s.b[0][0] != 3 || t->s.b[0][1] != 4)
+ abort ();
+ }
+ 
+ int __attribute__((noinline, noclone))
+ get_i (void)
+ {
+   return 0;
+ }
+ 
+ void __attribute__((noinline, noclone))
+ foo (P *p)
+ {
+   V a = { 3, 4 };
+   int i = get_i();
+   p->b[i] = a;
+ }
+ 
+ int
+ main ()
+ {
+   struct T *t = (struct T *) malloc (128);
+ 
+   foo (&t->s);
+   check (t);
+ 
+   return 0;
+ }


Re: [PATCH][ARM] Delete gcc.target/arm/neon-for-64bits-2.c test

2013-08-02 Thread Richard Earnshaw
On 02/08/13 11:56, Kyrylo Tkachov wrote:
> Hi all,
> 
> The gcc.target/arm/neon-for-64bits-2.c test is supposed to test that
> -mneon-for-64bits gets the compiler to use NEON instructions for 64 bit
> bitwise operations. However, the -mneon-for-64bits flag was only really added
> as a complement of -mno-neon-for-64bits that explicitly prevents generation of
> 64 bit. The -mneon-for-64bits option is not supposed to force generation of
> NEON instructions and therefore the compiler might choose to not use them,
> making this testcase invalid.
> 
> This patch removes it.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2013-08-02  Kyrylo Tkachov  
> 
>   * gcc.target/arm/neon-for-64bits-2.c: Delete.
> 
> 

OK

R.




[AARCH64][Insn classification unification 4/N] load/store types

2013-08-02 Thread Sofiane Naci
Hi,

This patch is part of the ongoing work to unify instruction classification
between the ARM and AARCH64 backends.

This patch adds "load_acq" and "store_rel" types for classifying load
acquire and store release instructions respectively. It also updates the
ARMv8 pipeline descriptions.

OK for trunk?

Thanks
Sofiane

-

ChangeLog:

* config/arm/types.md (define_attr "type"): Add "load_acq" and
"store_rel".
* config/arm/cortex-a53.md (cortex_a53_load1): Update for attribute
changes.
(cortex_a53_store1): Likewise.


arm-a64-fine-tune-load-store.diff
Description: Binary data


Re: [AARCH64][Insn classification unification 4/N] load/store types

2013-08-02 Thread Richard Earnshaw
On 02/08/13 14:03, Sofiane Naci wrote:
> Hi,
> 
> This patch is part of the ongoing work to unify instruction classification
> between the ARM and AARCH64 backends.
> 
> This patch adds "load_acq" and "store_rel" types for classifying load
> acquire and store release instructions respectively. It also updates the
> ARMv8 pipeline descriptions.
> 
> OK for trunk?
> 
> Thanks
> Sofiane
> 
> -
> 
> ChangeLog:
> 
>   * config/arm/types.md (define_attr "type"): Add "load_acq" and
> "store_rel".
>   * config/arm/cortex-a53.md (cortex_a53_load1): Update for attribute
>   changes.
>   (cortex_a53_store1): Likewise.
> 

OK





Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Andreas Krebbel
On 02/08/13 13:31, Torvald Riegel wrote:
> On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote:
>> Index: libitm/config/s390/target.h
>> ===
>> *** libitm/config/s390/target.h.orig
>> --- libitm/config/s390/target.h
>> ***
> 
> [...]
> 
>> + static inline uint32_t
>> + htm_begin ()
>> + {
>> +   return __builtin_tbegin_nofloat (NULL);
>> + } 
> 
> I'm not quite sure I understand why the nofloat variant is sufficient;
> is this because we don't use FPRs in the abort/restart code of libitm
> (and the FPRs are known to be clobbered after starting a txn)?

Yes. To my understanding there are 2 potential issues when the FPRs are not 
saved.

1. The abort code will read corrupted FPR content for FPRs:
- which are live across tbegin and
- are updated in the TX body and
- are read in the abort code
The abort code will see the updated value from the TX body.

Since libitm implements TX begins as function calls only call-saved registers 
can be live across a
tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and get 
restored when doing
the longjmp back into the user code. So this should be no problem.

2. The other problem is that the ABI might get violated when aborting into a 
callee which already
returned and there are FPRs which have been modified afterwards. In that case 
the compiler would not
have generated FPR save/restore instructions in the function prologue/epilogue 
of the callee. But in
fact there are modified FPRs when exiting the second time.

But again this should be no problem thanks to the setjmp/longjmp semantics of 
_ITM_beginTransaction
and GTM_longjmp. If the TX body modified a call-saved FPR it will get restored 
on the abort path
back from libitm to the user code.


As long as libitm does not use FPRs itself this should be safe without having 
tbegin clobbering FPRs.

> There is a bit in the properties passed to _ITM_beginTransaction which
> states whether the txn has floating point update code
> (pr_hasNoFloatUpdate).  Would this be useful?

This probably would be useful in the ITM_beginTransaction / GTM_longjmp 
implementations which in
that case could try to avoid the FPR save/restores. But Richard mentioned that 
these bits so far are
never set by GCC since it is lacking the required infos from the register 
allocator.

> (BTW, sorry for the late response.  I don't read gcc-patches frequently,
> so please CC me on libitm-related things.)

No problem. I'll try to remember this and cc you next time.

Bye,

-Andreas-



Re: [Patch] regex bracket expression implementaion

2013-08-02 Thread Paolo Carlini

Hi,

On 08/01/2013 04:59 PM, Tim Shen wrote:

Fully tested under x86_64. (make bootstrap && make -k check).

Next, I'll try to refactor _Grep_matcher using templates instead of
virtual functions, to make implementing back-reference easier.

If nobody has further comments over the next day or so, patch is Ok with me.

Thanks,
Paolo.



PS: I think it would be nice if from to time you could provide a rough 
estimate of the amount of work done vs the work still needed; major 
milestones; etc. Just to have an idea.


[c++-concepts] class template constraints

2013-08-02 Thread Andrew Sutton
Attached is a patch that deals with class template constraints. In
particular, it does 3 things:

1. Type constraints are checked on lookup rather than instantiation.
2. Initial support for constrained out-of-class definitions is added
(more work needed)
3. Support for constrained partial template specialization is added.

Change Log:

2013-08-02  Andrew Sutton  
* gcc/cp/pt.c (get_class_bindings): Pass the partial specialization
for constraint evaluation. Evaluate constraints, resulting in
deduction failure on error.
(get_specializaing_template_decl), (get_specialization_constraints),
(maybe_new_partial_specialization): New.
(maybe_process_partial_specialization): Allow the creation of
new types for constrained partial specializations.
(process_partial_specialization): Modify the canonical type
of constrained partial specializations.
(lookup_template_class_1): Check constraints on lookup. Do not
explicitly check alias constraints.
(instantiate_class_template_1): Do not explicitly check constraints
during class template instantiation.
(more_specialized_class): Pass specializations to get_class_bindings().
Compare specialization constraints if the types are equivalent.
(most_specialized_class): Pass specialization to get_class_bndings().
* gcc/cp/constraint.cc (reduce_template_id): Don't crash when
omitting ()'s on constraint calls.
(check_constraints): Don't try to evaluate when arguments are
dependent.
(equivalent_constraints): Optimize the case when a and b are the
same constraints.
* gcc/cp/decl2.c (check_class_fn): Allow overloading of constrained
out-of-class member definitions.
* gcc/cp/parser.c (cp_parser_init_declarator): Parse requires
clauses for out-of-class member definitions. Be sure to restore
current constraints before exiting the function.
(cp_parser_late_parsing_for_member): Restore constraints after
maybe_end_member_template_processing().

Andrew


template-1.patch
Description: Binary data


Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Peter Bergner
On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote:
> Since libitm implements TX begins as function calls only call-saved registers 
> can be live across a
> tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and 
> get restored when doing
> the longjmp back into the user code. So this should be no problem.

Except that the htm_begin() routines are declared static inline functions,
so when they're inlined, you aren't protected by the semantics of a call
anymore, are you?

Peter





Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Andreas Krebbel
On 02/08/13 16:23, Peter Bergner wrote:
> On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote:
>> Since libitm implements TX begins as function calls only call-saved 
>> registers can be live across a
>> tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and 
>> get restored when doing
>> the longjmp back into the user code. So this should be no problem.
> 
> Except that the htm_begin() routines are declared static inline functions,
> so when they're inlined, you aren't protected by the semantics of a call
> anymore, are you?

They get inlined into libitm code but not into the user code. As I understand 
it in the user code
will always be a call to _ITM_beginTransaction.

-Andreas-

> 
> Peter
> 
> 
> 



patch to fix PR57963 (s390)

2013-08-02 Thread Vladimir Makarov

The following patch fixes

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57963

The page contains a good analysis of the PR from Andreas Krebbel.

The patch was successfully bootstrapped and tested on x86/x86-64 and s390x.

Committed as rev. 201438.

2013-08-02  Vladimir Makarov  

PR rtl-optimization/57963
* lra-constraints.c (reverse_equiv_p, contains_reloaded_insn_p):
New.
(lra_constraints): Use them.


Index: lra-constraints.c
===
--- lra-constraints.c   (revision 201435)
+++ lra-constraints.c   (working copy)
@@ -3600,6 +3600,40 @@ init_insn_rhs_dead_pseudo_p (int regno)
   return false;
 }
 
+/* Return TRUE if REGNO has a reverse equivalence.  The equivalence is
+   reverse only if we have one init insn with given REGNO as a
+   source.  */
+static bool
+reverse_equiv_p (int regno)
+{
+  rtx insns, set;
+
+  if ((insns = ira_reg_equiv[regno].init_insns) == NULL_RTX)
+return false;
+  if (! INSN_P (XEXP (insns, 0))
+  || XEXP (insns, 1) != NULL_RTX)
+return false;
+  if ((set = single_set (XEXP (insns, 0))) == NULL_RTX)
+return false;
+  return REG_P (SET_SRC (set)) && (int) REGNO (SET_SRC (set)) == regno;
+}
+
+/* Return TRUE if REGNO was reloaded in an equivalence init insn.  We
+   call this function only for non-reverse equivalence.  */
+static bool
+contains_reloaded_insn_p (int regno)
+{
+  rtx set;
+  rtx list = ira_reg_equiv[regno].init_insns;
+
+  for (; list != NULL_RTX; list = XEXP (list, 1))
+if ((set = single_set (XEXP (list, 0))) == NULL_RTX
+   || ! REG_P (SET_DEST (set))
+   || (int) REGNO (SET_DEST (set)) != regno)
+  return true;
+  return false;
+}
+
 /* Entry function of LRA constraint pass.  Return true if the
constraint pass did change the code. */
 bool
@@ -3643,7 +3677,6 @@ lra_constraints (bool first_p)
else if ((x = get_equiv_substitution (reg)) != reg)
  {
bool pseudo_p = contains_reg_p (x, false, false);
-   rtx set, insns;
 
/* After RTL transformation, we can not guarantee that
   pseudo in the substitution was not reloaded which might
@@ -3675,13 +3708,13 @@ lra_constraints (bool first_p)
   removed the insn.  When the equiv can be a
   constant, the right hand side of the init insn can
   be a pseudo.  */
-   || (! ((insns = ira_reg_equiv[i].init_insns) != NULL_RTX
-  && INSN_P (XEXP (insns, 0))
-  && XEXP (insns, 1) == NULL_RTX
-  && (set = single_set (XEXP (insns, 0))) != NULL_RTX
-  && REG_P (SET_SRC (set))
-  && (int) REGNO (SET_SRC (set)) == i)
-   && init_insn_rhs_dead_pseudo_p (i))
+   || (! reverse_equiv_p (i)
+   && (init_insn_rhs_dead_pseudo_p (i)
+   /* If we reloaded the pseudo in an equivalence
+  init insn, we can not remove the equiv init
+  insns and the init insns might write into
+  const memory in this case.  */
+   || contains_reloaded_insn_p (i)))
/* Prevent access beyond equivalent memory for
   paradoxical subregs.  */
|| (MEM_P (x)


Introduce local aliases for call targets when possible

2013-08-02 Thread Jan Hubicka
Hi,
this patch (developed with Martin Liska) adds symbol table function
symtab_nonoverwritable_alias that for a given definition returns symtab node
that is not overwritable by linking and points to it.
This patch makes simple use of it: when inline function is called and the symbol
itself can be interposed by dynamic linking (and can not be interposed by static
linking), we replace all direct calls to it by calls to the alias.
This saves jumps to PLT entries.

For Firefox this makes just few replacements, but I guess every relocation 
counts.
Similar replacements are possibly in virtual tables that I am going to change 
next -
more care is needed there since we do not yet have any infrastructure to 
redirect
relocations.

Bootstrapped/regtested ppc64-linux, will commit it shortly.

PS: I just noticed symtab_nonoverwritable_alias lacks documentation, will add it
momentarily.

Honza

* cgraph.c (cgraph_function_body_availability): Do not check cgrpah 
flags.
* cgraph.h (symtab_for_node_and_aliases, symtab_nonoverwritable_alias,
symtab_node_availability): Declare.
* ipa.c (can_replace_by_local_alias): New.
(function_and_variable_visibility): Use it.
* symtab.c (symtab_for_node_and_aliases, symtab_nonoverwritable_alias_1,
symtab_nonoverwritable_alias): New.
Index: cgraph.c
===
--- cgraph.c(revision 201412)
+++ cgraph.c(working copy)
@@ -1697,7 +1697,6 @@ enum availability
 cgraph_function_body_availability (struct cgraph_node *node)
 {
   enum availability avail;
-  gcc_assert (cgraph_function_flags_ready);
   if (!node->symbol.analyzed)
 avail = AVAIL_NOT_AVAILABLE;
   else if (node->local.local)
Index: cgraph.h
===
--- cgraph.h(revision 201413)
+++ cgraph.h(working copy)
@@ -597,6 +597,12 @@ symtab_node symtab_alias_ultimate_target
  enum availability *avail = NULL);
 bool symtab_resolve_alias (symtab_node node, symtab_node target);
 void fixup_same_cpp_alias_visibility (symtab_node node, symtab_node target);
+bool symtab_for_node_and_aliases (symtab_node,
+ bool (*) (symtab_node, void *),
+ void *,
+ bool);
+symtab_node symtab_nonoverwritable_alias (symtab_node);
+enum availability symtab_node_availability (symtab_node);
 
 /* In cgraph.c  */
 void dump_cgraph (FILE *);
Index: ipa.c
===
--- ipa.c   (revision 201412)
+++ ipa.c   (working copy)
@@ -750,6 +750,21 @@ varpool_externally_visible_p (struct var
   return false;
 }
 
+/* Return true if reference to NODE can be replaced by a local alias.
+   Local aliases save dynamic linking overhead and enable more optimizations.
+ */
+
+bool
+can_replace_by_local_alias (symtab_node node)
+{
+  return (symtab_node_availability (node) > AVAIL_OVERWRITABLE
+ && !DECL_EXTERNAL (node->symbol.decl)
+ && (!DECL_ONE_ONLY (node->symbol.decl)
+ || node->symbol.resolution == LDPR_PREVAILING_DEF
+ || node->symbol.resolution == LDPR_PREVAILING_DEF_IRONLY
+ || node->symbol.resolution == LDPR_PREVAILING_DEF_IRONLY_EXP));
+}
+
 /* Mark visibility of all functions.
 
A local function is one whose calls can occur only in the current
@@ -871,7 +886,36 @@ function_and_variable_visibility (bool w
}
 }
   FOR_EACH_DEFINED_FUNCTION (node)
-node->local.local = cgraph_local_node_p (node);
+{
+  node->local.local = cgraph_local_node_p (node);
+
+  /* If we know that function can not be overwritten by a different 
semantics
+and moreover its section can not be discarded, replace all direct calls
+by calls to an nonoverwritable alias.  This make dynamic linking
+cheaper and enable more optimization.
+
+TODO: We can also update virtual tables.  */
+  if (node->callers && can_replace_by_local_alias ((symtab_node)node))
+   {
+ struct cgraph_node *alias = cgraph (symtab_nonoverwritable_alias 
((symtab_node) node));
+
+ if (alias != node)
+   {
+ while (node->callers)
+   {
+ struct cgraph_edge *e = node->callers;
+
+ cgraph_redirect_edge_callee (e, alias);
+ if (!flag_wpa)
+   {
+ push_cfun (DECL_STRUCT_FUNCTION (e->caller->symbol.decl));
+ cgraph_redirect_edge_call_stmt_to_callee (e);
+ pop_cfun ();
+   }
+   }
+   }
+   }
+}
   FOR_EACH_VARIABLE (vnode)
 {
   /* weak flag makes no sense on local variables.  */
Index: symtab.c
===
--- symtab.c(revision 201412)
+++ symtab.c(wor

Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Peter Bergner
On Fri, 2013-08-02 at 16:26 +0200, Andreas Krebbel wrote:
> On 02/08/13 16:23, Peter Bergner wrote:
> > On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote:
> >> Since libitm implements TX begins as function calls only call-saved 
> >> registers can be live across a
> >> tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and 
> >> get restored when doing
> >> the longjmp back into the user code. So this should be no problem.
> > 
> > Except that the htm_begin() routines are declared static inline functions,
> > so when they're inlined, you aren't protected by the semantics of a call
> > anymore, are you?
> 
> They get inlined into libitm code but not into the user code. As I understand 
> it in the user code
> will always be a call to _ITM_beginTransaction.

Sure, that protects the user code, but what about the libitm code?
>From your previous comment:

> As long as libitm does not use FPRs itself this should be safe without
> having tbegin clobbering FPRs.

Is it a given that s390 doesn't use FPRs without explicit use of
floating point types?  I ask, because on POWER, we can and do 
generate floating point code without explicit use of double,
float, etc.  Maybe s390 is safe in that respect.

Peter





[RFC PATCH][ARM] Uncombine cbz/cbnz in if-conversion

2013-08-02 Thread Plotnikov Dmitry

Hi,

On ARM/Thumb-2, if-conversion is sometimes unnecessarily constrained by 
combine pass (example below).  In such cases, it would be profitable to 
teach if-conversion to apply a reverse transformation, but only if 
if-converting is possible.  The attached patch implements that and 
improves performance on ARM, but in order to "uncombine" insns it 
references CC_REGNO directly and thus does not work on other 
architecture.  Could anybody suggest a machine-independent approach?



Consider:

int main(int argc, char **argv) {
   int i, x = 42;
   for (i = 0; i < 1024; i++) {
 if (i % argc)
   x *= x;
 else
   x += 4;
   }
   printf("%d\n", x);

}

arm-unknown-linux-gnueabi-gcc -mthumb -mfloat-abi=softfp -mcpu=cortex-a8 
-mtune=cortex-a8 -O2 thumb-vfp-cond.c -o thumb-vfp-cond.s -S


Combine propagates compare into if_then_else pattern that matches with 
cbz/cbnz Thumb-2 patterns (there's no such problem in ARM mode), 
resulting in the following rtl:


(if_then_else (eq (reg:SI 1 r1 [+4 ])
 (const_int 0 [0]))
 (label_ref:SI 22)
 (pc))

Then if-conversion fails to match cond_exec instruction, because its 
predicate contains a comparison with regular reg instead of CC:


(insn 19 18 22 4 (cond_exec (ne (reg:SI 1 r1 [+4 ])
 (const_int 0 [0]))
 (set (reg/v:SI 5 r5 [orig:110 x ] [110])
 (mult:SI (reg/v:SI 5 r5 [orig:110 x ] [110])
 (reg/v:SI 5 r5 [orig:110 x ] [110] 
thumb-vfp-cond.c:8 -1

  (nil))

The attached patch allows if-conversion to "uncombine" such if_then_else 
patterns into two insn: 1) separate comparison that sets CC, and 2) 
if_then_else that uses it, which can be converted to cond_exec. This 
transformation is only applied in case if-conversion was successful.
Also, this fix works only for if_then_elses with both then and else 
blocks (converting only if-then and if-else blocks proved unprofitable).


Spec2000 results (ref)

testbasepeak %
 timetime
gzip289.07  290.4   -0.46
vpr 255.75  257.18  -0.56
gcc 120.17  119.08  0.91
mcf 427.3   425.81  0.35
crafty  156.56  152.53  2.57
parser  373.05  373.44  -0.10
eon 142.07  142.54  -0.33
perlbmk 217.38  218.2   -0.38
gap 138.77  138.56  0.15
vortex  260.64  260.43  0.08
bzip2   255.3   256.01  -0.28
twolf   433.83  433.98  -0.03
Geomean 234.34  233.96  0.16

The results on train are a bit better than on ref, but in both cases
crafty grows by 2-2.5%.
It was in total 4221 conditional instructions in Spec2000 before the
patch and 5772 after, so if-conversion succeeded about 30% more times.

Regtested on ARM, mostly ok except for one test 
(gcc.target/arm/pr46631.c) that will be investigated. I can't reproduce 
that by hand.


2013-08-02  Dmitry Plotnikov  

* ifcvt.c (cond_exec_process_if_block): Fix test_expr for if-then-elses
when it has comparison without CC reg.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index f081ecd..d915170 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -441,6 +448,7 @@ cond_exec_process_if_block (ce_if_block_t * ce_info,
   rtx else_first_tail = NULL_RTX;  /* First match at the tail of ELSE */
   int then_n_insns, else_n_insns, n_insns;
   enum rtx_code false_code;
+  rtx uncombined_compare = NULL_RTX;

   /* If test is comprised of && or || elements, and we've failed at 
handling
  all of them together, just use the last test if it is the special 
case of

@@ -462,6 +470,26 @@ cond_exec_process_if_block (ce_if_block_t * ce_info,
   if (! test_expr)
 return FALSE;

+  /* Extract predicate for compare from combined if-then-else.  */
+
+  if (then_bb && else_bb
+  && REGNO (XEXP (test_expr,0)) != CC_REGNUM)
+  {
+  /* Test_expr with non CC reg, so we need to emit compare and make
+ new test_expr with CC reg.  */
+  rtx tmp_test_expr;
+  rtx tmp_cc_reg = gen_rtx_REG(CCmode, CC_REGNUM);
+
+  tmp_test_expr = gen_rtx_fmt_ee (GET_CODE (test_expr),
+  GET_MODE (test_expr),
+  tmp_cc_reg, const0_rtx);
+  uncombined_compare = gen_rtx_SET (GET_MODE (test_expr), tmp_cc_reg,
+gen_rtx_COMPARE (CCmode,
+ XEXP 
(test_expr,0),
+ XEXP 
(test_expr,1)));

+  test_expr = tmp_test_expr;
+  }
+
   /* If the conditional jump is more than just a conditional jump,
  then we can not do conditional execution conversion on this 
block.  */

   if (! onlyjump_p (BB_END (test_bb)))
@@ -683,6 +711,19 @@ cond_exec_process_if_block (ce_if_block_t * ce_info,
   IFCVT_MODIFY_FINAL (ce_info);
 #endif

+  /* Second part of compare extraction. We should emit compare insn 
here.  */

+  if (uncombined_compare != NULL_RTX)
+  {
+  rtx test_bb_

Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Andreas Krebbel
On 02/08/13 16:36, Peter Bergner wrote:
> On Fri, 2013-08-02 at 16:26 +0200, Andreas Krebbel wrote:
>> On 02/08/13 16:23, Peter Bergner wrote:
>> As long as libitm does not use FPRs itself this should be safe without
>> having tbegin clobbering FPRs.
> 
> Is it a given that s390 doesn't use FPRs without explicit use of
> floating point types?  I ask, because on POWER, we can and do 
> generate floating point code without explicit use of double,
> float, etc.  Maybe s390 is safe in that respect.

S/390 used to be on the safe side regarding this but with mainline we started 
using FPRs as GPR save
slots. As Richard suggested we should build libitm with -msoft-float in order 
to prevent this within
libitm. Unfortunately I forgot about this in my S/390 libitm patch. So I'll 
commit the following:


Index: libitm/configure.tgt
===
*** libitm/configure.tgt.orig   2013-07-30 07:42:29.0 +
--- libitm/configure.tgt2013-08-02 14:43:18.641206151 +
*** case "${target_cpu}" in
*** 109,115 
ARCH=x86
;;
s390|s390x)
!   XCFLAGS="${XCFLAGS} -mzarch -mhtm"
ARCH=s390
;;

--- 109,115 
ARCH=x86
;;
s390|s390x)
!   XCFLAGS="${XCFLAGS} -mzarch -mhtm -msoft-float"
ARCH=s390
;;



Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-02 Thread Teresa Johnson
On Fri, Aug 2, 2013 at 4:22 AM, Bernhard Reutner-Fischer
 wrote:
> On 1 August 2013 18:32, Teresa Johnson  wrote:
>> Patch 3 of 3 split out from the patch I sent in May that fixes problems with
>> -freorder-blocks-and-partition, with changes/fixes discussed in that thread.
>>
>> See http://gcc.gnu.org/ml/gcc-patches/2013-05/threads.html#00388 for context.
>>
>> This patch sanitizes the partitioning to address issues such as edge
>> weight insanities that sometimes occur due to upstream optimizations,
>> and ensures that hot blocks are not dominated by cold blocks. This
>> needs to be resanitized after certain cfg optimizations that may
>> cause hot blocks previously reached via both hot and cold paths to
>> only be reached by cold paths.
>>
>> The verification code in sanitize_dominator_hotness was contributed by
>> Steven Bosscher.
>>
>> Bootstrapped and tested on x86-64-unknown-linux-gnu. Also ensured that
>> a profiledbootstrap passed with -freorder-blocks-and-partition enabled
>> (and with the dwarf version changed to 2 to work around PR57451).
>>
>> Ok for trunk?
>>
>> (I also included the patch as an attachment since my mailer invariably
>> messes up the formatting in the pasted version.)
>>
>> Thanks,
>> Teresa
>>
>> 2013-08-01  Teresa Johnson  
>> Steven Bosscher  
>>
>> * cfgrtl.c (fixup_bb_partition): New routine.
>> (commit_edge_insertions): Invoke fixup_partitions.
>> (find_partition_fixes): New routine.
>> (fixup_partitions): Ditto.
>> (verify_hot_cold_block_grouping): Update comments.
>> (rtl_verify_edges): Invoke find_partition_fixes.
>> (rtl_verify_bb_pointers): Update comments.
>> (rtl_verify_bb_layout): Ditto.
>> * basic-block.h (fixup_partitions): Declare.
>> * cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions.
>> * bb-reorder.c (sanitize_dominator_hotness): New function.
>> (find_rarely_executed_basic_blocks_and_crossing_edges): Invoke
>> sanitize_dominator_hotness.
>>
>> Index: cfgrtl.c
>> ===
>> --- cfgrtl.c (revision 201281)
>> +++ cfgrtl.c (working copy)
>> @@ -1341,6 +1341,34 @@ fixup_partition_crossing (edge e)
>>  }
>>  }
>>
>> +/* Called when block BB has been reassigned to a different partition,
>> +   to ensure that the region crossing attributes are updated.  */
>> +
>> +static void
>> +fixup_bb_partition (basic_block bb)
>> +{
>> +  edge e;
>> +  edge_iterator ei;
>> +
>> +  /* Now need to make bb's pred edges non-region crossing.  */
>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>> +{
>> +  fixup_partition_crossing (e);
>> +}
>> +
>> +  /* Possibly need to make bb's successor edges region crossing,
>> + or remove stale region crossing.  */
>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>> +{
>> +  if ((e->flags & EDGE_FALLTHRU)
>> +  && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>> +  && e->dest != EXIT_BLOCK_PTR)
>> +force_nonfallthru (e);
>> +  else
>> +fixup_partition_crossing (e);
>> +}
>> +}
>> +
>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>> expense of adding new instructions or reordering basic blocks.
>>
>> @@ -1979,6 +2007,14 @@ commit_edge_insertions (void)
>>  {
>>basic_block bb;
>>
>> +  /* Optimization passes that invoke this routine can cause hot blocks
>> + previously reached by both hot and cold blocks to become dominated only
>> + by cold blocks. This will cause the verification below to fail,
>> + and lead to now cold code in the hot section. In some cases this
>> + may only be visible after newly unreachable blocks are deleted,
>> + which will be done by fixup_partitions.  */
>> +  fixup_partitions ();
>> +
>>  #ifdef ENABLE_CHECKING
>>verify_flow_info ();
>>  #endif
>> @@ -2173,6 +2209,101 @@ get_last_bb_insn (basic_block bb)
>>return end;
>>  }
>>
>> +/* Sanity check partition hotness to ensure that basic blocks in
>> +   the cold partition don't dominate basic blocks in the hot partition.
>> +   If FLAG_ONLY is true, report violations as errors. Otherwise
>> +   re-mark the dominated blocks as cold, since this is run after
>> +   cfg optimizations that may make hot blocks previously reached
>> +   by both hot and cold blocks now only reachable along cold paths.  */
>> +
>> +vec
>> +find_partition_fixes (bool flag_only)
>> +{
>> +  basic_block bb;
>> +  vec bbs_in_cold_partition = vNULL;
>> +  vec bbs_to_fix = vNULL;
>> +
>> +  if (!crtl->has_bb_partition)
>> +return vNULL;
>
> I'd push this early return into the callers instead, at most turn it into a
> gcc_checking_assert to be safe.
>
> Both callers, fixup_partitions and rtl_verify_edges, look at
> ctrl->has_bb_partition already before calling this, so the above should
> be dead already.

Right, I was being paranoid - changed to a gcc_checking_assert.

>
> Did my mailer somehow swallow the s

Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Torvald Riegel
On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote:
> On 02/08/13 13:31, Torvald Riegel wrote:
> > On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote:
> >> Index: libitm/config/s390/target.h
> >> ===
> >> *** libitm/config/s390/target.h.orig
> >> --- libitm/config/s390/target.h
> >> ***
> > 
> > [...]
> > 
> >> + static inline uint32_t
> >> + htm_begin ()
> >> + {
> >> +   return __builtin_tbegin_nofloat (NULL);
> >> + } 
> > 
> > I'm not quite sure I understand why the nofloat variant is sufficient;
> > is this because we don't use FPRs in the abort/restart code of libitm
> > (and the FPRs are known to be clobbered after starting a txn)?
> 
> Yes. To my understanding there are 2 potential issues when the FPRs are not 
> saved.
> 
> 1. The abort code will read corrupted FPR content for FPRs:
> - which are live across tbegin and
> - are updated in the TX body and
> - are read in the abort code
> The abort code will see the updated value from the TX body.
> 
> Since libitm implements TX begins as function calls only call-saved registers 
> can be live across a
> tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and 
> get restored when doing
> the longjmp back into the user code. So this should be no problem.
> 
> 2. The other problem is that the ABI might get violated when aborting into a 
> callee which already
> returned and there are FPRs which have been modified afterwards. In that case 
> the compiler would not
> have generated FPR save/restore instructions in the function 
> prologue/epilogue of the callee. But in
> fact there are modified FPRs when exiting the second time.
> 
> But again this should be no problem thanks to the setjmp/longjmp semantics of 
> _ITM_beginTransaction
> and GTM_longjmp. If the TX body modified a call-saved FPR it will get 
> restored on the abort path
> back from libitm to the user code.
> 
> 
> As long as libitm does not use FPRs itself this should be safe without having 
> tbegin clobbering FPRs.

And I suppose that even if it would use FPRs, it shouldn't ever access
memory locations touched in user code.

Thanks for the explanation.  Some of this was "obvious", sorry, but
that's what happens when a mental context switch is incomplete :)  Next
time, I'd prefer a summary of such reasoning to be added as a comment in
the source; it's just too easy to forget about all the subtleties...

> > There is a bit in the properties passed to _ITM_beginTransaction which
> > states whether the txn has floating point update code
> > (pr_hasNoFloatUpdate).  Would this be useful?
> 
> This probably would be useful in the ITM_beginTransaction / GTM_longjmp 
> implementations which in
> that case could try to avoid the FPR save/restores. But Richard mentioned 
> that these bits so far are
> never set by GCC since it is lacking the required infos from the register 
> allocator.

Another thing that comes to mind is that it might be useful to put parts
of the HTM fast path into the asm pieces of _ITM_beginTransaction.  That
way, one can use exactly the amount of SW setjmp one needs given what's
easy to save/restore for the TM; this might decrease the startup costs
of a txn somewhat.  Andi Kleen posted a patch for how to do this for RTM
a while ago (but it had some issues):
http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01189.html

Perhaps something along these lines (while avoiding the issues :) )
would be useful for s/390 and Power too.


Torvald



Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-02 Thread Jan Hubicka
> 
> 2013-08-01  Teresa Johnson  
> Steven Bosscher  
> 
> * cfgrtl.c (fixup_bb_partition): New routine.
> (commit_edge_insertions): Invoke fixup_partitions.
> (find_partition_fixes): New routine.
> (fixup_partitions): Ditto.
> (verify_hot_cold_block_grouping): Update comments.
> (rtl_verify_edges): Invoke find_partition_fixes.
> (rtl_verify_bb_pointers): Update comments.
> (rtl_verify_bb_layout): Ditto.
> * basic-block.h (fixup_partitions): Declare.
> * cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions.
> * bb-reorder.c (sanitize_dominator_hotness): New function.
> (find_rarely_executed_basic_blocks_and_crossing_edges): Invoke
> sanitize_dominator_hotness.
> 
> Index: cfgrtl.c
> ===
> --- cfgrtl.c(revision 201281)
> +++ cfgrtl.c(working copy)
> @@ -1341,6 +1341,34 @@ fixup_partition_crossing (edge e)
>  }
>  }
> 
> +/* Called when block BB has been reassigned to a different partition,
> +   to ensure that the region crossing attributes are updated.  */
> +
> +static void
> +fixup_bb_partition (basic_block bb)
> +{
> +  edge e;
> +  edge_iterator ei;
> +
> +  /* Now need to make bb's pred edges non-region crossing.  */
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +{
> +  fixup_partition_crossing (e);
> +}
> +
> +  /* Possibly need to make bb's successor edges region crossing,
> + or remove stale region crossing.  */
> +  FOR_EACH_EDGE (e, ei, bb->succs)
> +{
> +  if ((e->flags & EDGE_FALLTHRU)
> +  && BB_PARTITION (bb) != BB_PARTITION (e->dest)
> +  && e->dest != EXIT_BLOCK_PTR)
> +force_nonfallthru (e);
> +  else
> +fixup_partition_crossing (e);
> +}
> +}

Is there particular reason why preds can not be fallhtrus and why
force_nonfallthru edge does not need partition crossing fixup?
(if so, perhpas it could be mentioned in the description, if not,
I think force_nonfallthru path has to check if new BB was introduced
and do the right thing on the edge.

> +/* Sanity check partition hotness to ensure that basic blocks in
> +   the cold partition don't dominate basic blocks in the hot partition.
> +   If FLAG_ONLY is true, report violations as errors. Otherwise
> +   re-mark the dominated blocks as cold, since this is run after
> +   cfg optimizations that may make hot blocks previously reached
> +   by both hot and cold blocks now only reachable along cold paths.  */

With profile, I suppose we can have cold blocks dominating hot blocks when the
hot blocks is in loop whose trip count is high enough.  Indeed for partitioning
reasons it does not make sense to push those into different section.

I also wonder, if we finally get the pass stable, can we enable it by default
and offline probably cold blocks w/o profile? Primarily blocks reachable only
by EH + blocks leading to a crash or throw().  For C++ those should be common
enough to make a difference...

Honza


Go patch committed: Always put descriptor wrappers in their own section

2013-08-02 Thread Ian Lance Taylor
This patch to the Go frontend always puts descriptor wrapper functions
in their own section.  This lets the linker GC them.  That is convenient
since we must generate a wrapper for every top-level exported function,
although most of them will never be used.  We can only know whether they
are used or not at link time.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline and 4.8 branch.

Ian

diff -r bc1ad842d946 go/gogo.cc
--- a/go/gogo.cc	Wed Jul 24 15:24:12 2013 -0700
+++ b/go/gogo.cc	Fri Aug 02 09:58:44 2013 -0700
@@ -3582,6 +3582,12 @@
   Named_object* dno = gogo->start_function(name, new_fntype, false, loc);
   dno->func_value()->is_descriptor_wrapper_ = true;
 
+  // Put the wrapper in a unique section so that it can be discarded
+  // by the linker if it is not needed.  Every top-level function will
+  // get a wrapper, in case there is a reference other than a call
+  // from some other package, but most will not need one.
+  dno->func_value()->set_in_unique_section();
+
   gogo->start_block(loc);
 
   Expression* fn = Expression::make_func_reference(no, NULL, loc);


Merge from 4.8 branch to gccgo branch

2013-08-02 Thread Ian Lance Taylor
I've merged revision 201443 from the GCC 4.8 branch to the gccgo branch.

Ian


Re: [PATCH/Merge Request] Vtable Verification feature.

2013-08-02 Thread Diego Novillo
On Thu, Aug 1, 2013 at 1:19 PM, Benjamin De Kosnik  wrote:

> You'll need a GWP to do the merge (Maybe Diego, Jason, Richard
> Henderson?) and then add yourself to MAINTAINERS for libvtv.

Eh, no.  We need a GWP to approve the final patch, but Caroline can
and should do the merge herself.  The easiest approach is to generate
a patch on an svn checkout of trunk.  Caroline and I discussed this
earlier today.  It goes something like this (on a git client):

$ git checkout -b vtv vtv
$ git merge origin/trunk
... fix merge conflicts ...
$ git commit
$ git diff trunk >patch
... Get rid of all the ChangeLog.vtv and configure diffs ...
... Use git log to find the svn revision REV for the local trunk git tree ...
$ git co svn://gcc.gnu.org/gcc/trunk@REV
$ patch -p1 

Re: [GOOGLE] Refactor AutoFDO

2013-08-02 Thread Xinliang David Li
More to follow.

David

>>static void
>> read_profile (void)
>> {
>>   if (gcov_open (auto_profile_file, 1) == 0)
>> {
>>   inform (0, "Cannot open profile file %s.", auto_profile_file);


 Should be at least warning instead -- I think error is probably more
appropriate -- this is different from regular FDO, where one missing
gcda might be ok.

>>   flag_auto_profile = 0;
>>   return;
>> }
>>
>>   if (gcov_read_unsigned () != GCOV_DATA_MAGIC)
>> {
>>   inform (0, "Magic number does not mathch.");

Should be an error.

>>   flag_auto_profile = 0;
>>   return;
>> }
>>
>>
>>   /* function_name_map.  */
>>   afdo_function_name_map = function_name_map::create ();
>>   if (afdo_function_name_map == NULL)
>> {
>>   inform (0, "Cannot read string table.");

Should be an error unless there is a way to recover. Falling back to
non-fdo is not the solution.

>>   flag_auto_profile = 0;
>>   return;
>> }
>>
>>   /* autofdo_source_profile.  */
>>   afdo_source_profile = autofdo_source_profile::create ();
>>   if (afdo_source_profile == NULL)
>> {
>>   inform (0, "Cannot read function profile.");

An error.

>>   flag_auto_profile = 0;
>>   return;
>> }
>>
>>   /* autofdo_module_profile.  */
>>   afdo_module_profile = autofdo_module_profile::create ();
>>   if (afdo_module_profile == NULL)
>> {
>>   inform (0, "Cannot read module profile.");

Should be an error.

>>   flag_auto_profile = 0;
>>   return;
>> }
>>
>>   /* Read in the working set.  */
>>   if (gcov_read_unsigned () != GCOV_TAG_AFDO_WORKING_SET)
>> {
>>   inform (0, "Not expected TAG.");

Error.

>>   unsigned curr_module = 1, max_group = PARAM_VALUE (PARAM_MAX_LIPO_GROUP);
>>   for (string_vector::const_iterator iter = aux_modules->begin();
>>iter != aux_modules->end(); ++iter)
>> {
>>   gcov_module_info *aux_module = afdo_module_profile->get_module (*iter);
>>   if (aux_module == module)
>> continue;
>>   if (aux_module == NULL)
>> {
>>  inform (0, "aux module %s cannot be found.", *iter);
>>  continue;
>> }
>>   if ((aux_module->lang & GCOV_MODULE_LANG_MASK) !=
>>  (module->lang & GCOV_MODULE_LANG_MASK))
>> {
>>  inform (0, "Not importing %s: source language"
>>  " different from primary module's source language", *iter);


Should be guarded with -fopt-info -- see similar code in coverage.c.

>>  continue;
>> }
>>   if ((aux_module->lang & GCOV_MODULE_ASM_STMTS)
>>   && flag_ripa_disallow_asm_modules)
>> {
>>  if (flag_opt_info)
>>inform (0, "Not importing %s: contains "
>>"assembler statements", *iter);

Use -fopt-info

>>  continue;
>> }
>>   if (max_group != 0 && curr_module == max_group)
>> {
>>  if (flag_opt_info)
>>inform (0, "Not importing %s: maximum group size reached", *iter);
>> }
>>   if (incompatible_cl_args (module, aux_module))
>> {
>>  if (flag_opt_info)
>>inform (0, "Not importing %s: command-line"
>>" arguments not compatible with primary module", *iter);
>>  continue;

Use -fopt-info.


>> }
>>   module_infos[curr_module++] = aux_module;
>>   add_input_filename (*iter);
>> }
>> }
>>
>> /* From AutoFDO profiles, find values inside STMT for that we want to measure
>>histograms for indirect-call optimization.  */
>>
>> {
>>   hist->hvalue.counters[3] = 0;
>>   hist->hvalue.counters[4] = 0;
>> }

It might be better to create a commmon API to create/update histogram
entry instead of peeking into the details of the data structure -- to
avoid future breaks. Such a change can be done as a follow up and
needs to be in trunk.

>> }
>>
>> /* From AutoFDO profiles, find values inside STMT for that we want to measure
>>histograms and adds them to list VALUES.  */
>>
>> static void
>> afdo_vpt (gimple stmt, const icall_target_map &map)
>> {
>>   afdo_indirect_call (stmt, map);
>> }
>>
>> /* For a given BB, return its execution count, and annotate value profile
>>on statements.  */
>>
>> static gcov_type
>> afdo_get_bb_count (basic_block bb)
>> {
>>   gimple_stmt_iterator gsi;
>>   gcov_type max_count = 0;
>>   bool has_annotated = false;
>>
>>   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> {
>>   count_info info;
>>   gimple stmt = gsi_stmt (gsi);
>>   if (afdo_source_profile->get_count_info (stmt, &info))
>> {

indentation.

>>  if (info.first > max_count)
>>max_count = info.first;
>>  has_annotated = true;
>>  if (info.second.size() > 0)
>>afdo_vpt (stmt, info.second);
>> }
>> }
>>   if (has_annotated)
>> {
>>   bb->flags |= BB_ANNOTATED;
>>   return max_count;
>> }
>>   else
>> return 0;
>> }
>>
>>
>> static void
>> afdo_annotate_cfg (void)
>> {
>>   basic_block bb;
>>   const function_instance *s =
>>   afdo_source_profile->get_function_instance_by_decl (
>>   current_function_decl);

need a new line after decls.

On Thu, Aug 1, 2013 at 2:49 PM, Dehao Chen  wrote:
>

Go patch committed: Always put immutable structs in unique section

2013-08-02 Thread Ian Lance Taylor
The Go backend interfaces uses immutable structs for things like type
descriptors, map descriptors, and function descriptors.  These objects
must always be created for exported names in case they are referenced by
a different package, but this often does not happen.  This patch moves
them to unique sections so that linker GC can discard them a link time.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline and 4.8 branch.

Ian


2013-08-02  Ian Lance Taylor  

* go-gcc.cc (immutable_struct_set_init): Always call
resolve_unique_section.


Index: go-gcc.cc
===
--- go-gcc.cc	(revision 201222)
+++ go-gcc.cc	(working copy)
@@ -1521,10 +1521,11 @@ Gcc_backend::immutable_struct_set_init(B
 	TREE_PUBLIC(decl) = 1;
 }
   else
-{
-  make_decl_one_only(decl, DECL_ASSEMBLER_NAME(decl));
-  resolve_unique_section(decl, 1, 0);
-}
+make_decl_one_only(decl, DECL_ASSEMBLER_NAME(decl));
+
+  // These variables are often unneeded in the final program, so put
+  // them in their own section so that linker GC can discard them.
+  resolve_unique_section(decl, 1, 1);
 
   rest_of_decl_compilation(decl, 1, 0);
 }


Re: [PATCH 10/11] Make gcc::context be GC-managed

2013-08-02 Thread David Malcolm
On Thu, 2013-08-01 at 11:28 -1000, Richard Henderson wrote:
> On 07/26/2013 05:04 AM, David Malcolm wrote:
> > +/* Functions relating to the garbage collector.  */
> > +void
> > +gcc::context::gt_ggc_mx ()
> > +{
> > +  /* Currently a no-op.  */
> > +}
> > +
> > +void
> > +gcc::context::gt_pch_nx ()
> > +{
> > +  /* Currently a no-op.  */
> > +}
> > +
> > +void
> > +gcc::context::gt_pch_nx (gt_pointer_operator op ATTRIBUTE_UNUSED,
> > +void *cookie ATTRIBUTE_UNUSED)
> > +{
> > +  /* Currently a no-op.  */
> > +}
> 
> I suppose these are members because you'll want to access private members
> later, and that's easier than playing with "friend"?

Exactly.  The patch adds a comment about this to gcc/context.h.

> > +void gt_ggc_mx (gcc::context *ctxt)
> > +{
> > +  ctxt->gt_ggc_mx ();
> > +}
> > +
> > +void gt_pch_nx (gcc::context *ctxt)
> > +{
> > +  ctxt->gt_pch_nx ();
> > +}
> > +
> > +void gt_pch_nx (gcc::context *ctxt, gt_pointer_operator op, void *cookie)
> > +{
> > +  ctxt->gt_pch_nx (op, cookie);
> > +}
> 
> Should these be inline functions in context.h, so that the call does direct?
> Or do we take their address, making that sorta pointless?
> 
> It seems like there's one level of indirection here that is avoidable...

It looks like they can be inlined - their addresses are not taken, and
they're only used by gtype-desc.c, inside the autogenerated functions
gt_ggc_mx_context, gt_pch_nx_context, and gt_pch_p_7context.

I'll make them inline in context.h



Merge from 4.8 branch to gccgo branch

2013-08-02 Thread Ian Lance Taylor
I merged 4.8 branch revision 201447 to the gccgo branch.

Ian


Re: [PATCH 11/11] Make opt_pass and gcc::pipeline be GC-managed

2013-08-02 Thread David Malcolm
On Thu, 2013-08-01 at 11:45 -1000, Richard Henderson wrote:
> On 07/26/2013 05:04 AM, David Malcolm wrote:
> > (opt_pass::gt_ggc_mx): New.
> > (opt_pass::gt_pch_nx): New.
> > (opt_pass::gt_pch_nx_with_op): New.
> > (gt_ggc_mx (opt_pass *)): New.
> > (gt_pch_nx (opt_pass *)): New.
> > (gt_pch_nx_opt_pass): New.
> > (pipeline::operator new): New.
> > (pipeline::gt_ggc_mx): New.
> > (pipeline::gt_pch_nx): New.
> > (pipeline::gt_pch_nx_with_op): New.
> > (gt_ggc_mx (pipeline *)): New.
> > (gt_pch_nx (pipeline *)): New.
> > (gt_pch_nx_pipeline): New.
> 
> I guess my previous comments about ::gt_ggc_mx vs class::gt_ggc_mx wrt
> the context structure apply as well to this patch.

For context.h, the global functions are used by autogenerated code in
gtype-desc.c, which calls them the first time the context is visited in
a gc or pch traversal; they merely call into the class, to avoid needing
friend decls.

For opt_pass and pass_manager, something different is going on.

For some reason gengtype doesn't generate the triad of gt_ggc_mx_FOO,
gt_pch_nx_FOO, gt_pch_p_NFOO functions in gtype-desc.c, for types
FOO=opt_pass and pass_manager.  Presumably this is because the types are
only visited by code in context.c

So the global functions for opt_pass and pass_manager are a hand-written
implementation of what gengtype would write; they are called *each time*
the entity is reached during a traversal.  The member functions are
called only the *first time* the entity is visited.

Does this need a descriptive comment in the source code?

Thanks
Dave



[PATCH] - C11 expressions and stdatomic.h - Just for current state

2013-08-02 Thread Andrew MacLeod

On 07/30/2013 12:49 PM, Andrew MacLeod wrote:
I split the original patch into some smaller hunks, and cleaned up a 
few bit and pieces here and there... following:



Not looking for a review, but posting progress to this point since I'm 
off for 2 weeks, and someone might want to play with this.


I re-examined implementing the atomic type as its own canonical type 
instead of a qualified variant, as well as some variations... and 
ultimately, I think the current qualified implementation I posted is 
probably the best one to work with.  Its very convenient to be able to 
get to the non-atomic type by using TYPE_MAIN_VARIANT...  Anyway, so the 
original 5 patches stand, but will require closer auditing . (starting 
at http://gcc.gnu.org/ml/gcc-patches/2013-07/msg01478.html)


I've attached 2 patches anda  header file here.  One is the changes 
required for handling the C11 atomic expressions...  ie x += 4.2, and 
the other is a first cut at the content for the stdatomic.h include 
file, adna  required change for it.


The atomic expression patch works on many cases, but falls short with 
any kind of complexity on the RHS of an expression which needs to be 
mapped to an atomic_load.  Whilst parsing, we cant tell, and the rhs 
tends to become an expression before we really know. so In the comments 
I have my proposed solution which I will look at when I return in 2 weeks.
Basically, rather than have the ATOMIC_TYPE attribute ripple all the way 
through an expression, as soon as we see an ATOMIC_TYPE, we wrap it in a 
new node ATOMIC_EXPR, which holds the atomic expression, and sets it own 
expression type to the non-atomic variant.  This means the only 
expressions which will have the TYPE_ATOMIC flag set will be ones which 
are wrapped in  ATOMIC_EXPR, and those are the ones where we need to go 
and replace with an __atomic_load or whatever.


I have also punted on the floating point exception stuff which footnote 
113 requires.. we'll worry about that later.


Nevertheless, the patch that is there can do some interest8ing things... 
it can compile and execute this test file successfully as proof of 
concept... with all the right conmversions and use of lockfree routines 
when available.  (use -mcx16 on an x86_64 box, btw, or the __atomic_*_16 
routines will be unresolved )   complex types also work.  long double 
complex is 32 bytes in size, and will make calls into libatomic to 
resolve those operations.


double __attribute__((atomic)) a;
float __attribute__((atomic)) t;
long double __attribute__((atomic))  ld;
char __attribute__((atomic)) c;
int __attribute__((atomic)) i;
long long __attribute__((atomic)) l;
int g;
main ()
{
  g = 40;
  ld = t = a = 4.567;
  c = l = i = g + 2;
  printf ("%f == %f == %Lf == 4.567\n", t, a, ld);
  printf ("%d == %d == %ld == 42\n", c , i, l);
}


I have also attach a mockup of stdatomic.h...   I haven't even tried 
compiling or testing it yet, so it may have a syntax error or something, 
but I wanted to see if the functionality was possible.
THe only thing missing is the ability to find the non-atomic type of an 
atomic type.


Ive included an additional 2 line patch which should  change the meaning 
of __typeof__  (again untested, the joys of imminently leaving for 2 
weeks  :-).   Im not sure the normal practical uses of
__typeof__ have much meaning for an atomic type, it seems far more 
useful to have __typeof__ for an atomic qualified type to return the 
non-atomic variant.   If there is resistance to that definition, then 
I'll need to create a __nonatomic_typeof__ variant...  thats just a bit 
more work.  In any case, that patch is required for stdatomic.h to be 
implemented.  You can see where I used it to call the generic atomic 
operations with all the proper types for the non-atomic fields which 
need a temp vazr. The header file is actually quite short.  I don't 
know how to implement it other than with macro wrappers around the builtins.


And thats it. I'll be back in 2 weeks to get back at wrapping this, and 
adding some testcases and error reporting.


unless *YOU*  beat me to it :-)

Andrew





	* c/c-parser.c (c_parser_expr_no_commas): Call build_atomic_load on RHS.
	* c/c-typeck.c (build_modify_expr): Call build_atomic_assign for atomic
	expressions.
	(build_atomic_assign): New.  Construct appropriate sequences for
	atomic assignemnt operations.
	(build_atomic_load):  New.  Construct an atomic load.
	* c-family/c-common.h (build_atomic_load): Prototype.

Index: gcc/c/c-parser.c
===
*** gcc/c/c-parser.c	(revision 201248)
--- gcc/c/c-parser.c	(working copy)
*** c_parser_expr_no_commas (c_parser *parse
*** 5440,5445 
--- 5440,5472 
code = BIT_IOR_EXPR;
break;
  default:
+   /* TODO
+ 	 This doesnt work. Expressions like ~a where a is atomic do not function
+ 	 properly. since the rhs is parsed/consumed as an entire expression.  
+ 	 and at the t

Re: [PATCH 11/11] Make opt_pass and gcc::pipeline be GC-managed

2013-08-02 Thread Richard Henderson
On 08/02/2013 09:08 AM, David Malcolm wrote:
> For opt_pass and pass_manager, something different is going on.

I'd wondered about that.

> For some reason gengtype doesn't generate the triad of gt_ggc_mx_FOO,
> gt_pch_nx_FOO, gt_pch_p_NFOO functions in gtype-desc.c, for types
> FOO=opt_pass and pass_manager.  Presumably this is because the types are
> only visited by code in context.c
> 
> So the global functions for opt_pass and pass_manager are a hand-written
> implementation of what gengtype would write; they are called *each time*
> the entity is reached during a traversal.  The member functions are
> called only the *first time* the entity is visited.

I wonder if we can reduce the amount of boiler-plate for this.  Perhaps,

---
//
// These templates assume the presence of several member functions.
//

template
inline void gt_ggc_mx (T *p)
{
  if (ggc_test_and_set_mark (p))
p->gt_ggc_mx ();
}

template
void gt_pch_nx_with_obj(void *this_obj, void *p,
gt_pointer_operator op, void *cookie)
{
  if (p == this_obj)
{
  T *t = static_cast(p);
  t->gt_pch_nx_with_obj (op, cookie);
}
}

template
inline void gt_pch_nx (T *p)
{
  if (gt_pch_note_object (p, p, gt_pch_nx_with_obj))
p->gt_pch_nx ();
}

-

I had thought about an abstract base class instead of the templates, but
that would unnecessarily force the use of vtables in places that don't
need them.  In some cases this would be relatively harmless (e.g. the
very few pass_manager objects), but in others it might be a no-go.

The use of the template obviates that.  It ought only be instantiated when
necessary for gty user objects that don't have their own specialization.

Thoughts?


r~


Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Richard Henderson
On 08/02/2013 04:45 AM, Andreas Krebbel wrote:
> !   XCFLAGS="${XCFLAGS} -mzarch -mhtm -msoft-float"

Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in
floating point registers; similarly for the write accessors.


r~


Re: [PATCH, rs6000] Add builtin support for power8 32-bit Altivec multiply insns

2013-08-02 Thread David Edelsohn
On Thu, Aug 1, 2013 at 2:21 PM, Peter Bergner  wrote:
> This patch adds builtin support for the new 32-bit Altivec multiply
> instructions that were added in ISA 2.07 (ie, POWER8).
>
> This passed bootstrap and regtesting with no errors.  Ok for mainline?

Peter,

The builtins and patterns seem to be in a random order. Sometimes
signed is first and sometimes unsigned is first.  Please make the
ordering of the names consistent.

If these operations are multiplies, why are they described as UNSPEC
instead of MULT with SELECT?

Thanks, David


Re: [PATCH, rs6000] Add builtin support for power8 32-bit Altivec multiply insns

2013-08-02 Thread Peter Bergner
On Fri, 2013-08-02 at 17:13 -0400, David Edelsohn wrote:
> The builtins and patterns seem to be in a random order. Sometimes
> signed is first and sometimes unsigned is first.  Please make the
> ordering of the names consistent.

Ok, I can clean that up.


> If these operations are multiplies, why are they described as UNSPEC
> instead of MULT with SELECT?

That's a good question.  I was just following how the vmul[eo][su][bh]
define_insns are defined and assumed they were that way for a reason.
Looking at the history of altivec.md, they seem to have been that
way since day 1 when Aldy created the file.

Peter





Re: [PATCH 11/11] Make opt_pass and gcc::pipeline be GC-managed

2013-08-02 Thread David Malcolm
On Fri, 2013-08-02 at 10:01 -1000, Richard Henderson wrote:
> On 08/02/2013 09:08 AM, David Malcolm wrote:
> > For opt_pass and pass_manager, something different is going on.
> 
> I'd wondered about that.
> 
> > For some reason gengtype doesn't generate the triad of gt_ggc_mx_FOO,
> > gt_pch_nx_FOO, gt_pch_p_NFOO functions in gtype-desc.c, for types
> > FOO=opt_pass and pass_manager.  Presumably this is because the types are
> > only visited by code in context.c
> > 
> > So the global functions for opt_pass and pass_manager are a hand-written
> > implementation of what gengtype would write; they are called *each time*
> > the entity is reached during a traversal.  The member functions are
> > called only the *first time* the entity is visited.
> 
> I wonder if we can reduce the amount of boiler-plate for this.  Perhaps,
> 
> ---
> //
> // These templates assume the presence of several member functions.
> //
> 
> template
> inline void gt_ggc_mx (T *p)
> {
>   if (ggc_test_and_set_mark (p))
> p->gt_ggc_mx ();
> }
> 
> template
> void gt_pch_nx_with_obj(void *this_obj, void *p,
> gt_pointer_operator op, void *cookie)
> {
>   if (p == this_obj)
> {
>   T *t = static_cast(p);
>   t->gt_pch_nx_with_obj (op, cookie);
> }
> }
> 
> template
> inline void gt_pch_nx (T *p)
> {
>   if (gt_pch_note_object (p, p, gt_pch_nx_with_obj))
> p->gt_pch_nx ();
> }
> 
> -
> 
> I had thought about an abstract base class instead of the templates, but
> that would unnecessarily force the use of vtables in places that don't
> need them.  In some cases this would be relatively harmless (e.g. the
> very few pass_manager objects), but in others it might be a no-go.
> 
> The use of the template obviates that.  It ought only be instantiated when
> necessary for gty user objects that don't have their own specialization.

I like this approach, and I'm trying a bootstrap now with the templates.
I changed the "with_obj" to "with_op", on the grounds that you always
have an obj, but you don't always have an op.

FWIW I had a go at avoiding templates by attempting to tell gengtype to
write out functions for all GTY((user)) types, regardless of whether it
thinks they're referenced, with this:
@@ -3697,7 +3697,8 @@ write_types (outf_p output_header, type_p structures, 
type_p param_structs,
   /* At last we emit the functions code.  */
   oprintf (output_header, "\n/* functions code */\n");
   for (s = structures; s; s = s->next)
-if (s->gc_used == GC_POINTED_TO || s->gc_used == GC_MAYBE_POINTED_TO)
+if (s->gc_used == GC_POINTED_TO || s->gc_used == GC_MAYBE_POINTED_TO
+|| s->kind == TYPE_USER_STRUCT)
   {
options_p opt;
 
but I ran into various "incomplete structure" errors due to gengtype's
lack of a full C/C++ parser.

Hence templates seem the sanest option.




Re: [PATCH, rs6000] Add builtin support for power8 32-bit Altivec multiply insns

2013-08-02 Thread Michael Meissner
On Fri, Aug 02, 2013 at 04:31:37PM -0500, Peter Bergner wrote:
> On Fri, 2013-08-02 at 17:13 -0400, David Edelsohn wrote:
> > The builtins and patterns seem to be in a random order. Sometimes
> > signed is first and sometimes unsigned is first.  Please make the
> > ordering of the names consistent.
> 
> Ok, I can clean that up.
> 
> 
> > If these operations are multiplies, why are they described as UNSPEC
> > instead of MULT with SELECT?
> 
> That's a good question.  I was just following how the vmul[eo][su][bh]
> define_insns are defined and assumed they were that way for a reason.
> Looking at the history of altivec.md, they seem to have been that
> way since day 1 when Aldy created the file.

Given the age of the original mods, either vec_select didn't exist back then,
or Geoff just liked unspec.  I've cleaned up a few in altivec.md over the
years.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



patch to fix PR58048

2013-08-02 Thread Vladimir Makarov

The following patch fixes

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58048

LRA has a correct behaviour (as reload) on the test with -O0.  But with 
-O2 a constant propagation into asm operands resulted in LRA cycling 
instead of error reporting.


The patch was successfully bootstrapped and tested on x86/x86-64.

Committed as rev. 201454.

2013-08-02  Vladimir Makarov  

PR rtl-optimization/58048
* lra-constraints.c (process_alt_operands): Don't check asm
operand on register.

2013-08-02  Vladimir Makarov  

PR rtl-optimization/58048
* gcc.target/i386/pr58048.c: New.

Index: lra-constraints.c
===
--- lra-constraints.c   (revision 201438)
+++ lra-constraints.c   (working copy)
@@ -1892,7 +1892,7 @@ process_alt_operands (int only_alternati
 
  /* For asms, verify that the class for this alternative is 
possible
 for the mode that is specified.  */
- if (!no_regs_p && REG_P (op) && INSN_CODE (curr_insn) < 0)
+ if (!no_regs_p && INSN_CODE (curr_insn) < 0)
{
  int i;
  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
Index: testsuite/gcc.target/i386/pr58048.c
===
--- testsuite/gcc.target/i386/pr58048.c (revision 0)
+++ testsuite/gcc.target/i386/pr58048.c (working copy)
@@ -0,0 +1,11 @@
+/* PR target/58048 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void
+div3 (void)
+{
+double tmp1;
+
+asm volatile ("fscale":"=t" (tmp1):"0" (0), "u" (0)); /* { dg-error 
"inconsistent operand constraints in an 'asm'"  } */
+}


Re: [libgo PATCH 0/1] mksyscalls.awk: make split ERE more portable

2013-08-02 Thread Ian Lance Taylor
On Thu, Aug 1, 2013 at 5:36 AM, Bernhard Reutner-Fischer
 wrote:
>
> When using busybox' awk to bootstrap, libgo's syscalls are generated
> incorrectly.
>
> I'm attaching the split() used by busybox' awk including output before
> and after the patch for reference.
>
> Please install / ok to install?

I installed a slightly different version, that retains the desired
semantics (skip arbitrary amounts of whitespace).  This should avoid
confusing the busybox awk by matching the null string.  Committed to
mainline and 4.8 branch.

Thanks for reporting the issue.

Ian


foo.patch
Description: Binary data


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-02 Thread Steven Bosscher
On Fri, Aug 2, 2013 at 5:05 PM, Jan Hubicka  wrote:
>> +/* Called when block BB has been reassigned to a different partition,
>> +   to ensure that the region crossing attributes are updated.  */
>> +
>> +static void
>> +fixup_bb_partition (basic_block bb)
>> +{
>> +  edge e;
>> +  edge_iterator ei;
>> +
>> +  /* Now need to make bb's pred edges non-region crossing.  */
>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>> +{
>> +  fixup_partition_crossing (e);
>> +}
>> +
>> +  /* Possibly need to make bb's successor edges region crossing,
>> + or remove stale region crossing.  */
>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>> +{
>> +  if ((e->flags & EDGE_FALLTHRU)
>> +  && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>> +  && e->dest != EXIT_BLOCK_PTR)
>> +force_nonfallthru (e);
>> +  else
>> +fixup_partition_crossing (e);
>> +}
>> +}
>
> Is there particular reason why preds can not be fallhtrus

Yes, by definition a crossing edge cannot fall through. There is
always a control transfer from one section to another.


>> +/* Sanity check partition hotness to ensure that basic blocks in
>> +   the cold partition don't dominate basic blocks in the hot partition.
>> +   If FLAG_ONLY is true, report violations as errors. Otherwise
>> +   re-mark the dominated blocks as cold, since this is run after
>> +   cfg optimizations that may make hot blocks previously reached
>> +   by both hot and cold blocks now only reachable along cold paths.  */
>
> With profile, I suppose we can have cold blocks dominating hot blocks when the
> hot blocks is in loop whose trip count is high enough.

That is the common case, actually.

>  Indeed for partitioning
> reasons it does not make sense to push those into different section.

The partitioning algrorithm makes sure this doesn't happen. The
hottest path from the entry block to a hot basic block is always part
of the hot partition.


> I also wonder, if we finally get the pass stable, can we enable it by default
> and offline probably cold blocks w/o profile?

That is the general idea behind all this work, obviously ;-)

> Primarily blocks reachable only
> by EH + blocks leading to a crash or throw().  For C++ those should be common
> enough to make a difference...

Yup, and IIRC Theresa posted some numbers that showed this.

Ciao!
Steven


Re: [PATCH 11/11] Make opt_pass and gcc::pipeline be GC-managed

2013-08-02 Thread Richard Henderson
On 08/02/2013 11:53 AM, David Malcolm wrote:
> FWIW I had a go at avoiding templates by attempting to tell gengtype to
> write out functions for all GTY((user)) types, regardless of whether it
> thinks they're referenced, with this:
> @@ -3697,7 +3697,8 @@ write_types (outf_p output_header, type_p structures, 
> type_p param_structs,
>/* At last we emit the functions code.  */
>oprintf (output_header, "\n/* functions code */\n");
>for (s = structures; s; s = s->next)
> -if (s->gc_used == GC_POINTED_TO || s->gc_used == GC_MAYBE_POINTED_TO)
> +if (s->gc_used == GC_POINTED_TO || s->gc_used == GC_MAYBE_POINTED_TO
> +|| s->kind == TYPE_USER_STRUCT)
>{
> options_p opt;
>  
> but I ran into various "incomplete structure" errors due to gengtype's
> lack of a full C/C++ parser.
> 
> Hence templates seem the sanest option.

I was actually contemplating going the other direction -- have gengtype
write out _less_, relying on the templates more.  I hadn't gone very far
down that road yet...


r~



Re: [PATCH 3.1/11] Explicitly initialize the macro-generated pass fields (was Re: [PATCH 03/11] Handwritten part of conversion of passes to C++ classes)

2013-08-02 Thread David Malcolm
On Thu, 2013-08-01 at 13:13 -0400, David Malcolm wrote:
> On Mon, 2013-07-29 at 15:41 -0600, Jeff Law wrote:
> > On 07/26/2013 09:04 AM, David Malcolm wrote:
> > > This patch is the hand-written part of the conversion of passes from
> > > C structs to C++ classes.  It does not work without the subsequent
> > > autogenerated part, which is huge.
> > [ ... ]
> > With the changes from pipeline -> pass_manager this is fine.  As is the 
> > follow-up autogenerated patch.
> 
> Thanks.
> 
> The current status is that Jeff has approved patches 1-5 for the trunk,
> and patches 6-11 haven't yet been reviewed by a global reviewer.  I have
> committed patches 1 and 2, having bootstrapped+tested each in turn, but
> I can't commit any more of these without further review - details
> follow.
> 
> I had only bootstrapped and tested the combination of all 11 patches
> together, so I've been attempting to test the approved patches.  For
> reference:
>   * Patch 3 is the handwritten part of the conversion of passes to C++
> classes
>   * Patch 4 is the autogenerated part
>   * Patch 5 adds -fno-rtti to the testsuite when building plugins
>   * Patch 6 is the not-yet-approved patch currently named "Rewrite how
> instances of passes are cloned" (but that's not all it does, see below).
> 
> I had thought that the combination of patch 3 + 4 + 5 would work as a
> unit, and hoped to commit each of these after testing the combination of
> (3, 4, 5), but upon attempting a bootstrap I found two bugs:
> 
>   (a) patch 3 adds an gcc_assert to the pass_manager constructor to
> ensure that pass instance fields are NULL when created, to ensure that
> the macro-driven logic is correct.   However, the instance fields
> haven't been explicitly initialized at that point, leading to sporadic
> assertion failures.  This wasn't an issue for the full patch series
> since patch 11 adds an operator new for pass_manager, allocating it from
> the GC heap, using the *cleared* allocator:
>void*
>pass_manager::operator new (size_t sz)
>{
>  return ggc_internal_cleared_alloc_stat (sz MEM_STAT_INFO);
>}
> hence ensuring that the fields are zeroed.  So patch 3 works with patch
> 11, but not without.  In the meantime, I'm attaching a new patch "3.1",
> which explicitly zeroes these fields early on in the pass_manager ctor.
> 
>   (b) with just these patches, although static_pass_number for every
> pass instance is correct, the dumpfile *switch* numbering is wrong,
> leading to numerous testsuite failures (e.g. "unrecognized command line
> option '-fdump-tree-dce2'") .  Patch 6 is needed: it does the necessary
> fixups at the right time to ensure that the per-pass-instance dump files
> get the correct switch names (I'll add some more notes on this to that
> email).
> 
> With the attached patch, the combination of patches (03, 03.1, 04, 05
> *and* 06) successfully bootstraps on x86_64-unknown-linux-gnu, and all
> testcases show the same results as an unpatched build (relative to
> r201397).
> 
> So I'm looking for review of the attached patch, and of at least patch 6
> before I can proceed with this.  AIUI, Jeff is away at the moment, so
> another global reviewer would need to do it.
Here's a better version of this patch: instead of explicitly
initializing the fields (which would become redundant when patch 11 goes
in), instead this patch adds an operator new, using xcalloc.  This
operator new gets reimplemented in a later version of patch 11 to use
ggc_internal_cleared_alloc_stat when the class becomes GC-managed, hence
at each stage of committing, the initialization is sane.

I've successfully bootstrapped the *end result* of the revised patch
series 3-11 on x86_64-unknown-linux-gnu: all testcases show the same
results as an unpatched build (relative to r201397).

OK for trunk? (as part of patches 3-6)

>From 3198835b85b9a35906bf93ba8f510762e8e017b9 Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Wed, 31 Jul 2013 14:20:07 -0400
Subject: [PATCH 04/15] Zero-initialize pass_manager

Ensure that pass_manager instances are fully zero-initialized, by
providing an operator new, implemented using xcalloc.

gcc/

	* passes.c (pass_manager::operator new): New.
---
 gcc/pass_manager.h | 2 ++
 gcc/passes.c   | 7 +++
 2 files changed, 9 insertions(+)

diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
index ea078a5..00f0b1c 100644
--- a/gcc/pass_manager.h
+++ b/gcc/pass_manager.h
@@ -47,6 +47,8 @@ class context;
 class pass_manager
 {
 public:
+  void *operator new (size_t sz);
+
   pass_manager(context *ctxt);
 
   void register_pass (struct register_pass_info *pass_info);
diff --git a/gcc/passes.c b/gcc/passes.c
index fcbd630..8efce30 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -1339,6 +1339,13 @@ pass_manager::register_pass (struct register_pass_info *pass_info)
 -> all_passes
 */
 
+void *
+pass_manager::operator new (size_t sz)
+{
+  /* Ensure that all fields of the pass manager are zero-initialized.  */
+  retu

Updated patch 10 (was Re: [PATCH 10/11] Make gcc::context be GC-managed)

2013-08-02 Thread David Malcolm
On Fri, 2013-08-02 at 14:31 -0400, David Malcolm wrote:
> On Thu, 2013-08-01 at 11:28 -1000, Richard Henderson wrote:
> > On 07/26/2013 05:04 AM, David Malcolm wrote:
> > > +/* Functions relating to the garbage collector.  */
> > > +void
> > > +gcc::context::gt_ggc_mx ()
> > > +{
> > > +  /* Currently a no-op.  */
> > > +}
> > > +
> > > +void
> > > +gcc::context::gt_pch_nx ()
> > > +{
> > > +  /* Currently a no-op.  */
> > > +}
> > > +
> > > +void
> > > +gcc::context::gt_pch_nx (gt_pointer_operator op ATTRIBUTE_UNUSED,
> > > +  void *cookie ATTRIBUTE_UNUSED)
> > > +{
> > > +  /* Currently a no-op.  */
> > > +}
> > 
> > I suppose these are members because you'll want to access private members
> > later, and that's easier than playing with "friend"?
> 
> Exactly.  The patch adds a comment about this to gcc/context.h.
> 
> > > +void gt_ggc_mx (gcc::context *ctxt)
> > > +{
> > > +  ctxt->gt_ggc_mx ();
> > > +}
> > > +
> > > +void gt_pch_nx (gcc::context *ctxt)
> > > +{
> > > +  ctxt->gt_pch_nx ();
> > > +}
> > > +
> > > +void gt_pch_nx (gcc::context *ctxt, gt_pointer_operator op, void *cookie)
> > > +{
> > > +  ctxt->gt_pch_nx (op, cookie);
> > > +}
> > 
> > Should these be inline functions in context.h, so that the call does direct?
> > Or do we take their address, making that sorta pointless?
> > 
> > It seems like there's one level of indirection here that is avoidable...
> 
> It looks like they can be inlined - their addresses are not taken, and
> they're only used by gtype-desc.c, inside the autogenerated functions
> gt_ggc_mx_context, gt_pch_nx_context, and gt_pch_p_7context.
> 
> I'll make them inline in context.h

Here's an updated version of the patch which makes them inline.

I've successfully bootstrapped the *end result* of the revised patch
series 3-11 (plus the patch "3.1" from [1]) on x86_64-unknown-linux-gnu:
all testcases show the same results as an unpatched build (relative to
r201397).

OK for trunk?

[1] http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00118.html
>From f5fba88b187c68f92e3c797b83a3ff4b6e6c2e48 Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Tue, 30 Jul 2013 12:30:15 -0400
Subject: [PATCH 11/15] Make gcc::context be GC-managed

This patch makes gcc::context instances be allocated within the GC-heap,
and adds traversal hooks for GC/PCH so that a gcc::context can own refs
to other GC-allocated objects.

gcc/
	* Makefile.in (GTFILES): Add context.h.
	* context.c (gcc::context::operator new): New.
	(gcc::context::gt_ggc_mx): New.
	(gcc::context::gt_pch_nx): New.
	(gcc::context::gt_pch_nx): New.
	* context.h (gcc::context): Add GTY((user)) marking.
	(gcc::context::operator new): New.
	(gcc::context::gt_ggc_mx): New.
	(gcc::context::gt_pch_nx): New.
	(gcc::context::gt_pch_nx): New.
	(g): Add GTY marking.
	(gt_ggc_mx (gcc::context *)): New.
	(gt_pch_nx (gcc::context *)): New.
	(gt_pch_nx (gcc::context *ctxt, gt_pointer_operator op,
	void *cookie)): New.
	* gengtype.c (open_base_files) : Add context.h.
---
 gcc/Makefile.in |  1 +
 gcc/context.c   | 26 +
 gcc/context.h   | 59 +++--
 gcc/gengtype.c  |  2 +-
 4 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index afce540..61a4d7c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3819,6 +3819,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/ipa-inline.h \
   $(srcdir)/asan.c \
   $(srcdir)/tsan.c \
+  $(srcdir)/context.h \
   @all_gtfiles@
 
 # Compute the list of GT header files from the corresponding C sources,
diff --git a/gcc/context.c b/gcc/context.c
index b515241..ba6f335 100644
--- a/gcc/context.c
+++ b/gcc/context.c
@@ -27,7 +27,33 @@ along with GCC; see the file COPYING3.  If not see
 /* The singleton holder of global state: */
 gcc::context *g;
 
+void *
+gcc::context::operator new (std::size_t size)
+{
+  return ggc_internal_cleared_alloc_stat (size MEM_STAT_INFO);
+}
+
 gcc::context::context()
 {
   passes_ = new gcc::pass_manager (this);
 }
+
+/* Functions relating to the garbage collector.  */
+void
+gcc::context::gt_ggc_mx ()
+{
+  /* Currently a no-op.  */
+}
+
+void
+gcc::context::gt_pch_nx ()
+{
+  /* Currently a no-op.  */
+}
+
+void
+gcc::context::gt_pch_nx (gt_pointer_operator op ATTRIBUTE_UNUSED,
+			 void *cookie ATTRIBUTE_UNUSED)
+{
+  /* Currently a no-op.  */
+}
diff --git a/gcc/context.h b/gcc/context.h
index 66260cd..2211dc4 100644
--- a/gcc/context.h
+++ b/gcc/context.h
@@ -27,11 +27,30 @@ class pass_manager;
 /* GCC's internal state can be divided into zero or more
"parallel universe" of state; an instance of this class is one such
context of state.  */
-class context
+class GTY((user)) context
 {
 public:
+  /* Ensure that instances are allocated within the GC-heap.  */
+  void *operator new (std::size_t size);
+
   context();
 
+  /* Garbage-collector integration.
+
+ Each context assumes it has full control of th

Updated patch (was Re: [PATCH 11/11] Make opt_pass and gcc::pipeline be GC-managed)

2013-08-02 Thread David Malcolm
On Fri, 2013-08-02 at 10:01 -1000, Richard Henderson wrote:
> On 08/02/2013 09:08 AM, David Malcolm wrote:
> > For opt_pass and pass_manager, something different is going on.
> 
> I'd wondered about that.
> 
> > For some reason gengtype doesn't generate the triad of gt_ggc_mx_FOO,
> > gt_pch_nx_FOO, gt_pch_p_NFOO functions in gtype-desc.c, for types
> > FOO=opt_pass and pass_manager.  Presumably this is because the types are
> > only visited by code in context.c
> > 
> > So the global functions for opt_pass and pass_manager are a hand-written
> > implementation of what gengtype would write; they are called *each time*
> > the entity is reached during a traversal.  The member functions are
> > called only the *first time* the entity is visited.
> 
> I wonder if we can reduce the amount of boiler-plate for this.  Perhaps,
> 
> ---
> //
> // These templates assume the presence of several member functions.
> //
> 
> template
> inline void gt_ggc_mx (T *p)
> {
>   if (ggc_test_and_set_mark (p))
> p->gt_ggc_mx ();
> }
> 
> template
> void gt_pch_nx_with_obj(void *this_obj, void *p,
> gt_pointer_operator op, void *cookie)
> {
>   if (p == this_obj)
> {
>   T *t = static_cast(p);
>   t->gt_pch_nx_with_obj (op, cookie);
> }
> }
> 
> template
> inline void gt_pch_nx (T *p)
> {
>   if (gt_pch_note_object (p, p, gt_pch_nx_with_obj))
> p->gt_pch_nx ();
> }
> 
> -
> 
> I had thought about an abstract base class instead of the templates, but
> that would unnecessarily force the use of vtables in places that don't
> need them.  In some cases this would be relatively harmless (e.g. the
> very few pass_manager objects), but in others it might be a no-go.
> 
> The use of the template obviates that.  It ought only be instantiated when
> necessary for gty user objects that don't have their own specialization.

Thanks.

Here's an updated version of the patch which uses your template idea,
removing the handcoded functions.

I put the templates in ggc.h, with a big comment.  I changed the
"with_obj" to "with_op", on the grounds that you always have an obj, but
you don't always have an op. 

I stepped through a collection, and observed gcc::context::gt_ggc_mx
calling gt_ggc_mx, which then called
gt_ggc_mx.

I've successfully bootstrapped the *end result* of the revised patch
series 3-11 (plus the patch "3.1" from [1], which upsets the numbering)
on x86_64-unknown-linux-gnu, and ran the testsuite: all testcases show
the same results as an unpatched build (relative to r201397).

OK for trunk?

[1] http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00118.html
>From 2e3002e597eb7cfdbfb274ede7c02c0510e35e70 Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Fri, 2 Aug 2013 15:58:46 -0400
Subject: [PATCH 12/15] Make opt_pass and gcc::pass_manager be GC-managed

This patch makes gcc::pass_manager and opt_pass instances be allocated
within the GC-heap, and adds traversal hooks for GC/PCH, so that passes
can own refs to other GC-allocated objects.

It also adds templates to ggc.h, to create boilerplate for GTY((user))
types that gengtype can't handle.

gcc/
	Make opt_pass and gcc::pass_manager be GC-managed, so that pass
	instances can own GC refs.

	* Makefile.in (GTFILES): Add pass_manager.h and tree-pass.h.
	* context.c (gcc::context::gt_ggc_mx): Traverse passes_.
	(gcc::context::gt_pch_nx): Likewise.
	(gcc::context::gt_pch_nx):  Likewise.
	* ggc.h (gt_ggc_mx ): New.
	(gt_pch_nx_with_op ): New.
	(gt_pch_nx ): New.
	* passes.c (opt_pass::gt_ggc_mx): New.
	(opt_pass::gt_pch_nx): New.
	(opt_pass::gt_pch_nx_with_op): New.
	(pass_manager::gt_ggc_mx): New.
	(pass_manager::gt_pch_nx): New.
	(pass_manager::gt_pch_nx_with_op): New.
	(pass_manager::operator new): Use
	ggc_internal_cleared_alloc_stat rather than xcalloc.
	* pass_manager.h (class pass_manager): Add GTY((user)) marking.
	(pass_manager::gt_ggc_mx): New.
	(pass_manager::gt_pch_nx): New.
	(pass_manager::gt_pch_nx_with_op): New.
	* tree-pass.h (class opt_pass): Add GTY((user)) marking.
	(opt_pass::operator new): New.
	(opt_pass::gt_ggc_mx): New.
	(opt_pass::gt_pch_nx): New.
	(opt_pass::gt_pch_nx_with_op): New.
---
 gcc/Makefile.in|   2 +
 gcc/context.c  |   6 +--
 gcc/ggc.h  |  46 +
 gcc/pass_manager.h |   9 -
 gcc/passes.c   | 116 -
 gcc/tree-pass.h|  13 +-
 6 files changed, 184 insertions(+), 8 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 61a4d7c..df24bdc 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3820,6 +3820,8 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/asan.c \
   $(srcdir)/tsan.c \
   $(srcdir)/context.h \
+  $(srcdir)/pass_manager.h \
+  $(srcdir)/tree-pass.h \
   @all_gtfiles@
 
 # Compute the list of GT header files from the corre

GDB hooks for debugging GCC

2013-08-02 Thread David Malcolm
GDB 7.0 onwards supports hooks written in Python to improve the
quality-of-life within the debugger.  The best known are the
pretty-printing hooks [1], which we already use within libstdc++ for
printing better representations of STL containers.

I've written debug hooks for when the inferior process is *GCC*, which
I'm attaching.

Here's a debug session of cc1 using them, with some comments added by
hand:

[david@surprise gcc]$ gdb \
  -iex "add-auto-load-safe-path ." \
  --args ./cc1 foo.c -O3
(..snip gdb startup...)
Successfully loaded GDB hooks for GCC
(gdb) break expand_gimple_stmt
Breakpoint 5 at 0x68e94a: file ../../src/gcc/cfgexpand.c, line 2357.

(gdb) run
Breakpoint 5, expand_gimple_stmt (stmt=
) at ../../src/gcc/cfgexpand.c:2357
2357  location_t saved_location = input_location;

(gdb) bt
#0  expand_gimple_stmt (stmt=)
at ../../src/gcc/cfgexpand.c:2357
  (note how stmt isn't just an address, it tells you the type of stmt)

#1  0x00694dc6 in expand_gimple_basic_block (bb=
, disable_tail_calls=false)
at ../../src/gcc/cfgexpand.c:4204

  (note how the index of the basic block is given)

#2  0x00696855 in gimple_expand_cfg ()
at ../../src/gcc/cfgexpand.c:4723
#3  0x009b59ad in execute_one_pass (pass=
)
at ../../src/gcc/passes.c:1995
#4  0x009b5bb7 in execute_pass_list (pass=
)
at ../../src/gcc/passes.c:2047

  (note how the name and static_pass_number of the pass is given)


#5  0x006c288a in expand_function (node=
)
at ../../src/gcc/cgraphunit.c:1594

  (note how the cgraph_node* is printed with the decl name, so you can
easily figure out which function you're in)

(gdb) fin
Run till exit from #0  expand_gimple_stmt (stmt=
) at ../../src/gcc/cfgexpand.c:2357
0x00694dc6 in expand_gimple_basic_block (bb=
, disable_tail_calls=false)
at ../../src/gcc/cfgexpand.c:4204
4204  last = expand_gimple_stmt (stmt);
Value returned is $3 = (note 9 8 10 [bb 3] NOTE_INSN_BASIC_BLOCK)

  (note how the rtx_def* is printed inline.  This last one is actually a
kludge; all the other pretty-printers work inside gdb, whereas this one
injects a call into print-rtl.c into the inferior).

Edges are printed giving the endpoints:

  (gdb) p e
  $1 =  6)>

Note that this doesn't eliminate the .gdbinit script that some people
use, but it's arguably far superior, in that it happens automatically,
and for all values that are printed e.g. in backtraces, viewing struct
fields, etc.

I'd like to get this into trunk.  Some remaining issues:
  * installation; currently one has to manually copy it into place, as
noted above; it would be nice to automate things so that during a build
it gets copied to cc1-gdb.py, cc1plus-gdb.py etc
  * it hardcoded values in a few places rather than correctly looking up
enums
  * the kludge in the RTX printer mentioned above.

Hope this is helpful
Dave

[1] http://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html

"""
Installation

gdb automatically looks for python files with a -gdb.py suffix relative
to executables and DSOs.

Currently you need to copy up this file to e.g. "cc1-gdb.py":

  cp ../../src/gcc/gdb-hooks.py cc1-gdb.py

You may see a message from gdb of the form:
  cc1-gdb.py auto-loading has been declined by your `auto-load safe-path'
as a protection against untrustworthy python scripts.  See
  http://sourceware.org/gdb/onlinedocs/gdb/Auto_002dloading-safe-path.html

A workaround is to add:
  -iex "add-auto-load-safe-path ."
to the gdb invocation, to mark the current directory as trustworthy.

During development, I've been manually invoking the code in this way:

  cp ../../src/gcc/gdb-hooks.py ./cc1-gdb.py \
 && gdb \
  -iex "add-auto-load-safe-path ." \
  -ex "break expand_gimple_stmt" \
  -ex "run" \
  -ex "bt" \
  --args \
./cc1 foo.c -O3

Examples of output using the pretty-printers

Pointer values are generally shown in the form:
  

For example, an opt_pass* might appear as:
  (gdb) p pass
  $2 = 

The name of the pass is given ("expand"), together with the
static_pass_number.

Note that you can suppress pretty-printers using /r (for "raw"):
  (gdb) p /r pass
  $3 = (opt_pass *) 0x188b600
and dereference the pointer in the normal way:
  (gdb) p *pass
  $4 = {type = RTL_PASS, name = 0x120a312 "expand",
  [etc, ...snipped...]

Basic blocks are shown with their index in parentheses, apart from the
CFG's entry and exit blocks, which are given as "ENTRY" and "EXIT":
  (gdb) p bb
  $9 = 
  (gdb) p cfun->cfg->x_entry_block_ptr
  $10 = 
  (gdb) p cfun->cfg->x_exit_block_ptr
  $11 = 

CFG edges are shown with the src and dest blocks given in parentheses:
  (gdb) p e
  $1 =  6)>

Tree nodes are printed using Python code that emulates print_node_brief,
running in gdb, rather than in the inferior:
  (gdb) p cfun->decl
  $1 = 
For usability, the type is printed 

New parameters to control stringop expansion libcall strategy

2013-08-02 Thread Xinliang David Li
On x86_64, when the expected size of memcpy/memset is known (e.g, with
FDO), libcall strategy is used with the size is > 8192. This value is
hard coded, which makes it hard to do performance tuning. This patch
adds two new parameters to do that. Potential usage includes
per-application libcall strategy min-size tuning based on summary data
with FDO (e.g, instruction workset size).

Bootstrap and tested on x86_64/linux. Ok for trunk?

thanks,

David


2013-08-02  Xinliang David Li  

* params.def: New parameters.
* config/i386/i386.c (ix86_option_override_internal):
Override default libcall size limit with parameters.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 201458)
+++ config/i386/i386.c  (working copy)
@@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost =
 };
 
 /* Processor costs (relative to an add) */
-static const
+static
 struct processor_costs i386_cost = {   /* 386 specific costs */
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   COSTS_N_INSNS (1),   /* cost of a lea instruction */
@@ -226,7 +226,7 @@ struct processor_costs i386_cost = {/*
   1,   /* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs i486_cost = {   /* 486 specific costs */
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   COSTS_N_INSNS (1),   /* cost of a lea instruction */
@@ -298,7 +298,7 @@ struct processor_costs i486_cost = {/*
   1,   /* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs pentium_cost = {
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   COSTS_N_INSNS (1),   /* cost of a lea instruction */
@@ -368,7 +368,7 @@ struct processor_costs pentium_cost = {
   1,   /* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs pentiumpro_cost = {
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   COSTS_N_INSNS (1),   /* cost of a lea instruction */
@@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost =
   1,   /* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs geode_cost = {
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   COSTS_N_INSNS (1),   /* cost of a lea instruction */
@@ -518,7 +518,7 @@ struct processor_costs geode_cost = {
   1,   /* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs k6_cost = {
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   COSTS_N_INSNS (2),   /* cost of a lea instruction */
@@ -591,7 +591,7 @@ struct processor_costs k6_cost = {
   1,   /* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs athlon_cost = {
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   COSTS_N_INSNS (2),   /* cost of a lea instruction */
@@ -664,7 +664,7 @@ struct processor_costs athlon_cost = {
   1,   /* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs k8_cost = {
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   COSTS_N_INSNS (2),   /* cost of a lea instruction */
@@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = {
   1,   /* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs pentium4_cost = {
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   COSTS_N_INSNS (3),   /* cost of a lea instruction */
@@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = {
   1,   /* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs nocona_cost = {
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   COSTS_N_INSNS (1),   /* cost of a lea instruction */
@@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = {
   1,   /* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs atom_cost = {
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   COSTS_N_INSNS (1) + 1,   /* cost of a lea instruction */
@@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = {
 };
 
 /* Generic64 should produce code tuned for Nocona and K8.  */
-static const
+static
 struct processor_costs generic64_cost = {
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   /* On all chips taken into consideration lea is 2 cy

Re: C++ coding conventions: namespaces, references and getters (was Re: [PATCH 2/2] Introduce beginnings of a pipeline class.)

2013-08-02 Thread Gabriel Dos Reis
[ Adding Benjamin, Diego, Lawrence ]

General remarks first:
When we designed the coding standards for GCC, an overriding
philosophy was that we did not want to be prescriptive.  Rather, we
explicitly wanted to encourage common sense and trust the judgment
of maintainers to make sound and appropriate judgments.

More than a year later, I still believe that was the right thing to do
and I would not want us to start being prescriptive.

On Mon, Jul 29, 2013 at 1:20 PM, David Malcolm  wrote:
> On Thu, 2013-07-25 at 15:08 +0200, Martin Jambor wrote:
>> Hi,
>>
>> I don't know why it's me again but again I do have a few comments.
>
> Thanks for looking over the patch.
>
>> One global remark first: If we are going to start using the gcc
>> namespace (I understand it you need for isolation of symbols once you
>> use gcc as library, right?), I'm wondering whether mixing "using
>> namespace gcc" and explicit identifier qualifications is a good idea.
>> We have never had a discussion about namespace conventions but I'd
>> suggest that you add the using directive at the top of all gcc source
>> files that need it and never use explicit gcc:: qualification (unless
>> there is a reason for it, of course, like in symbol definitions).
>>
>> But perhaps someone else with more C++ experience disagrees?
>
> http://gcc.gnu.org/codingconventions.html#Namespace_Use says:
>
>> "Namespaces are encouraged. All separable libraries should have a
> unique global namespace."
> [...snip...]
>> "Header files should have neither using directives nor namespace-scope
> using declarations."
>
> and the rationale doc says:
>> "Using them within an implementation file can help conciseness."
>
> However, there doesn't seem to be a discussion on the merits of the
> various forms of "using" directives.

It is a "common wisdom" among C++ programmers that using directives
in header files generally negate the benefits of namespaces, and can
lead to surprises.  This is because most of the time, people don't go
scrutinize header files; they include them (based on documentation
of declared signatures) in many translation units.

There is one notable exception to this, which is known as
"namespace composition":

  
http://stackoverflow.com/questions/16871390/why-is-namespace-composition-so-rarely-used

We should probably amend the coding standards and include that hint,
but I don't feel strongly about it.

>
> These aren't the GCC coding conventions, but the Google conventions:
> http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Namespaces
> forbid using directives of the form:
>   using NAMESPACE;
> but suggest using directives of the form:
>   using NAMESPACE::NAME;
> to pick out individual names from a namespace, though *not* in global
> scope in a header (to avoid polluting the global namespace).

The using declarations are fine, as indicated.

>
> I like this approach - how about using it for frequently used names in
> a .c/.cc file, keeping the names alphabetizing by "fully qualified
> path", so e.g.:
>
>   #include "foo.h"
>   ...
>   #include "bar.h"
>
>   using gcc::context;
>   using gcc::pass_manager;
>   ...etc, individually grabbing the names we'll be needing
>
>   // code follows
>
> and thus avoiding grabbing whole namespaces, whilst keeping the code
> concise.

Agreed.

[…]

> Note that as per
>   http://gcc.gnu.org/ml/gcc-patches/2013-07/msg01365.html
> we'll use "pass_manager" rather than "pipeline", so this would look
> like:
>   pass_manager &get_passes () { gcc_assert (passes_); return *passes_; }
>
> We were chatting about C++ references on IRC on Friday, and IIRC there
> was a strong objection to passing *arguments* that are non-const
> references,

I hope nobody is objecting to std::swap for example.

> rather than return values - mutation of *arguments* being
> surprising and a place for bugs to arise (though that may have been
> Diego who was arguing that, citing
> http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Reference_Arguments
> ).
>
> I prefer to have get_passes return a reference rather a pointer since
> which thing is being referred to isn't going to change, and is non-NULL.
> One could write it as a "gcc::pipeline const * passes", but that doesn't
> capture the non-NULL-ness of it.
> [within the class data, "passes_" needs to be a *pointer* so that the
> PCH deserialization can work, in case the object has been relocated].
> Having the get_passes method do the assertion of non-NULLness and
> dereference means that there's a single place where the non-NULLness is
> asserted.
>
> I guess this is a bigger point though: how do GCC maintainers feel about
> C++ references in general?

I think we should use them where appropriate, and trust maintainers
of specific areas of the compiler to exercise sound judgments instead
of us trying to cast ideology into stones.

>
> Looking at the GCC Coding Conventions:
>   http://gcc.gnu.org/codingconventions.html
> and
>   http://gcc.gnu.org/codingrationale.h

Re: C++ coding conventions: namespaces, references and getters (was Re: [PATCH 2/2] Introduce beginnings of a pipeline class.)

2013-08-02 Thread Gabriel Dos Reis
On Tue, Jul 30, 2013 at 4:11 AM, Martin Jambor  wrote:

> Moreover, I think we should be extra careful here because GCC is
> transitioning from C and many developers are used to expect C
> semantics, we will be mixing new C++ code with C code a lot in the
> future and even though we hope to reduce that, people will be cutting
> and pasting all this stuff around.

Yes, we can't reject writing idiomatic C++ just because there would be
a period of transition where old codes and new codes co-exist.
It wouldn't make much sense.

-- Gaby


Re: C++ coding conventions: namespaces, references and getters (was Re: [PATCH 2/2] Introduce beginnings of a pipeline class.)

2013-08-02 Thread Gabriel Dos Reis
On Tue, Jul 30, 2013 at 4:30 AM, Martin Jambor  wrote:

>> I'm voting for references.  References can be seen as yet another
>> software structuring tool that instantly communicate some properties
>> such as you mentioned above.  In addition to that it's also a hint of
>> ownership, i.e. if I get an object& from somewhere I know that I better
>> not even think about whether to delete it or not.
>>
>
> well, let me stress again that we should think about this in the
> context of GCC.  In GCC, we are used to C semantics of the dot
> operator and have a lot of existing code that we will continue to use
> and mix with new code with the same assumption.  Putting a reference
> where none has been before might result in silent and hard to spot
> erroneous modifications.

I cannot make sense of this.

> At the same time, we are used to lookup whether the pointer we got can
> be NULL or not if necessary.  Moreover, in GCC, NULL-dereferences are
> not particularly dangerous and are easy to debug.

That should not be reason to avoid idiomatic constructions.

> In the context of GCC, there will be no ownership hints of this kind
> simply because new code will co-exist with tons of old code which uses
> pointers and will not give any such hints and so you can't assume you
> could free its pointers.  Let me not even mention GC.

References are not just about ownership (though they might imply it).


-- Gaby


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-02 Thread Teresa Johnson
On Fri, Aug 2, 2013 at 8:05 AM, Jan Hubicka  wrote:
>>
>> 2013-08-01  Teresa Johnson  
>> Steven Bosscher  
>>
>> * cfgrtl.c (fixup_bb_partition): New routine.
>> (commit_edge_insertions): Invoke fixup_partitions.
>> (find_partition_fixes): New routine.
>> (fixup_partitions): Ditto.
>> (verify_hot_cold_block_grouping): Update comments.
>> (rtl_verify_edges): Invoke find_partition_fixes.
>> (rtl_verify_bb_pointers): Update comments.
>> (rtl_verify_bb_layout): Ditto.
>> * basic-block.h (fixup_partitions): Declare.
>> * cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions.
>> * bb-reorder.c (sanitize_dominator_hotness): New function.
>> (find_rarely_executed_basic_blocks_and_crossing_edges): Invoke
>> sanitize_dominator_hotness.
>>
>> Index: cfgrtl.c
>> ===
>> --- cfgrtl.c(revision 201281)
>> +++ cfgrtl.c(working copy)
>> @@ -1341,6 +1341,34 @@ fixup_partition_crossing (edge e)
>>  }
>>  }
>>
>> +/* Called when block BB has been reassigned to a different partition,
>> +   to ensure that the region crossing attributes are updated.  */
>> +
>> +static void
>> +fixup_bb_partition (basic_block bb)
>> +{
>> +  edge e;
>> +  edge_iterator ei;
>> +
>> +  /* Now need to make bb's pred edges non-region crossing.  */
>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>> +{
>> +  fixup_partition_crossing (e);
>> +}
>> +
>> +  /* Possibly need to make bb's successor edges region crossing,
>> + or remove stale region crossing.  */
>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>> +{
>> +  if ((e->flags & EDGE_FALLTHRU)
>> +  && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>> +  && e->dest != EXIT_BLOCK_PTR)
>> +force_nonfallthru (e);
>> +  else
>> +fixup_partition_crossing (e);
>> +}
>> +}
>
> Is there particular reason why preds can not be fallhtrus and why
> force_nonfallthru edge does not need partition crossing fixup?
> (if so, perhpas it could be mentioned in the description, if not,
> I think force_nonfallthru path has to check if new BB was introduced
> and do the right thing on the edge.

I need to clarify the comments in this routine, because without the
context of how this is called it isn't clear. This routine is only
called when we detect a hot bb that is now dominated by a cold bb and
needs to become cold. Therefore, its preds will no longer be region
crossing (any non-dominating blocks that were previously hot would
have been marked cold in the caller for the same reason, so we will
not end up adjusting the region crossing-ness or fallthrough-ness of
those pred edges). Any that were region crossing before but aren't any
longer could not have been fall through (as Steven noted, you can't
have a fall through across a partition boundary). I will add some
better comments here.

Regarding the call to force_nonfallthru, that routine calls
fixup_partition_crossing as needed, and I will update the comment to
reflect that too.

>
>> +/* Sanity check partition hotness to ensure that basic blocks in
>> +   the cold partition don't dominate basic blocks in the hot partition.
>> +   If FLAG_ONLY is true, report violations as errors. Otherwise
>> +   re-mark the dominated blocks as cold, since this is run after
>> +   cfg optimizations that may make hot blocks previously reached
>> +   by both hot and cold blocks now only reachable along cold paths.  */
>
> With profile, I suppose we can have cold blocks dominating hot blocks when the
> hot blocks is in loop whose trip count is high enough.  Indeed for 
> partitioning
> reasons it does not make sense to push those into different section.
>
> I also wonder, if we finally get the pass stable, can we enable it by default
> and offline probably cold blocks w/o profile? Primarily blocks reachable only
> by EH + blocks leading to a crash or throw().  For C++ those should be common
> enough to make a difference...

Yep, as soon as PR57451 is fixed, which I hope to get to next week,
then I am going to send a patch to turn this on by default, at least
with profile feedback, which is where I've been doing performance
tuning. But you are right that there are some cases where it should be
beneficial without profile data as well.

Thanks,
Teresa

>
> Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-02 Thread Teresa Johnson
On Fri, Aug 2, 2013 at 4:04 PM, Steven Bosscher  wrote:
> On Fri, Aug 2, 2013 at 5:05 PM, Jan Hubicka  wrote:
>>> +/* Called when block BB has been reassigned to a different partition,
>>> +   to ensure that the region crossing attributes are updated.  */
>>> +
>>> +static void
>>> +fixup_bb_partition (basic_block bb)
>>> +{
>>> +  edge e;
>>> +  edge_iterator ei;
>>> +
>>> +  /* Now need to make bb's pred edges non-region crossing.  */
>>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>>> +{
>>> +  fixup_partition_crossing (e);
>>> +}
>>> +
>>> +  /* Possibly need to make bb's successor edges region crossing,
>>> + or remove stale region crossing.  */
>>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>>> +{
>>> +  if ((e->flags & EDGE_FALLTHRU)
>>> +  && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>>> +  && e->dest != EXIT_BLOCK_PTR)
>>> +force_nonfallthru (e);
>>> +  else
>>> +fixup_partition_crossing (e);
>>> +}
>>> +}
>>
>> Is there particular reason why preds can not be fallhtrus
>
> Yes, by definition a crossing edge cannot fall through. There is
> always a control transfer from one section to another.
>
>
>>> +/* Sanity check partition hotness to ensure that basic blocks in
>>> +   the cold partition don't dominate basic blocks in the hot partition.
>>> +   If FLAG_ONLY is true, report violations as errors. Otherwise
>>> +   re-mark the dominated blocks as cold, since this is run after
>>> +   cfg optimizations that may make hot blocks previously reached
>>> +   by both hot and cold blocks now only reachable along cold paths.  */
>>
>> With profile, I suppose we can have cold blocks dominating hot blocks when 
>> the
>> hot blocks is in loop whose trip count is high enough.
>
> That is the common case, actually.
>
>>  Indeed for partitioning
>> reasons it does not make sense to push those into different section.
>
> The partitioning algrorithm makes sure this doesn't happen. The
> hottest path from the entry block to a hot basic block is always part
> of the hot partition.

Well, at least with this patch that will be true. The trunk version
just partitions based on the bb's count without regard to paths.

Thanks,
Teresa

>
>
>> I also wonder, if we finally get the pass stable, can we enable it by default
>> and offline probably cold blocks w/o profile?
>
> That is the general idea behind all this work, obviously ;-)
>
>> Primarily blocks reachable only
>> by EH + blocks leading to a crash or throw().  For C++ those should be common
>> enough to make a difference...
>
> Yup, and IIRC Theresa posted some numbers that showed this.
>
> Ciao!
> Steven



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413