Re: [PATCH i386 3/8] [AVX512] Add AVX-512 patterns.

2013-08-21 Thread Kirill Yukhin
On 20 Aug 08:30, Richard Henderson wrote:
Hello,
> This is ok.

Checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-08/msg00504.html

--
Thanks, K


[PING] 3 patches waiting for approval/review

2013-08-21 Thread Andreas Krebbel
[RFC] Allow functions calling mcount before prologue to be leaf functions
http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00993.html

[PATCH] PR57377: Fix mnemonic attribute
http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01364.html

[PATCH] Doc: Add documentation for the mnemonic attribute
http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01436.html

Bye,

-Andreas-



Re: [PATCH 0/2] Port symtab/cgraph/varpool nodes to use C++ inheritance

2013-08-21 Thread Martin Jambor
Hi,

On Tue, Aug 20, 2013 at 11:01:04PM +0200, Jan Hubicka wrote:

[...]

> > Currently to access the base symtab fields of a cgraph or varpool
> > node, the code has e.g.
> > 
> >node->symbol.decl
> > 
> > whereas with C++ inheritance, the "symbol" field is no more, and we
> > directly use the base-class field:
> > 
> >node->decl
> 
> Indeed, this is very nice.  We also use
> (symtab_node)node whenever we need to go from cgraph/varpool node back
> to basetype.  These should go, too.
> Finally I introduced cgraph(node)/varpool(node) functions that convert
> symtab node to cgraph/varpool node and ICE on failure.
> 
> We probably should use our new template conversions.  We have the is_a
> predicate and the dyn_cast convertor that returns NULL on failure.  Do
> we have a variant that ICEs when conversion is not possible?
> > 

as_a ...it ICEs when gcc is configured with checking (and happily does
invalid conversion otherwise).
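
To make the difference between the three conversion styles concrete, here is a
minimal C++ sketch.  All type and function names (symtab_node_sketch,
is_cgraph, dyn_cast_cgraph, as_a_cgraph) are hypothetical stand-ins and not
GCC's actual templates; assert plays the role of a checking-enabled build.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the symtab hierarchy; not GCC's real types.
enum symtab_kind { SYMTAB_FUNCTION, SYMTAB_VARIABLE };

struct symtab_node_sketch
{
  symtab_kind kind;
  int decl;   // placeholder for the shared "decl" field
};

struct cgraph_node_sketch : symtab_node_sketch
{
  cgraph_node_sketch () { kind = SYMTAB_FUNCTION; decl = 0; }
};

struct varpool_node_sketch : symtab_node_sketch
{
  varpool_node_sketch () { kind = SYMTAB_VARIABLE; decl = 0; }
};

// is_a-style predicate: is NODE really a function node?
inline bool
is_cgraph (const symtab_node_sketch *node)
{
  return node->kind == SYMTAB_FUNCTION;
}

// dyn_cast-style conversion: returns NULL when the kind does not match.
inline cgraph_node_sketch *
dyn_cast_cgraph (symtab_node_sketch *node)
{
  return is_cgraph (node) ? static_cast<cgraph_node_sketch *> (node) : NULL;
}

// as_a-style conversion: aborts on mismatch while assertions are enabled,
// and converts unconditionally when they are compiled out.
inline cgraph_node_sketch *
as_a_cgraph (symtab_node_sketch *node)
{
  assert (is_cgraph (node));
  return static_cast<cgraph_node_sketch *> (node);
}
```

With inheritance, the shared field is reached directly as node->decl, with no
intermediate "symbol" member.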

Martin


[RFC] Add conditional compare support

2013-08-21 Thread Zhenqiang Chen
Hi,

Several ports in gcc support conditional compare instructions, which are
an efficient way to handle SHORT_CIRCUIT. But this information is not
represented in TREE or GIMPLE, and generating the instructions depends on
BRANCH_COST and the combine pass.

To explicitly represent the semantics of conditional compare (CCMP)
and remove the dependence on BRANCH_COST and combine, we propose to
add a set of new keywords/operators on TREE.

A CCMP operator has three operands: two are from the compare and
the other is from the result of previous compare.
e.g.

  r = CMP (a, b) && CMP (c, d)  /*CMP can be >, >=, etc.*/

in current gcc, if LOGICAL_OP_NON_SHORT_CIRCUIT, it is like

  t1 = CMP (a, b)
  t2 = CMP (c, d)
  r = t1 & t2

with CCMP, it will be

  t1 = CMP (a, b)
  r  = CCMP (t1, c, d)

SHORT_CIRCUIT expressions will be converted to CCMP expressions
at the end of the front end.

The CCMP is kept until the expand pass. In expand, we will define a new
operator for ports to generate optimized RTL directly.
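
As a rough illustration of the three forms of r = CMP (a, b) && CMP (c, d)
discussed above, the following C++ sketch models them as plain functions.  The
helper ccmp_gt is hypothetical and only mimics the proposed semantics for the
">" case; it says nothing about how a port would expand it.

```cpp
// Short-circuit form: (c > d) is only evaluated when (a > b) holds.
inline bool
and_short_circuit (int a, int b, int c, int d)
{
  return (a > b) && (c > d);
}

// Non-short-circuit form: both compares are evaluated and then combined
// with a bitwise AND, as in the t1/t2 sequence.
inline bool
and_bitop (int a, int b, int c, int d)
{
  bool t1 = a > b;
  bool t2 = c > d;
  return t1 & t2;
}

// Hypothetical model of r = CCMP (t1, c, d): one operand is the result of
// the previous compare, the other two are the operands of the new compare.
inline bool
ccmp_gt (bool prev, int c, int d)
{
  return prev & (c > d);
}

// CCMP form: a plain compare followed by one conditional compare.
inline bool
and_ccmp (int a, int b, int c, int d)
{
  bool t1 = a > b;
  return ccmp_gt (t1, c, d);
}
```

For side-effect-free integer operands, all three functions compute the same
value; they differ only in how the second compare is scheduled.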

Current status:
* The basic support can bootstrap on ARM Chromebook and x86-64,
  with no ICEs or runtime failures in regression tests.
* uninit, vrp and reassoc passes can handle CCMP.

I will post a series of patches for this feature, which includes
* Add new keywords support.
* Handle the new keywords in middle-end passes: uninit, vrp, reassoc, ...
* Add new op to expand CCMP.

Thanks!
-Zhenqiang


[PATCH 1/n] Add conditional compare support

2013-08-21 Thread Zhenqiang Chen
Hi,

The attached patch is the basic support for conditional compare (CCMP). It
adds a set of keywords on TREE to represent CCMP:

DEFTREECODE (TRUTH_ANDIF_LT_EXPR, "truth_andif_lt_expr", tcc_ccomparison, 3)
DEFTREECODE (TRUTH_ANDIF_LE_EXPR, "truth_andif_le_expr", tcc_ccomparison, 3)
DEFTREECODE (TRUTH_ANDIF_GT_EXPR, "truth_andif_gt_expr", tcc_ccomparison, 3)
DEFTREECODE (TRUTH_ANDIF_GE_EXPR, "truth_andif_ge_expr", tcc_ccomparison, 3)
DEFTREECODE (TRUTH_ANDIF_EQ_EXPR, "truth_andif_eq_expr", tcc_ccomparison, 3)
DEFTREECODE (TRUTH_ANDIF_NE_EXPR, "truth_andif_ne_expr", tcc_ccomparison, 3)
DEFTREECODE (TRUTH_ORIF_LT_EXPR, "truth_orif_lt_expr", tcc_ccomparison, 3)
DEFTREECODE (TRUTH_ORIF_LE_EXPR, "truth_orif_le_expr", tcc_ccomparison, 3)
DEFTREECODE (TRUTH_ORIF_GT_EXPR, "truth_orif_gt_expr", tcc_ccomparison, 3)
DEFTREECODE (TRUTH_ORIF_GE_EXPR, "truth_orif_ge_expr", tcc_ccomparison, 3)
DEFTREECODE (TRUTH_ORIF_EQ_EXPR, "truth_orif_eq_expr", tcc_ccomparison, 3)
DEFTREECODE (TRUTH_ORIF_NE_EXPR, "truth_orif_ne_expr", tcc_ccomparison, 3)

To distinguish CCMP from the other operators, the patch dumps

  && to "?&&"
  || to "?||"

A CCMP operator has three operands: two are from the compare and the other
is from the result of the previous compare. To reuse current code, the result
of the previous compare is in TREE_OPERAND (ccmp, 2)/gimple_assign_rhs3. e.g.

  r = (a > b) && (c > d)

with CCMP, it will be

  t1 = GT_EXPR (a, b)
  r  = TRUTH_ANDIF_GT_EXPR (c, d, t1)

The patch does not include ops to expand CCMP to RTL. It just rolls CCMP back
to BIT operators, so with a dummy "conditional_compare" we can test
the patch on any port.
In the patch, dummy conditional_compare implementations for i386 and arm are
added for testing.
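
The roll-back to BIT operators can be pictured with the following C++ sketch.
It is only a model under stated assumptions: the enums, ccmp_sketch and
lower_ccmp_to_bitop are hypothetical names, not the patch's
expand_ccmp_to_bitop, and plain ints stand in for tree operands.  The point it
shows is the operand layout, with the previous compare result in the third
slot.

```cpp
// Hypothetical codes mirroring the 2 x 6 grid of new tree codes.
enum cmp_code { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_EQ, CMP_NE };
enum join_code { JOIN_ANDIF, JOIN_ORIF };

// Model of one CCMP node: operands 0 and 1 feed the compare,
// operand 2 carries the result of the previous compare.
struct ccmp_sketch
{
  join_code join;
  cmp_code cmp;
  int op0;
  int op1;
  bool prev;
};

// Evaluate the plain comparison part.
inline bool
eval_cmp (cmp_code code, int x, int y)
{
  switch (code)
    {
    case CMP_LT: return x < y;
    case CMP_LE: return x <= y;
    case CMP_GT: return x > y;
    case CMP_GE: return x >= y;
    case CMP_EQ: return x == y;
    case CMP_NE: return x != y;
    }
  return false;
}

// Roll the CCMP back to a bit operation: AND for the ANDIF family,
// OR for the ORIF family.
inline bool
lower_ccmp_to_bitop (const ccmp_sketch &e)
{
  bool cur = eval_cmp (e.cmp, e.op0, e.op1);
  return e.join == JOIN_ANDIF ? (e.prev & cur) : (e.prev | cur);
}
```

In this model, TRUTH_ANDIF_GT_EXPR (c, d, t1) corresponds to
{ JOIN_ANDIF, CMP_GT, c, d, t1 }.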

Bootstrapped on x86-64 and ARM Chromebook.
No ICEs or runtime errors in regression tests.

Thanks!
-Zhenqiang

ChangeLog:
2013-08-21  Zhenqiang Chen  

* tree.def: Add conditional compare ops: TRUTH_ANDIF_LT_EXPR,
TRUTH_ANDIF_LE_EXPR, TRUTH_ANDIF_GT_EXPR, TRUTH_ANDIF_GE_EXPR,
TRUTH_ANDIF_EQ_EXPR, TRUTH_ANDIF_NE_EXPR, TRUTH_ORIF_LT_EXPR,
TRUTH_ORIF_LE_EXPR, TRUTH_ORIF_GT_EXPR, TRUTH_ORIF_GE_EXPR,
TRUTH_ORIF_EQ_EXPR, TRUTH_ORIF_NE_EXPR.
* tree.h: Add misc utils for conditional compare.
* tree.c (tree_node_structure_for_code, tree_code_size,
record_node_allocation_statistics, contains_placeholder_p,
find_placeholder_in_expr, substitute_in_expr,
substitute_placeholder_in_expr, stabilize_reference_1, build2_stat,
simple_cst_equal): Handle conditional compare.
(get_code_from_ccompare_expr): New added.
(generate_ccompare_code ): New added.
* c-family/c-pretty-print.c (pp_c_expression): Handle conditional
compare.
* cp/error.c (dump_expr): Likewise.
* cp/semantics.c (cxx_eval_constant_expression): Likewise.
(potential_constant_expression_1): Likewise.
(cxx_eval_ccmp_expression): New added.
* cfgexpand.c (expand_debug_expr): Handle conditional compare.
* expr.c (safe_from_p, expand_expr_real_2): Likewise.
(expand_ccmp_to_bitop): New added.
* gimple-pretty-print.c (dump_ccomparison_rhs): New added.
(dump_gimple_assign): Handle conditional compare.
* print-tree.c (print_node): Likewise
* tree-dump.c (dequeue_and_dump): Likewise.
* tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise.
* gimple.c (recalculate_side_effects): Likewise.
(get_gimple_rhs_num_ops): Likewise
* gimplify.c (goa_stabilize_expr, gimplify_expr, gimple_boolify):
Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* tree-ssa-operands.c (get_expr_operands): Likewise.
* tree-ssa-loop-niter.c (get_val_for): Likewise.
* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
(verify_gimple_comparison_operands): New added.
(verify_gimple_comparison): Call verify_gimple_comparison_operands.
* fold-const.c (fold_truth_andor): Generate conditional compare.
* lra-constraints.c (remove_inheritance_pseudos):
Initialize variable "set" to NULL_RTX.

basic-conditional-compare-support.patch
Description: Binary data


[PATCH 2/n] Handle conditional compare in uninit pass

2013-08-21 Thread Zhenqiang Chen
Hi,

The attached patch enables the uninit pass to handle conditional compare (CCMP).
CCMP is a combination of BIT_AND_EXPR/BIT_IOR_EXPR and a CMP expression. The
code is similar to that which handles BIT_AND_EXPR/BIT_IOR_EXPR and CMP
expressions.

Bootstrapped on x86-64 and ARM Chromebook.
No ICEs or runtime errors in regression tests.
No test cases prefixed with uninit-pred- fail.

Thanks!
-Zhenqiang

ChangeLog:
2013-08-21  Zhenqiang Chen  

* tree-ssa-uninit.c (normalize_cond_1, normalize_cond,
is_gcond_subset_of, is_norm_cond_subset_of): Handle conditional
compare.

uninit-CCMP.patch
Description: Binary data


Re: [PATCH] Rerun df_analyze after delete_unmarked_insns during DCE

2013-08-21 Thread Bernd Schmidt
On 08/19/2013 11:05 PM, Jeff Law wrote:
> On 07/20/2013 03:02 AM, Alexey Makhalov wrote:
>> Hello!
>>
>> If delete_unmarked_insns deletes some insn, DF state might be
>> out of date, and, regs_ever_live might contain unused registers till
>> the end.

(I can't find the original mail either in my mailbox or in the archives).

If this happens after reload, then regs_ever_live really should stay the
same, since we used it to compute frame offsets and such.

Also, df_analyze calls df_compute_regs_ever_live with reset=false, so it
shouldn't delete anything from the set?


Bernd



[C++ testcase, committed] PR 56134

2013-08-21 Thread Paolo Carlini

Hi,

in mainline we don't ICE anymore. Tested x86_64-linux multilib.

Thanks,
Paolo.


2013-08-21  Paolo Carlini  

PR c++/56134
* g++.dg/ext/attr-alias-3.C: New.
Index: g++.dg/ext/attr-alias-3.C
===
--- g++.dg/ext/attr-alias-3.C   (revision 0)
+++ g++.dg/ext/attr-alias-3.C   (working copy)
@@ -0,0 +1,8 @@
+// PR c++/56134
+// { dg-require-alias "" }
+
+char a;
+class Q
+{
+  static char q __attribute__ ((alias ("a")));
+};


Improve jump threading using VRP information

2013-08-21 Thread Jeff Law


Just something else I saw while analyzing dumps from an unrelated set of 
changes.


It's relatively common to see sequences like this:

  # parent_1 = PHI 
  _11 = single_tree_10(D) != 0;
  _12 = parent_1 == 0B;
  _13 = _11 & _12;
  if (_13 != 0)
goto ;
  else
goto ;

Where VRP can deduce that the value of parent_6 has a nonzero value on 
one (or more) of the paths reaching this block (because those paths 
dereference parent_6).


Obviously when VRP knows parent_6 is nonzero, then we know the current 
block will always transfer control to BB 7.


Or something like this:

 :
  # prephitmp_49 = PHI <1(4), pretmp_48(5)>
  prev_line.31_6 = prev_line;
  _8 = prev_line.31_6 != line_7(D);
  line_differs_9 = (unsigned char) _8;
  _10 = prephitmp_49 | line_differs_9;
  if (_10 != 0)
goto ;
  else
goto ;


We may not know the exact value of _10, but VRP can determine that when 
BB6 is reached from BB4 that BB6 will always transfer control to BB8.
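
The idea of collapsing an assignment's right-hand side to a single value can
be sketched in C++ as follows.  This is a simplified model restricted to
boolean-valued (0/1) ranges; range_sketch, bool_and_range, bool_or_range and
known_value are hypothetical names and not the VRP API.

```cpp
// Closed interval [min, max]; in this sketch both bounds are 0 or 1.
struct range_sketch
{
  long min;
  long max;
};

// True when the range contains exactly one value.
inline bool
singleton_p (const range_sketch &r)
{
  return r.min == r.max;
}

// Range of (x & y) for 0/1 operands: the result is surely 1 only if both
// are surely 1, and surely 0 if either is surely 0.
inline range_sketch
bool_and_range (const range_sketch &x, const range_sketch &y)
{
  range_sketch r = { x.min & y.min, x.max & y.max };
  return r;
}

// Range of (x | y) for 0/1 operands.
inline range_sketch
bool_or_range (const range_sketch &x, const range_sketch &y)
{
  range_sketch r = { x.min | y.min, x.max | y.max };
  return r;
}

// If the range collapsed to a singleton, report the constant so that the
// following conditional can be resolved; otherwise give up.
inline bool
known_value (const range_sketch &r, long *value)
{
  if (!singleton_p (r))
    return false;
  *value = r.min;
  return true;
}
```

In the first dump, a known-false second operand makes the AND collapse to 0;
in the second, the PHI argument 1 on the edge from BB4 makes the OR collapse
to 1, even though the other operand is unknown.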



I never coded up the bits to utilize VRP information to simplify 
statements for threading except for COND_EXPRs.  It wasn't clear how 
often it would be useful, thus I took the easy way out.  This picks up a 
hundred or so additional threading opportunities in my testfiles.  Not 
huge, but worth it IMHO.


Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 
Installed onto the trunk.


* tree-vrp.c (simplify_stmt_for_jump_threading): Try to
simplify assignments too.  If the RHS collapses to a singleton
range, then return the value for the range.

* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
new file mode 100644
index 000..9d9473e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
+
+
+struct basic_block_def;
+typedef struct basic_block_def *basic_block;
+enum gimple_code
+{
+  LAST_AND_UNUSED_GIMPLE_CODE
+};
+struct omp_region
+{
+  struct omp_region *outer;
+  basic_block cont;
+};
+void
+build_omp_regions_1 (basic_block bb, struct omp_region *parent,
+unsigned char single_tree, enum gimple_code code)
+{
+  if (code == 25)
+parent = parent->outer;
+  else if (code == 42)
+parent->cont = bb;
+  if (single_tree && !parent)
+return;
+  oof ();
+}
+
+/* { dg-final { scan-tree-dump-times "Threaded" 1 "vrp1" }  } */
+/* { dg-final { cleanup-tree-dump "vrp1" } } */
+
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index ec7ef8f..683a3db 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -9109,15 +9109,27 @@ static vec equiv_stack;
 static tree
 simplify_stmt_for_jump_threading (gimple stmt, gimple within_stmt)
 {
-  /* We only use VRP information to simplify conditionals.  This is
- overly conservative, but it's unclear if doing more would be
- worth the compile time cost.  */
-  if (gimple_code (stmt) != GIMPLE_COND)
-return NULL;
+  if (gimple_code (stmt) == GIMPLE_COND)
+return vrp_evaluate_conditional (gimple_cond_code (stmt),
+gimple_cond_lhs (stmt),
+gimple_cond_rhs (stmt), within_stmt);
+
+  if (gimple_code (stmt) == GIMPLE_ASSIGN)
+{
+  value_range_t new_vr = VR_INITIALIZER;
+  tree lhs = gimple_assign_lhs (stmt);
+
+  if (TREE_CODE (lhs) == SSA_NAME
+ && (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+ || POINTER_TYPE_P (TREE_TYPE (lhs))))
+   {
+ extract_range_from_assignment (&new_vr, stmt);
+ if (range_int_cst_singleton_p (&new_vr))
+   return new_vr.min;
+   }
+}
 
-  return vrp_evaluate_conditional (gimple_cond_code (stmt),
-  gimple_cond_lhs (stmt),
-  gimple_cond_rhs (stmt), within_stmt);
+  return NULL_TREE;
 }
 
 /* Blocks which have more than one predecessor and more than


[Ping] VAX: Fix ICE during operand output

2013-08-21 Thread Jan-Benedict Glaw
On Wed, 2013-07-31 18:34:26 +0200, Jan-Benedict Glaw  wrote:
> Hi!
> 
> We've seen ICEs while outputting an operand (not even the excessive
> CISC of a VAX could do that), which should be fixed by this patch:
> 
> 2013-07-31  Jan-Benedict Glaw  
> 
>   * config/vax/constraints.md (T): Add missing CONSTANT_P check.

Ping?

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of: Friends are relatives you make for yourself.
the second  :


signature.asc
Description: Digital signature


[C++ Patch] PR 56130

2013-08-21 Thread Paolo Carlini

Hi,

this bug points out that we fail to emit deprecated warnings when 
references are involved. Turns out that at the end of 
finish_id_expression the VAR_DECL is wrapped in INDIRECT_REF. The 
trivial patch below appears to work fine and should be pretty safe in 
terms of false positives, because the warning is enabled by default.


Booted and tested x86_64-linux.

Thanks,
Paolo.

//
/cp
2013-08-21  Paolo Carlini  

PR c++/56130
* semantics.c (finish_id_expression): Handle deprecated references.

/testsuite
2013-08-21  Paolo Carlini  

PR c++/56130
* g++.dg/warn/deprecated-7.C: New.
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 201897)
+++ cp/semantics.c  (working copy)
@@ -3457,8 +3457,10 @@ finish_id_expression (tree id_expression,
}
 }
 
-  if (TREE_DEPRECATED (decl))
-warn_deprecated_use (decl, NULL_TREE);
+  /* Handle references (c++/56130).  */
+  tree t = INDIRECT_REF_P (decl) ? TREE_OPERAND (decl, 0) : decl;
+  if (TREE_DEPRECATED (t))
+warn_deprecated_use (t, NULL_TREE);
 
   return decl;
 }
Index: testsuite/g++.dg/warn/deprecated-7.C
===
--- testsuite/g++.dg/warn/deprecated-7.C(revision 0)
+++ testsuite/g++.dg/warn/deprecated-7.C(working copy)
@@ -0,0 +1,17 @@
+// PR c++/56130
+
+int g_nn;
+int& g_n __attribute__((deprecated)) = g_nn;
+
+void f()
+{
+  int f_nn;
+  int& f_n __attribute__((deprecated)) = f_nn;
+  f_n = 1;// { dg-warning "'f_n' is deprecated" }
+}
+
+int main()
+{
+  g_n = 1;// { dg-warning "'g_n' is deprecated" }
+  f();
+}


Re: [Google] Refine hot caller heuristic

2013-08-21 Thread Teresa Johnson
> +/* Knob to control hot-caller heuristic. 0 means it is turned off, 1 means
> +   it is always applied, and 2 means it is applied only if the footprint is
> +   smaller than PARAM_HOT_CALLER_CODESIZE_THRESHOLD.  */
>  DEFPARAM (PARAM_INLINE_HOT_CALLER,
> "inline-hot-caller",
> "Consider cold callsites for inlining if caller contains hot code",
> +   2, 0, 2)
> +
> +/* The maximum code size estimate under which hot caller heuristic is
> +   applied.  */
> +DEFPARAM(PARAM_HOT_CALLER_CODESIZE_THRESHOLD,
> + "hot-caller-codesize-threshold",
> + "Maximum profile-based code size footprint estimate for "
> + "hot caller heuristic  ",
> + 1, 0, 0)

Out of curiosity, how sensitive is performance to the value of this
parameter? I.e. is there a clear cutoff for the codes that benefit
from disabling this inlining vs those that benefit from enabling it?

Also, have you tried spec2006? I remember that the codesize of the gcc
benchmark was above the larger 15000 threshold I use for tuning down
unrolling/peeling, and I needed to refine my heuristics to identify
profitable loops to unroll/peel even in the case of large codesize.
I'm not sure if there are more benchmarks that will be above the
smaller 10K threshold.

> +
> +DEFPARAM (PARAM_INLINE_USEFUL_COLD_CALLEE,
> +   "inline-useful-cold-callee",
> +   "Consider cold callsites for inlining if caller contains hot code",
> 1, 0, 1)

The description of this param is wrong (it is the same as the
description of PARAM_INLINE_HOT_CALLER). It should probably be
something like
"Only consider cold callsites for inlining if analysis finds
optimization opportunities"

>
>  /* Limit of iterations of early inliner.  This basically bounds number of
> Index: gcc/ipa-inline.c
> ===
> --- gcc/ipa-inline.c  (revision 201768)
> +++ gcc/ipa-inline.c  (working copy)
> @@ -528,12 +528,60 @@ big_speedup_p (struct cgraph_edge *e)
>return false;
>  }
>
> +/* Returns true if callee of edge E is considered useful to inline
> +   even if it is cold. A callee is considered useful if there is at
> +   least one argument of pointer type that is not a pass-through.  */

Can you expand this comment a bit to add why such arguments indicate
useful inlining?

Thanks,
Teresa

> +
> +static inline bool
> +useful_cold_callee (struct cgraph_edge *e)
> +{
> +  gimple call = e->call_stmt;
> +  int n, arg_num = gimple_call_num_args (call);
> +  struct ipa_edge_args *args = IPA_EDGE_REF (e);
> +
> +  for (n = 0; n < arg_num; n++)
> +{
> +  tree arg = gimple_call_arg (call, n);
> +  if (POINTER_TYPE_P (TREE_TYPE (arg)))
> +{
> +   struct ipa_jump_func *jfunc = ipa_get_ith_jump_func (args, n);
> +   if (jfunc->type != IPA_JF_PASS_THROUGH)
> +return true;
> +}
> +}
> +  return false;
> +}
> +
> +/* Returns true if hot caller heuristic should be used.  */
> +
> +static inline bool
> +enable_hot_caller_heuristic (void)
> +{
> +
> +  gcov_working_set_t *ws = NULL;
> +  int size_threshold = PARAM_VALUE (PARAM_HOT_CALLER_CODESIZE_THRESHOLD);
> +  int num_counters = 0;
> +  int param_inline_hot_caller = PARAM_VALUE (PARAM_INLINE_HOT_CALLER);
> +
> +  if (param_inline_hot_caller == 0)
> +return false;
> +  else if (param_inline_hot_caller == 1)
> +return true;
> +
> +  ws = find_working_set(PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE));
> +  if (!ws)
> +return false;
> +  num_counters = ws->num_counters;
> +  return num_counters <= size_threshold;
> +
> +}
>  /* Returns true if an edge or its caller are hot enough to
> be considered for inlining.  */
>
>  static bool
>  edge_hot_enough_p (struct cgraph_edge *edge)
>  {
> +  static bool use_hot_caller_heuristic = enable_hot_caller_heuristic ();
>if (cgraph_maybe_hot_edge_p (edge))
>  return true;
>
> @@ -543,9 +591,17 @@ edge_hot_enough_p (struct cgraph_edge *edge)
>if (flag_auto_profile && edge->callee->count == 0
>&& edge->callee->max_bb_count > 0)
>  return false;
> -  if (PARAM_VALUE (PARAM_INLINE_HOT_CALLER)
> -  && maybe_hot_count_p (NULL, edge->caller->max_bb_count))
> -return true;
> +  if (use_hot_caller_heuristic)
> +{
> +  struct cgraph_node *where = edge->caller;
> +  if (maybe_hot_count_p (NULL, where->max_bb_count))
> +{
> +  if (PARAM_VALUE (PARAM_INLINE_USEFUL_COLD_CALLEE))
> +return useful_cold_callee (edge);
> +  else
> +return true;
> +}
> +}
>return false;
>  }

On Tue, Aug 20, 2013 at 12:26 PM, Easwaran Raman  wrote:
> The current hot caller heuristic simply promotes edges whose caller is
> hot. This patch does the following:
> * Turn it off for applications with large footprint since the size
> increase hurts them
> * Be more selective by considering arguments to callee when the
> heuristic is enabled.
>
> This performs well on internal benchmarks. Ok for google

[RFC] Old school parallelization of WPA streaming

2013-08-21 Thread Jan Hubicka
Hi,
this is my attempt to bring GCC into the wonderful era of multicore CPUs :)
It is a hack, but it seems to help quite a lot.  About 50% of WPA time is spent
streaming the individual ltrans .o files.  This can be easily parallelized
by fork - we do nothing afterwards, just exit and pass the list to the linker.

So until we are thread safe, perhaps this may be a solution? (or on unixish
systems it can probably be a solution forever)  I added logic that parses
-flto=24 and runs the number of streaming processes the user asked for.

For -flto=jobserver I simply fork all 32 processes.  It may not be a disaster,
but perhaps we should figure out how to communicate with the jobserver.  From a
first glance at the documentation on how it works, it seems easy to add.
Perhaps we can even convince the GNU Make folks to put simple helpers into
libiberty?

We may also figure out the number of CPUs (is it available e.g. from libgomp?)
and use it by default even if the user does not care to pass the number of
processes.  Naturally these streaming forks should be cheap memory wise. I hope
Martin will get me some actual numbers.
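
The fork-based scheme can be sketched in C++ on a POSIX system roughly as
follows.  This is not the patch itself: stream_fn, demo_stream_partition,
reap_one_child and stream_partitions_forked are hypothetical names, and the
worker is a placeholder for writing one ltrans file.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical per-partition worker; returns 0 on success.
typedef int (*stream_fn) (int partition);

// Placeholder worker: a real one would write one ltrans object file.
inline int
demo_stream_partition (int /* partition */)
{
  return 0;
}

// Reap one finished child.  Returns false if there was nothing to reap.
inline bool
reap_one_child (int *running, int *succeeded)
{
  int status = 0;
  if (wait (&status) <= 0)
    return false;
  --*running;
  if (WIFEXITED (status) && WEXITSTATUS (status) == 0)
    ++*succeeded;
  return true;
}

// Fork one child per partition, keeping at most MAX_JOBS alive at once.
// Returns the number of partitions that were streamed successfully.
inline int
stream_partitions_forked (int n_partitions, int max_jobs, stream_fn fn)
{
  int running = 0;
  int succeeded = 0;
  if (max_jobs < 1)
    max_jobs = 1;

  for (int i = 0; i < n_partitions; i++)
    {
      // Throttle: wait for a free slot before forking again.
      while (running >= max_jobs)
        if (!reap_one_child (&running, &succeeded))
          running = 0;

      pid_t pid = fork ();
      if (pid == 0)
        // Child: stream one partition, then leave without running the
        // parent's exit handlers or flushing its stdio buffers.
        _exit (fn (i) == 0 ? 0 : 1);
      else if (pid > 0)
        running++;
      else if (fn (i) == 0)
        // fork failed: fall back to streaming in the parent.
        succeeded++;
    }

  // Collect the remaining children.
  while (running > 0)
    if (!reap_one_child (&running, &succeeded))
      break;
  return succeeded;
}
```

Because each child only writes its own output and exits, nothing has to be
communicated back to the parent except the exit status.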

With the patch the WPA time of firefox goes down to 2 minutes (4.8 needs about
30 minutes and without the hack one needs about 5 minutes)

Before:
Execution times (seconds)
 phase setup             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1398 kB ( 0%) ggc
 phase opt and generate  :  39.73 (17%) usr   0.49 ( 3%) sys  40.26 (16%) wall  347726 kB ( 5%) ggc
 phase stream in         :  82.43 (35%) usr   2.15 (14%) sys  84.62 (34%) wall 5970152 kB (94%) ggc
 phase stream out        : 114.05 (48%) usr  12.86 (83%) sys 127.26 (50%) wall    6868 kB ( 0%) ggc
 garbage collection      :   3.07 ( 1%) usr   0.00 ( 0%) sys   3.08 ( 1%) wall       0 kB ( 0%) ggc
 callgraph optimization  :   0.34 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall      30 kB ( 0%) ggc
 ipa dead code removal   :   4.91 ( 2%) usr   0.11 ( 1%) sys   5.16 ( 2%) wall     113 kB ( 0%) ggc
 ipa inheritance graph   :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall     927 kB ( 0%) ggc
 ipa virtual call target :   5.11 ( 2%) usr   0.05 ( 0%) sys   4.99 ( 2%) wall   55296 kB ( 1%) ggc
 ipa cp                  :   2.65 ( 1%) usr   0.17 ( 1%) sys   2.80 ( 1%) wall  188629 kB ( 3%) ggc
 ipa inlining heuristics :  18.49 ( 8%) usr   0.29 ( 2%) sys  18.79 ( 7%) wall  439981 kB ( 7%) ggc
 ipa lto gimple in       :   0.12 ( 0%) usr   0.01 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple out      :  16.66 ( 7%) usr   1.26 ( 8%) sys  17.97 ( 7%) wall       0 kB ( 0%) ggc
 ipa lto decl in         :  68.70 (29%) usr   1.50 (10%) sys  70.23 (28%) wall 5181795 kB (82%) ggc
 ipa lto decl out        :  93.09 (39%) usr   4.93 (32%) sys  98.07 (39%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   1.65 ( 1%) usr   0.27 ( 2%) sys   1.92 ( 1%) wall  428974 kB ( 7%) ggc
 ipa lto decl merge      :   3.66 ( 2%) usr   0.00 ( 0%) sys   3.65 ( 1%) wall    8288 kB ( 0%) ggc
 ipa lto cgraph merge    :   3.42 ( 1%) usr   0.00 ( 0%) sys   3.42 ( 1%) wall   13725 kB ( 0%) ggc
 whopr wpa               :   3.58 ( 2%) usr   0.02 ( 0%) sys   3.59 ( 1%) wall    6871 kB ( 0%) ggc
 whopr wpa I/O           :   0.99 ( 0%) usr   6.65 (43%) sys   7.92 ( 3%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   2.63 ( 1%) usr   0.01 ( 0%) sys   2.66 ( 1%) wall       0 kB ( 0%) ggc
 ipa reference           :   3.08 ( 1%) usr   0.08 ( 1%) sys   3.18 ( 1%) wall       0 kB ( 0%) ggc
 ipa profile             :   0.43 ( 0%) usr   0.05 ( 0%) sys   0.48 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   3.00 ( 1%) usr   0.06 ( 0%) sys   3.07 ( 1%) wall       0 kB ( 0%) ggc
 varconst                :   0.03 ( 0%) usr   0.04 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   0.48 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                   : 236.22            15.50           252.15            6326146 kB

after:
Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1399 kB ( 0%) ggc
 phase opt and generate  :  35.49 (28%) usr   0.44 ( 6%) sys  35.95 (26%) wall  313971 kB ( 5%) ggc
 phase stream in         :  82.98 (64%) usr   2.10 (30%) sys  85.13 (61%) wall 5969191 kB (95%) ggc
 phase stream out        :  10.37 ( 8%) usr   4.49 (64%) sys  17.33 (13%) wall    5813 kB ( 0%) ggc
 garbage collection      :   3.00 ( 2%) usr   0.00 ( 0%) sys   2.99 ( 2%) wall       0 kB ( 0%) ggc
 callgraph optimization  :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall      30 kB ( 0%) ggc
 ipa dead code removal   :   4.91 ( 4%) usr   0.10 ( 1%) sys   5.04 ( 4%) wall     114 kB ( 0%) ggc
 ipa inheritance graph   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall     792 k

Re: [PATCH] Enable non-complex math builtins from C99 for Bionic

2013-08-21 Thread Rainer Orth
Alexander Ivchenko  writes:

> Hi Joseph, thanks for your comments.
>
> I updated the patch:
>
> 1) The function name as a second argument in libc_has_function target
> hook was removed - was not useful so far.
> 2) By using contrib/config-list.mk (thanks for the hint - great tool!)
> and analysing tm.h files and what is included in them I have checked
> 197 targets. That analysis includes all issues that you raised in your
> comments - everything is fixed now. I don't like that sometimes we
> have to redefine the version of the hook back to the default one due
> to a poisoning of including elfos.h, but I couldn't find a better
> solution - I commented all those cases.
>
> Regtesting is in progress now. I have already tested the patch before,
> so I don't expect to see any new problems.
>
> If all the tests pass, is the patch OK for trunk?

Unfortunately, this patch broke Solaris 10+ bootstrap; it cannot have
been tested properly there:

In file included from ./tm.h:27:0,
 from /vol/gcc/src/hg/trunk/local/gcc/gencheck.c:23:
/vol/gcc/src/hg/trunk/local/gcc/config/sol2-10.h:21:4: error: "/*" within 
comment [-Werror=comment]
 /* /* Solaris 10 has the float and long double forms of math functions.
 ^
cc1plus: all warnings being treated as errors
make[3]: *** [build/gencheck.o] Error 1

Fixed as follows; bootstrapped without regressions on
i386-pc-solaris2.10, installed on mainline.

Rainer


2013-08-21  Rainer Orth  

* config/sol2-10.h (TARGET_LIBC_HAS_FUNCTION): Don't nest
comment.

diff --git a/gcc/config/sol2-10.h b/gcc/config/sol2-10.h
--- a/gcc/config/sol2-10.h
+++ b/gcc/config/sol2-10.h
@@ -18,7 +18,7 @@ You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-/* /* Solaris 10 has the float and long double forms of math functions.
+/* Solaris 10 has the float and long double forms of math functions.
We redefine this hook so the version from elfos.h header won't be used.  */
 #undef TARGET_LIBC_HAS_FUNCTION
 #define TARGET_LIBC_HAS_FUNCTION default_libc_has_function

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Fix buffer overflow in ipa_profile

2013-08-21 Thread Jan Hubicka
Hi,
as Martin noticed, there is a bug in ipa_profile, which first allocates the
order array and then introduces new local aliases before calling
ipa_reverse_postorder.  Fixed thus and committed as obvious.
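
The ordering problem can be modelled in a few lines of C++.  The names below
(graph_sketch, add_local_aliases, fill_postorder, profile_order) are
hypothetical and only mimic the shape of the fix: size the buffer after the
last step that can add nodes, not before.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the callgraph: just a list of node ids.
struct graph_sketch
{
  std::vector<int> nodes;
};

// Models a step that may create new nodes (e.g. local aliases).
inline void
add_local_aliases (graph_sketch &g, int count)
{
  for (int i = 0; i < count; i++)
    g.nodes.push_back (1000 + i);
}

// Writes every node into ORDER (here simply in reverse) and returns the
// count.  ORDER must have room for g.nodes.size () entries.
inline std::size_t
fill_postorder (const graph_sketch &g, int *order)
{
  std::size_t n = g.nodes.size ();
  for (std::size_t i = 0; i < n; i++)
    order[i] = g.nodes[n - 1 - i];
  return n;
}

// Correct ordering: grow the graph first, then allocate the buffer from
// the final node count, then fill it.
inline std::vector<int>
profile_order (graph_sketch &g, int new_aliases)
{
  add_local_aliases (g, new_aliases);
  std::vector<int> order (g.nodes.size ());
  std::size_t n = fill_postorder (g, order.data ());
  order.resize (n);
  return order;
}
```

Allocating order before add_local_aliases would leave it too small for the
nodes added afterwards, which is the overflow described above.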

Honza

Index: ChangeLog
===
--- ChangeLog   (revision 201891)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2013-08-20  Martin Liska  
+
+   * ipa.c (ipa_profile_read_summary): Fix buffer overflow.
+
 2013-08-20  Jan Hubicka  
 
PR bootstrap/58186
Index: ipa.c
===
--- ipa.c   (revision 201890)
+++ ipa.c   (working copy)
@@ -1397,7 +1397,7 @@ ipa_profile_read_summary (void)
 static unsigned int
 ipa_profile (void)
 {
-  struct cgraph_node **order = XCNEWVEC (struct cgraph_node *, cgraph_n_nodes);
+  struct cgraph_node **order;
   struct cgraph_edge *e;
   int order_pos;
   bool something_changed = false;
@@ -1575,6 +1575,7 @@ ipa_profile (void)
 nuseless, nuseless * 100.0 / nindirect,
 nconverted, nconverted * 100.0 / nindirect);
 
+  order = XCNEWVEC (struct cgraph_node *, cgraph_n_nodes);
   order_pos = ipa_reverse_postorder (order);
   for (i = order_pos - 1; i >= 0; i--)
 {


Re: [PATCH] Rerun df_analyze after delete_unmarked_insns during DCE

2013-08-21 Thread David Edelsohn
This patch has caused a bootstrap failure for powerpc-aix and probably
powerpc64-linux.  GCC segfaults and core dumps during stage2
configure.

The motivation for this patch seems faulty and I strongly request that
it be reverted.

PR bootstrap/58206

Thanks, David


[PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-21 Thread Torvald Riegel
This patch adds a custom HTM fast path for RTM on x86_64, which moves
the core HTM fast path bits from gtm_thread::begin_transaction into the
x86-specific ITM_beginTransaction implementation.  It extends/changes
the previous patch by Andi:
http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01228.html

The custom fast path decreases the overheads of using HW transactions.
gtm_thread::begin_transaction remains responsible for handling the retry
policy after aborts of HW transactions, including when to switch to the
fallback execution method.  Right now, the C++ retry code isn't aware of
the specific abort reason but just counts the number of retries for a
particular transaction; it might make sense to add this in the future.
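
The retry counting described here can be sketched in C++ as below.  It is a
simplified model: htm_action, tx_sketch and decide_after_abort are
hypothetical names, and the sketch leaves out the serial lock wait and the
nesting check that the real code performs before retrying.

```cpp
#include <stdint.h>

// Hypothetical outcome of the retry policy.
enum htm_action { TRY_HTM_FASTPATH, USE_FALLBACK };

// Hypothetical per-thread state; stands in for the reused retry counter.
struct tx_sketch
{
  uint32_t retries_left;
};

// Called after a hardware transaction aborted.  FIRST_ABORT is true when
// this transaction has not been retried yet, in which case the counter is
// reset to MAX_ATTEMPTS.  The abort reason is deliberately ignored; only
// the number of attempts is counted.
inline htm_action
decide_after_abort (tx_sketch &tx, uint32_t max_attempts, bool first_abort)
{
  if (first_abort)
    tx.retries_left = max_attempts;

  // Guard against wrapping the unsigned counter below zero.
  if (tx.retries_left > 0 && --tx.retries_left > 0)
    return TRY_HTM_FASTPATH;
  return USE_FALLBACK;
}
```

With max_attempts set to 3, the first two aborts send control back to the
fast path and the third one selects the fallback.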

Tested on Haswell with microbenchmarks and STAMP Vacation.  OK for
trunk?  (Please take a closer look at the asm pieces of this.)

(I've seen failures for STAMP Genome during my recent tests, but those
happen also with just ITM_DEFAULT_METHOD=serialirr and a single thread,
and AFAICT they don't seem to be related to the changes in
_ITM_beginTransaction.  I'll have a look...)

Andreas and Peter: Is this sufficient as a proof of concept for custom
fast paths on your architectures, or would you like to see any changes?

Torvald
commit 9329bd4504d13d415542d93418157d588b599b4e
Author: Torvald Riegel 
Date:   Wed Aug 21 11:40:54 2013 +0200

Add custom HTM fast path for RTM on x86_64.

* libitm_i.h (gtm_thread): Assign an asm name to serial_lock.
(htm_fastpath): Assign an asm name.
* libitm.h (_ITM_codeProperties): Add non-ABI flags used by custom
HTM fast paths.
(_ITM_actions): Likewise.
* config/x86/target.h (HTM_CUSTOM_FASTPATH): Enable custom fastpath on
x86_64.
* config/x86/sjlj.S (_ITM_beginTransaction): Add custom HTM fast path.
* config/posix/rwlock.h (gtm_rwlock): Update comments.  Move summary
field to the start of the structure.
* config/linux/rwlock.h (gtm_rwlock): Update comments.
* beginend.cc (gtm_thread::begin_transaction): Add retry policy
handling for custom HTM fast paths.

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index a3bf549..bd7b19e 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -165,7 +165,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
gtm_jmpbuf *jb)
   if (unlikely(prop & pr_undoLogCode))
 GTM_fatal("pr_undoLogCode not supported");
 
-#if defined(USE_HTM_FASTPATH) && !defined(HTM_CUSTOM_FASTPATH)
+#ifdef USE_HTM_FASTPATH
   // HTM fastpath.  Only chosen in the absence of transaction_cancel to allow
   // using an uninstrumented code path.
   // The fastpath is enabled only by dispatch_htm's method group, which uses
@@ -187,6 +187,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
gtm_jmpbuf *jb)
   // indeed in serial mode, and HW transactions should never need serial mode
   // for any internal changes (e.g., they never abort visibly to the STM code
   // and thus do not trigger the standard retry handling).
+#ifndef HTM_CUSTOM_FASTPATH
   if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
 {
   for (uint32_t t = htm_fastpath; t; t--)
@@ -237,6 +238,49 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
gtm_jmpbuf *jb)
}
}
 }
+#else
+  // If we have a custom HTM fastpath in ITM_beginTransaction, we implement
+  // just the retry policy here.  We communicate with the custom fastpath
+  // through additional property bits and return codes, and either transfer
+  // control back to the custom fastpath or run the fallback mechanism.  The
+  // fastpath synchronization algorithm itself is the same.
+  // pr_HTMRetryableAbort states that a HW transaction started by the custom
+  // HTM fastpath aborted, and that we thus have to decide whether to retry
+  // the fastpath (returning a_tryHTMFastPath) or just proceed with the
+  // fallback method.
+  if (likely(htm_fastpath && (prop & pr_HTMRetryableAbort)))
+{
+  tx = gtm_thr();
+  if (unlikely(tx == NULL))
+{
+  // See below.
+  tx = new gtm_thread();
+  set_gtm_thr(tx);
+}
+  // If this is the first abort, reset the retry count.  We abuse
+  // restart_total for the retry count, which is fine because our only
+  // other fallback will use serial transactions, which don't use
+  // restart_total but will reset it when committing.
+  if (!(prop & pr_HTMRetriedAfterAbort))
+   tx->restart_total = htm_fastpath;
+
+  if (--tx->restart_total > 0)
+   {
+ // Wait until any concurrent serial-mode transactions have finished.
+ // Essentially the same code as above.
+ if (serial_lock.is_write_locked())
+   {
+ if (tx->nesting > 0)
+   goto stop_custom_htm_fastpath;
+ serial_lock.read_lock(tx);
+ serial_lock.read_unlock(tx);
+   }
+ // Let ITM_beginTransa
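
The retry policy described in the comments of the hunk above can be sketched
in isolation.  This is only a toy model, not libitm code: the flag values,
struct retry_state and should_retry_htm are hypothetical names, and the
serial-lock wait and nesting check from the patch are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the property bits named in the patch
   comments; the real values are not shown in the excerpt.  */
enum
{
  PR_HTM_RETRYABLE_ABORT = 1 << 0,
  PR_HTM_RETRIED_AFTER_ABORT = 1 << 1
};

/* Hypothetical per-thread state; the patch reuses restart_total.  */
struct retry_state
{
  uint32_t retries_left;
};

/* Return true if the HTM fast path should be tried again, false if
   the caller should proceed with the fallback method.  */
bool
should_retry_htm (struct retry_state *st, uint32_t prop,
                  uint32_t max_retries)
{
  if (!(prop & PR_HTM_RETRYABLE_ABORT))
    return false;

  /* First abort of this transaction: reset the retry budget.  */
  if (!(prop & PR_HTM_RETRIED_AFTER_ABORT))
    st->retries_left = max_retries;

  /* Guard against wrapping below zero (an addition in this sketch).  */
  if (st->retries_left == 0)
    return false;

  return --st->retries_left > 0;
}
```

With a budget of 3, the first two aborts ask for a retry and the third
falls through to the fallback.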

Committed: rename struct reg_equivs

2013-08-21 Thread Joern Rennecke

Having a C++ type with the same name as the variable reg_equivs causes
trouble with gdb, so I renamed the struct.

Bootstrapped on i686-pc-linux-gnu.

Committed as obvious.
2013-08-21  Joern Rennecke  

* reload.h (struct reg_equivs): Rename to ..
(struct reg_equivs_s): .. this.

Index: reload.h
===
--- reload.h(revision 201898)
+++ reload.h(working copy)
@@ -203,7 +203,7 @@ #define caller_save_initialized_p \
   (this_target_reload->x_caller_save_initialized_p)
 
 /* Register equivalences.  Indexed by register number.  */
-typedef struct reg_equivs
+typedef struct reg_equivs_s
 {
   /* The constant value to which pseudo reg N is equivalent,
  or zero if pseudo reg N is not equivalent to a constant.


Re: [RFC] Old school parallelization of WPA streaming

2013-08-21 Thread Andi Kleen
On Wed, Aug 21, 2013 at 04:17:48PM +0200, Jan Hubicka wrote:
> Hi,
> this is my attempt to bring GCC into wonderful era of multicore CPUs :)
> It is a hack, but it seems to help quite a lot.  About 50% of WPA time is
> spent by streaming the individual ltrans .o files.  This can be easily
> parallelized by fork - we do nothing afterwards, just exit and pass the
> list to the linker.

One risk is if someone streams to a spinning disk it may add more seeks for
the parallel IO. But I think it's a reasonable tradeoff.

We should also use a faster compressor.

> For -flto=jobserver I simply fork all 32 processes.  It may not be a disaster,
> but perhaps we should figure out how to communicate with jobserver.  At first
> glance on document on how it works, it seems easy to add. Perhaps we can even
> convince GNU Make folks to put simple helpers to libiberty?

lto=jobserver is still broken and confuses tokens on large builds (ends
with a 0 read). I did some debugging recently, and I suspect a Linux kernel
bug now. Still haven't tracked it down.

Any workarounds would need make changes unfortunately.

> 
> We also may figure out number of CPUs (is it available i.e. from libgomp)

sysconf(_SC_NPROCESSORS_ONLN) ? 
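
A minimal sketch of that sysconf query, assuming a system that provides
_SC_NPROCESSORS_ONLN (it is a common extension rather than part of base
POSIX).  The function name default_parallelism is made up for the example;
it is not from the patch.

```c
#include <unistd.h>

/* Default number of streaming processes: the number of online CPUs,
   or 1 if the query fails or is unsupported.  */
long
default_parallelism (void)
{
  long n = sysconf (_SC_NPROCESSORS_ONLN);
  return n > 0 ? n : 1;
}
```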

> and use it by default even if user do not care to pass number of processes.
> Naturally these streaming forks should be cheap memory wise. I hope Martin
> will get me some actual numbers.
> 
> With the patch the WPA time of firefox goes down to 2 minutes (4.8 needs about
> 30 minutes and without the hack one needs about 5 minutes)

Cool!

I'll try it on my builds
>  
> +fparallelism=
> +LTO Joined
> +Run the link-time optimizer in whole program analysis (WPA) mode.

The description does not make sense

Rest of patch looks good from a quick read, although I would prefer to 
do the waiting for children in the "parent", not the "last one"
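
A rough sketch of the "parent waits" arrangement, assuming plain POSIX
fork/wait.  Nothing here is taken from the patch: stream_in_parallel, the
stream_fn callback and the two example jobs are hypothetical, and real WPA
streaming would write ltrans object files where the callback runs.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-partition job; returns 0 on success.  */
typedef int (*stream_fn) (int partition);

/* Reap one child and count it as failed unless it exited with 0.
   Returns -1 if there was nothing to wait for.  */
static int
reap_one (int *failed)
{
  int status;
  if (wait (&status) < 0)
    return -1;
  if (!WIFEXITED (status) || WEXITSTATUS (status) != 0)
    ++*failed;
  return 0;
}

/* Fork one child per partition, keeping at most MAX_JOBS running.
   The parent does all the waiting and returns the number of jobs
   that failed or could not be started.  */
int
stream_in_parallel (int n_partitions, int max_jobs, stream_fn fn)
{
  int running = 0, failed = 0;

  if (max_jobs < 1)
    max_jobs = 1;

  for (int i = 0; i < n_partitions; i++)
    {
      if (running == max_jobs && reap_one (&failed) == 0)
        running--;

      pid_t pid = fork ();
      if (pid == 0)
        _exit (fn (i) == 0 ? 0 : 1);  /* Child: do the work and exit.  */
      if (pid < 0)
        failed++;                     /* Could not start this job.  */
      else
        running++;
    }

  while (running > 0 && reap_one (&failed) == 0)
    running--;
  return failed;
}

/* Example jobs, for illustration only.  */
int job_ok (int partition) { (void) partition; return 0; }
int job_fail_odd (int partition) { return partition % 2; }
```

Because only the parent reaps children, it is also the only process that
needs to know when every partition has been written.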

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH] Rerun df_analyze after delete_unmarked_insns during DCE

2013-08-21 Thread Jeff Law

On 08/21/2013 08:25 AM, David Edelsohn wrote:

> This patch has caused a bootstrap failure for powerpc-aix and probably
> powerpc64-linux.  GCC segfaults and core dumps during stage2
> configure.
>
> The motivation for this patch seems faulty and I strongly request that
> it be reverted.

Instead of going instantly to "revert the patch", the right thing to do 
is spend a little time analyzing the problem then decide on the best 
course of action.  It could well be the issue Bernd noted about changing 
regs_ever_live after reload.  I'm looking at it now.


Jeff



Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-21 Thread Jan Hubicka
> >
> > Because an offline COMDAT function will be produced for every COMDAT used,
> > I think it is bad to produce any COMDAT (or any function reachable via
> > calls with non-0 count) that has an empty profile (either because it got
> > lost by COMDAT merging or because of a reading mismatch).
> 
> The approach this patch takes is to simply treat those functions the
> same as we would if we didn't feed back profile data in the first
> place, by using the frequencies. This is sufficient except when one is
> inlined, which is why I have the special handling in the inliner
> itself.

Yes, my original plan was to have a per-function profile_status that
specifies whether the profile is read, guessed or absent, and to make
function-local decisions sanely with each setting.

Here we read the function, we set profile to READ (with all counts being 0).
We should drop it to GUESSED when we see that there are non-0 count edges
calling the function in question and probably we should see if it is obviously
hot (i.e. reachable by a hot call) and promote its function profile to HOT
then to get code placement less bad...
> >
> > Since new direct calls can be discovered later, inline may want to do that
> > again each time it inlines non-0 count call of COMDAT with 0 count...
> 
> How about an approach like this:
> - Invoke init_and_estimate_bb_frequencies as I am doing to guess the
> profiles at profile read time for functions with 0 counts.

I see, here we are out of sync. 
We always used to go with estimated frequencies for functions with 0 counts,
but it seems that this code broke when prediction was moved before profiling.
(we also should keep edge probabilities from predict.c in that case)

The estimated profile is already there before reading the profile in, so we
just need to avoid overwriting it.  Does the following work for you?

Index: tree-profile.c
===
--- tree-profile.c  (revision 201838)
+++ tree-profile.c  (working copy)
@@ -604,6 +604,34 @@
 
   pop_cfun ();
 }
+  /* See if 0 count function has non-0 count callers.  In this case we
+ lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+{
+  struct cgraph_edge *e;
+  bool called = false;
+  if (node->count)
+   continue;
+  for (e = node->callers; e; e = e->next_caller)
+   {
+ if (e->count)
+   called = true;
+ if (cgraph_maybe_hot_edge_p (e))
+   break;
+   }
+  if (e || called
+ && profile_status_for_function
+ (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
+   {
+ if (dump_file)
+   fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
+cgraph_node_name (node), node->symbol.order,
+e ? "function is hot" : "function is normal");
+ profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
+   = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+ node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+   }
+}
 
   del_node_map();
   return 0;
Index: predict.c
===
--- predict.c   (revision 201838)
+++ predict.c   (working copy)
@@ -2715,6 +2715,9 @@
   gcov_type count_max, true_count_max = 0;
   basic_block bb;
 
+  if (!ENTRY_BLOCK_PTR->count)
+return 0;
+
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
 true_count_max = MAX (bb->count, true_count_max);
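
The decision made in the tree-profile.c hunk can be modelled on toy data
structures.  This is a sketch only: none of the types below are GCC's, all
names are made up, and the condition is written as "(hot caller or called)
and status is READ", which is an assumption about the intent of the
unparenthesized test in the hunk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the profile and frequency enums.  */
enum profile_status { PROFILE_ABSENT, PROFILE_GUESSED, PROFILE_READ };
enum node_frequency { FREQ_NORMAL, FREQ_HOT };

struct call_edge
{
  long count;  /* Profile count of the call.  */
  bool hot;    /* Whether the edge is considered hot.  */
};

struct func_node
{
  long count;
  enum profile_status status;
  enum node_frequency frequency;
  const struct call_edge *callers;
  size_t n_callers;
};

/* If FN has a zero count but is reached by calls with non-zero counts
   (or by a hot call), treat its read profile as lost and fall back to
   the guessed one.  Returns true if the profile was dropped.  */
bool
drop_lost_profile (struct func_node *fn, bool guess_branch_prob)
{
  bool called = false, hot = false;

  if (fn->count != 0 || fn->status != PROFILE_READ)
    return false;

  for (size_t i = 0; i < fn->n_callers && !hot; i++)
    {
      if (fn->callers[i].count != 0)
        called = true;
      if (fn->callers[i].hot)
        hot = true;
    }

  if (!called && !hot)
    return false;

  fn->status = guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT;
  fn->frequency = hot ? FREQ_HOT : FREQ_NORMAL;
  return true;
}
```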
 

> - At inline time, invoke some kind of freqs_to_counts routine for any
> 0-count routine that is reached by non-zero call edges. It would take

We should not need that since frequencies ought to be there.

> the sum of all incoming call edge counts and synthesize counts for the
> bbs using the guessed profile frequencies applied earlier by
> init_and_estimate_bb_frequencies. Then the inliner can do its normal
> bb count scaling.

Yes, I guess we should go this way.  Still, we will need to watch for overly
large values.  Recursive inlining can probably easily produce quite a lot of
nonsense here.

We will also need to solve the problem that in this case cgraph edges will
have 0 profile.  We probably want to play the game there and just do the
scaling for edge counts, since IPA passes probably do not want to care about
partial profiles.
> 
> Does that seem like a reasonable approach?
> 
> There is one other fix in this patch:
> - The clone_inlined_nodes/update_noncloned_frequencies changes below
> are handling a different case: 0-count call edge in this module, with
> non-zero callee node count due to calls from other modules. It will
> allow update_noncloned_frequencies to scale down the edge counts in
> callee's cloned call tree. This was a fix I made for the
> callgraph-based linker plugin function reordering, and not splitting
> (since it is using both the node and edge weights to m

Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C

2013-08-21 Thread Aldy Hernandez

Even more review stuff.  Are you keeping track of all this Balaji? :)


+  if (warn)
+warning (0, "suspicious use of _Cilk_spawn");


First, as I've mentioned, this error message is very ambiguous.  You 
should strive to provide better error messages.  See my previous comment 
on this same line of code.


However... for all the checking you do in cilk_valid_spawn, I don't see 
a single corresponding test.


May I stress again the importance of tests-- which are especially
critical for new language features.  You don't want cilk silently
breaking thus rendering all your hard work moot, do you? :))

You particularly need tests for all quirks described in the Cilk Plus
language specification around here:

"A program is considered ill formed if the _Cilk_spawn form of this
expression appears other than in one of the following contexts: [blah
blah blah]".



+  /* Strip off any conversion to void.  It does not affect whether spawn
+ is supported here.  */
+  if (TREE_CODE (exp) == CONVERT_EXPR && VOID_TYPE_P (TREE_TYPE (exp)))
+exp = TREE_OPERAND (exp, 0);


Don't you need to strip off various levels here with a loop?  Also, 
could any of the following do the job? STRIP_NOPS, STRIP_TYPE_NOPS, 
STRIP_USELESS_TYPE_CONVERSION.
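
To illustrate the "strip with a loop" point on a toy expression type: the
struct, enum and function below are made up for the example and are not
GCC trees or macros.  Whether one of the existing STRIP_* macros does the
job in the real patch is left open here.

```c
#include <stddef.h>

/* Toy expression node, standing in for a tree.  */
enum expr_code { EXPR_CALL, EXPR_CONVERT };

struct expr
{
  enum expr_code code;
  int is_void_type;      /* Non-zero if the node's type is void.  */
  struct expr *operand;  /* Operand 0, or NULL for a leaf.  */
};

/* Peel off every enclosing conversion to void, not just the
   outermost one.  */
struct expr *
strip_void_conversions (struct expr *e)
{
  while (e != NULL
         && e->code == EXPR_CONVERT
         && e->is_void_type
         && e->operand != NULL)
    e = e->operand;
  return e;
}
```

A single "if" handles one (void) wrapper; the loop also handles nested
ones such as (void)(void)f().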



@@ -7086,6 +7087,19 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
   else if (ret != GS_UNHANDLED)
break;

+  if (flag_enable_cilkplus && lang_hooks.cilkplus.cilk_valid_spawn (expr_p))
+   {
+ /* If there are errors, there is no point in expanding the
+_Cilk_spawn.  Just gimplify like a normal call expr.  */
+ if (!seen_error ())
+   {
+ ret = (enum gimplify_status)
+   lang_hooks.cilkplus.gimplify_cilk_spawn (expr_p, pre_p, post_p);
+ if (ret != GS_UNHANDLED)
+   continue;
+   }
+   }
+


Oh, hell no!  You do realize you are drilling down and walking every 
single expression being passed to the gimplifier to find your spawn? 
That's not cool.  You need to find some way to annotate expressions or 
do this more efficiently.  It may help to bootstrap with -fcilkplus and 
do performance analysis, to make sure you're not making the compiler 
slower on the non cilkplus code path.


Could you not let the gimplifier do its thing and add a case for 
CILK_SPAWN_STMT where you do the unwrapping and everything else?  I do 
realize that cilk_valid_spawn() is doing all sorts of type checking, and 
validation, but the gimplifier is really not the place to do this.  When 
possible, you should do type checking as close to the source as 
possible, thus-- at the parser.  See how c_finish_omp_for() is called 
from the FE to do type checking, build the OMP_FOR tree node, *and* do 
the add_stmt().  Perhaps you need a corresponding 
c_finish_cilk_{spawn,sync}.  Definitely worth looking into.  But I can 
tell you now, drilling down into every expression being gimplified is a 
no-go.


Also, do you really need two hooks to recognize spawns: recognize_spawn 
and cilk_valid_spawn?  And are C/C++ so different that you need a hook
with different versions of each?


+/* Returns a setjmp CALL_EXPR with FRAME->context as its parameter.  */
+
+tree
+cilk_call_setjmp (tree frame)


Is this used anywhere else but in this file?  If not, please declare static.


+/* Expands the __cilkrts_pop_frame function call stored in EXP.
+   Returns const0_rtx.  */
+
+void
+expand_builtin_cilk_pop_frame (tree exp)

[snip]

+/* Expands the cilk_detach function call stored in EXP.  Returns const0_rtx.  */
+
+void
+expand_builtin_cilk_detach (tree exp)


Do these builtins really have to be expanded into rtl?  Can this not be 
modeled with trees or gimple?  Expansion into rtl should be used for 
truly architecture dependent stuff that cannot be modeled with anything 
higher level.


For the memory barrier stuff, we already have Andrew's atomic and memory 
model infrastructure which I think should be enough to model whatever 
you are expanding into RTL here.  But I may be wrong...


Aldy


Re: [PATCH] Rerun df_analyze after delete_unmarked_insns during DCE

2013-08-21 Thread Steven Bosscher
On Wed, Aug 21, 2013 at 5:10 PM, Jeff Law  wrote:
> On 08/21/2013 08:25 AM, David Edelsohn wrote:
>>
>> This patch has caused a bootstrap failure for powerpc-aix and probably
>> powerpc64-linux.  GCC segfaults and core dumps during stage2
>> configure.
>>
>> The motivation for this patch seems faulty and I strongly request that
>> it be reverted.
>
> Instead of going instantly to "revert the patch", the right thing to do is
> spend a little time analyzing the problem then decide on the best course of
> action.  It could well be the issue Bernd noted about changing
> regs_ever_live after reload.  I'm looking at it now.


Well, that was the purpose of the patch:

" Fixed by forcing regs_ever_live update and rerunning df_analyze ()
at fini_dce()."

Note that the patch was for an out-of-tree port. Meaning: trunk is
currently broken by a patch that wasn't needed for any in-tree
ports.

Ciao!
Steven


Re: [RFC] Old school parallelization of WPA streaming

2013-08-21 Thread Richard Biener
Andi Kleen  wrote:
>On Wed, Aug 21, 2013 at 04:17:48PM +0200, Jan Hubicka wrote:
>> Hi,
>> this is my attempt to bring GCC into wonderful era of multicore CPUs :)
>> It is a hack, but it seems to help quite a lot.  About 50% of WPA time is
>> spent by streaming the individual ltrans .o files.  This can be easily
>> parallelized by fork - we do nothing afterwards, just exit and pass the
>> list to the linker.
>
>One risk is if someone streams to a spinning disk it may add more seeks
>for the parallel IO. But I think it's a reasonable tradeoff.

It'll also wreck all WPA dump files.

>We should also use a faster compressor

And we should avoid uncompressing the function sections...

That said, the patch is enough of a hack that I don't ever want to debug a
bug in it.

I also fail to see why threads should not work here.  Maybe simply annotate
gcc with openmp?

Richard.

>> For -flto=jobserver I simply fork all 32 processes.  It may not be a
>> disaster, but perhaps we should figure out how to communicate with
>> jobserver.  At first glance on document on how it works, it seems easy
>> to add. Perhaps we can even convince GNU Make folks to put simple
>> helpers to libiberty?
>
>lto=jobserver is still broken and confuses tokens on large builds (ends
>with a 0 read). I did some debugging recently, and I suspect a Linux
>kernel bug now. Still haven't tracked it down.
>
>Any workarounds would need make changes unfortunately.
>
>> 
>> We also may figure out number of CPUs (is it available i.e. from
>> libgomp)
>
>sysconf(_SC_NPROCESSORS_ONLN) ? 
>
>> and use it by default even if user do not care to pass number of
>> processes.
>> Naturally these streaming forks should be cheap memory wise. I hope
>> Martin will get me some actual numbers.
>> 
>> With the patch the WPA time of firefox goes down to 2 minutes (4.8
>> needs about 30 minutes and without the hack one needs about 5 minutes)
>
>Cool!
>
>I'll try it on my builds
>>  
>> +fparallelism=
>> +LTO Joined
>> +Run the link-time optimizer in whole program analysis (WPA) mode.
>
>The description does not make sense
>
>Rest of patch looks good from a quick read, although I would prefer to 
>do the waiting for children in the "parent", not the "last one"
>
>-Andi




Re: [C++ Patch] PR 56130

2013-08-21 Thread Paolo Carlini

Hi again,

On 08/21/2013 03:45 PM, Paolo Carlini wrote:

> Hi,
>
> this bug points out that we fail to emit deprecated warnings when
> references are involved. Turns out that at the end of
> finish_id_expression the VAR_DECL is wrapped in INDIRECT_REF. The
> trivial patch below appears to work fine and should be pretty safe in
> terms of false positives, because the warning is enabled by default.

In fact, since we have an issue with *references* I think using 
REFERENCE_REF_P, per the below, would be more correct. Lightly tested 
so far, I'm booting and testing it.


Thanks again,
Paolo.

//
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 201902)
+++ cp/semantics.c  (working copy)
@@ -3457,8 +3457,10 @@ finish_id_expression (tree id_expression,
}
 }
 
-  if (TREE_DEPRECATED (decl))
-warn_deprecated_use (decl, NULL_TREE);
+  /* Handle references (c++/56130).  */
+  tree t = REFERENCE_REF_P (decl) ? TREE_OPERAND (decl, 0) : decl;
+  if (TREE_DEPRECATED (t))
+warn_deprecated_use (t, NULL_TREE);
 
   return decl;
 }
Index: testsuite/g++.dg/warn/deprecated-7.C
===
--- testsuite/g++.dg/warn/deprecated-7.C(revision 0)
+++ testsuite/g++.dg/warn/deprecated-7.C(working copy)
@@ -0,0 +1,17 @@
+// PR c++/56130
+
+int g_nn;
+int& g_n __attribute__((deprecated)) = g_nn;
+
+void f()
+{
+  int f_nn;
+  int& f_n __attribute__((deprecated)) = f_nn;
+  f_n = 1;// { dg-warning "'f_n' is deprecated" }
+}
+
+int main()
+{
+  g_n = 1;// { dg-warning "'g_n' is deprecated" }
+  f();
+}


Re: [RFC] Old school parallelization of WPA streaming

2013-08-21 Thread Andi Kleen
> I also fail to see why threads should not work here.  Maybe simply annotate 
> gcc with openmp?

Don't you have to set an environment variable to set the number of threads
for openmp?

Otherwise it sounds like a reasonable way to do it.
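
For reference, OpenMP does not require the environment variable: the
thread count can also be given in the program with a num_threads clause
(or omp_set_num_threads()).  A minimal sketch, with a made-up function
name and a trivial workload standing in for the per-partition work; when
built without OpenMP support the loop simply runs serially.

```c
#include <stddef.h>

/* Sum N values using at most NTHREADS threads when OpenMP is enabled.  */
long
sum_sizes (const long *sizes, size_t n, int nthreads)
{
  long total = 0;
  long i;

  if (nthreads < 1)
    nthreads = 1;

#ifdef _OPENMP
#pragma omp parallel for num_threads(nthreads) reduction(+:total)
#endif
  for (i = 0; i < (long) n; i++)
    total += sizes[i];

  return total;
}
```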

-Andi


Re: [PATCH, ARM] Fix handling of function arguments with excess alignment

2013-08-21 Thread Richard Earnshaw
On 08/08/13 14:38, Richard Earnshaw wrote:
> PR target/56979 is a bug where a parameter to a function has an
> alignment that is larger than its natural alignment.  In this case this
> causes the mid-end to generate a mode for the argument that is
> incompatible with the registers that are assigned for it.  We then end
> up creating invalid RTL and subsequently abort when the pattern cannot
> emit assembly code.
> 
> The fix is to decompose the assignment when this would happen in the
> same way that we handle other block mode arguments and handle each piece
> in turn.
> 
>   PR target/56979
>   * arm.c (aapcs_vfp_allocate): Decompose the argument if the
>   suggested mode for the assignment isn't compatible with the
>   registers required.
> 
> Committed to trunk.
> 

And back-ported to the 4.7 and 4.8 branches.




Re: [GOOGLE] Assign discriminators for different callsites at a same line within one BB

2013-08-21 Thread Cary Coutant
> You are right, we need discriminator for non-CALL stmts too. Patch updated:

OK for google branches. Thanks!

-cary


Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-21 Thread Andi Kleen
Torvald Riegel  writes:
> +#endif
>   leaq8(%rsp), %rax
> - subq$56, %rsp
> - cfi_def_cfa_offset(64)
> + subq$64, %rsp
> + cfi_def_cfa_offset(72)

I don't see why you did this change and the addq change below.

The rest seems reasonable to me, although I haven't tried to untangle
the full dependencies between C++ and asm code for retries.
It would likely be cleaner to just keep the retries fully
in C++ like the original patch did. There's no advantage
in going back to assembler.

>   movq%rax, (%rsp)
>   movq%rbx, 8(%rsp)
>   movq%rbp, 16(%rsp)
> @@ -72,8 +127,21 @@ SYM(_ITM_beginTransaction):
>   movq%r15, 48(%rsp)
>   movq%rsp, %rsi
>   callSYM(GTM_begin_transaction)
> - addq$56, %rsp
> + addq$64, %rsp
>   cfi_def_cfa_offset(8)

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [patch] Adjust DECL_NAME of virtual clones

2013-08-21 Thread Eric Botcazou
> We already support the .blahblah syntax in the demangler in d_print_comp.
> (I think it deserve to be extended to special case some newly introduced
> mess, like .ltopriv and .local_alias)
> I wonder if we don't want to get the mangling here to be consistent, so
> users are facing the same type of obscurity?

IOW you'd prefer using the period as concatenation character here?  Fine with 
me as well, I only care about the uniform side.

> Otherwise the patch seems reasonable to me.  With LTO we will still get the
> ugly assembler names right?

I'm afraid so, but that's a very minor caveat.

-- 
Eric Botcazou


Re: RFC - Refactor tree.h

2013-08-21 Thread Andrew MacLeod

On 08/10/2013 06:03 AM, Richard Biener wrote:

Mike Stump  wrote:

On Aug 9, 2013, at 3:36 PM, Diego Novillo  wrote:

This patch is still WIP.  It builds stage1, but I'm getting ICEs
during stage 2.

The patch splits tree.h into three files:

- tree-core.h: All data structures, enums and typedefs from
  tree.h

- tree-api.h: All extern function definitions from tree.h

- tree-macros.h: All macro accessors, tree checks and other
  inline functions.

I don't like this split.  You focus in on the details and sort code by
detail.  I think this is wrong.  I want code sorted by the features and
functions it provides, and all like this, go into explainable
functional bins.  One day, a function might be implemented by a data
structure, the next day, by a routine, or a macro, or an inline
function, the form of it doesn't matter.

I mostly agree - tree-macros.h is a red herring. It should be tree-core.h and 
tree.h only. What does not belong there should go to other appropriate places, 
like fold-const.h for example.



The reason for the tri-split is that many of the "appropriate" places 
for function prototypes don't exist yet...  they were thrown in tree.h 
or tree-flow.h because no one wanted to create a small header file as an 
appropriate place, or was too lazy to figure it out.


The idea here is to get all those into a file of their own so they can 
then be dealt with later, but not impact the code base much. They don't 
need any of the tree accessor macros, nor even the tree structural 
content, just the "struct tree *" from core-types.  This means the 
tree-api.h file can be included by tree.h for untouched files, and can 
also be included from gimple.h for those which have been converted and 
no longer include tree.h.  Leaving them in tree.h defeats the purpose 
of the split since tree.h would have to be included by files using 
gimple.h in order to see the prototypes, and that would then bring in 
all the tree macros again.


So really, the content of tree-macros.h should be called tree.h, and that 
should include the tree-core.h file as well as the tree-api.h file.  
Then all existing files which include tree.h get exactly what they have 
today.


We are then left with this tree-api.h file which will be included from 2 
places: tree.h and gimple.h.  As files are converted to the new gimple 
form, any functions which used to have 'tree' in the prototype are going 
to be converted to GimpleType or whatever, and the tree prototype(s) 
will be removed from tree-api.h.  At that point, the prototype(s) will 
be put in an appropriate header file, creating one if need be, and 
included as needed.


So that is my rationale for the 3 way split of tree.h I proposed to Diego.

We could do that tree-api work upfront.. everything that is in 
tree-api.h could be given a new home, which would require creating some 
more header files and changing the #include footprint in various files.  
I was just trying to minimize the turmoil in the 4.9 source base... :-)


Andrew




Re: [PATCH] Rerun df_analyze after delete_unmarked_insns during DCE

2013-08-21 Thread Eric Botcazou
> (I can't find the original mail either in my mailbox or in the archives).

It's PR rtl-optimization/57940.

-- 
Eric Botcazou


Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-21 Thread Torvald Riegel
On Wed, 2013-08-21 at 10:14 -0700, Andi Kleen wrote:
> Torvald Riegel  writes:
> > +#endif
> > leaq8(%rsp), %rax
> > -   subq$56, %rsp
> > -   cfi_def_cfa_offset(64)
> > +   subq$64, %rsp
> > +   cfi_def_cfa_offset(72)
> 
> I don't see why you did this change and the addq change below.

I need to store edi (ie, the properties of the transaction passed to the
function by the compiler) on the stack, so these two changes create
additional room for it.  (I wasn't sure about alignment requirements, so
I just used 8 bytes for it.)

> The rest seems reasonable to me, although I haven't tried to untangle
> the full dependencies between C++ and asm code for retries.

If anyone has any suggestions for how to improve the comments, let me
know.

> It would be likely cleaner to just keep the retries fully
> in C++ like the original patch did. There's no advantage
> of going back to assembler.

That's true for x86, but it seems that for s390, we can't easily put the
xbegin/tbegin into the C++ code because of floating point register
save/restore issues.  The added complexity on the x86 side seemed to be
a reasonable price for having a general HTM fast path retry handling on
the C++ side.

Torvald



Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-21 Thread Andi Kleen
> That's true for x86, but it seems that for s390, we can't easily put the
> xbegin/tbegin into the C++ code because of floating point register
> save/restore issues.  The added complexity on the x86 side seemed to be
> a reasonable price for having a general HTM fast path retry handling on
> the C++ side.

I don't see much point in trying to unify these fast paths for very
different implementations with different properties. It would
not surprise me if the highly tuned end result is different.

It's kind of like trying to write a portable high performance memcpy().

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-21 Thread Richard Henderson
> -#if defined(USE_HTM_FASTPATH) && !defined(HTM_CUSTOM_FASTPATH)
> +#ifdef USE_HTM_FASTPATH
>// HTM fastpath.  Only chosen in the absence of transaction_cancel to allow
>// using an uninstrumented code path.
>// The fastpath is enabled only by dispatch_htm's method group, which uses
> @@ -187,6 +187,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
> gtm_jmpbuf *jb)
>// indeed in serial mode, and HW transactions should never need serial mode
>// for any internal changes (e.g., they never abort visibly to the STM code
>// and thus do not trigger the standard retry handling).
> +#ifndef HTM_CUSTOM_FASTPATH
>if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
>  {
>for (uint32_t t = htm_fastpath; t; t--)
> @@ -237,6 +238,49 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
> gtm_jmpbuf *jb)
>   }
>   }
>  }
> +#else
> +  // If we have a custom HTM fastpath in ITM_beginTransaction, we implement
> +  // just the retry policy here.  We communicate with the custom fastpath

Don't you want this logic arranged as

#ifdef HTM_CUSTOM_FASTPATH
 ... your new code
#elif defined(USE_HTM_FASTPATH)
 ... existing htm code
#endif

> + /* Store edi for future HTM fast path retries.  */
> + movl%edi, -8(%rsp)
> + orl $pr_HTMRetryableAbort, %edi
> + /* Let the C++ code handle the retry policy.  */
> +no_htm:
Note for future porting of this code to i386 or win64 -- there is no redzone.
It might just be cleaner to go ahead and allocate the stack frame always, even
if we don't store anything into it along the htm fastpath.

> - subq$56, %rsp
> - cfi_def_cfa_offset(64)
> + subq$64, %rsp
> + cfi_def_cfa_offset(72)

You now have an abi-misaligned stack.  Since the return address is pushed by
the call, an aligned stack frame allocation is (N + 8) % 16 == 0.
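
The arithmetic behind that rule, as a small sketch (the helper names are
made up).  At function entry the return address has already been pushed,
so after "subq $N, %rsp" the stack is 16-byte aligned only when N + 8 is a
multiple of 16.

```c
#include <stdbool.h>

/* True if allocating FRAME_BYTES on entry leaves the stack 16-byte
   aligned, given the 8-byte return address pushed by the call.  */
bool
frame_keeps_stack_aligned (unsigned frame_bytes)
{
  return (frame_bytes + 8u) % 16u == 0;
}

/* Smallest frame size >= NEEDED_BYTES that satisfies the rule,
   i.e. the smallest such value congruent to 8 modulo 16.  */
unsigned
round_up_frame (unsigned needed_bytes)
{
  return ((needed_bytes + 7u) / 16u) * 16u + 8u;
}
```

So the original 56-byte frame satisfies the rule (56 + 8 = 64), the
64-byte frame in the patch does not (64 + 8 = 72), and the next size that
does is 72.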

> +  // Accessed from assembly language, thus the "asm" specifier on
> +  // the name, avoiding complex name mangling.
> +#ifdef __USER_LABEL_PREFIX__
> +#define UPFX1(t) UPFX(t)
> +#define UPFX(t) #t
> +  static gtm_rwlock serial_lock
> +__asm__(UPFX1(__USER_LABEL_PREFIX__) "gtm_serial_lock");
> +#else
> +  static gtm_rwlock serial_lock
> +__asm__("gtm_serial_lock");
> +#endif

Now that we have 3 copies of this, we should simplify things.  E.g.

#ifdef __USER_LABEL_PREFIX__
# define UPFX1(X) #X
# define UPFX UPFX1(__USER_LABEL_PREFIX__)
#else
# define UPFX
#endif

static gtm_rwlock serial_lock __asm__(UPFX "gtm_serial_lock");

or something.
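
One detail worth spelling out for the "or something": applying # directly
to a macro argument stringifies the argument's name without expanding it,
so a second macro level is needed to get the prefix's value.  A sketch of
that variant follows; DEMO_PREFIX and the demo_* functions exist only to
show the difference and are not part of libitm.

```c
/* Two-level stringification: the outer macro expands its argument
   before the inner one applies #.  */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1 (x)

#ifdef __USER_LABEL_PREFIX__
# define UPFX STRINGIFY (__USER_LABEL_PREFIX__)
#else
# define UPFX ""
#endif

/* Demonstration with a made-up prefix macro.  */
#define DEMO_PREFIX _

const char *
demo_one_level (void)
{
  /* Yields "DEMO_PREFIXgtm_serial_lock": the name, not its value.  */
  return STRINGIFY_1 (DEMO_PREFIX) "gtm_serial_lock";
}

const char *
demo_two_level (void)
{
  /* Yields "_gtm_serial_lock".  */
  return STRINGIFY (DEMO_PREFIX) "gtm_serial_lock";
}

/* The string that would be passed to __asm__ for serial_lock.  */
const char *
serial_lock_asm_name (void)
{
  return UPFX "gtm_serial_lock";
}
```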


r~


Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-21 Thread Torvald Riegel
On Wed, 2013-08-21 at 19:41 +0200, Andi Kleen wrote:
> > That's true for x86, but it seems that for s390, we can't easily put the
> > xbegin/tbegin into the C++ code because of floating point register
> > save/restore issues.  The added complexity on the x86 side seemed to be
> > a reasonable price for having a general HTM fast path retry handling on
> > the C++ side.
> 
> I don't see much point in trying to unify these fast paths for very
> different implementations with different properties. It would
> not surprise me if the highly tuned end result is different.
> 
> It's kind of like trying to write a portable high performance memcpy().

I agree that highly tuned implementations might eventually be pretty
different from each other.  Nonetheless, I'm currently not aware of
anybody volunteering to really work on highly tuned implementations, in
particular regarding the retry policies.  Once we get there (and it
would be great if somebody would work on that!), I'm completely fine with
separating those bits out.  Until then, I think it's preferable to keep
things as unified as possible as long as this doesn't come with
unreasonable complexity or runtime overheads.

Torvald



Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-21 Thread Richard Henderson
On 08/21/2013 10:14 AM, Andi Kleen wrote:
> The rest seems reasonable to me, although I haven't tried to untangle
> the full dependencies between C++ and asm code for retries.
> It would be likely cleaner to just keep the retries fully
> in C++ like the original patch did. There's no advantage
> of going back to assembler.

Isn't there?  The transaction will be fractionally smaller, for not
having had to see the unwinding of the c++ stack frame.


r~



[C++ Patch, obvious] Consistently use INDIRECT_REF_P

2013-08-21 Thread Paolo Carlini

Hi,

earlier today I noticed three spots where we don't use the existing 
INDIRECT_REF_P predicate. I think the patch qualifies as obvious, and 
I'm going to apply it later today.


Thanks,
Paolo.

/
2013-08-21  Paolo Carlini  

* call.c (build_new_method_call_1): Use INDIRECT_REF_P.
* cp-tree.h (REFERENCE_REF_P): Likewise.
* semantics.c (finish_offsetof): Likewise.
Index: cp/call.c
===
--- cp/call.c   (revision 201902)
+++ cp/call.c   (working copy)
@@ -7668,7 +7668,7 @@ build_new_method_call_1 (tree instance, tree fns,
 
   if (init)
{
- if (TREE_CODE (instance) == INDIRECT_REF
+ if (INDIRECT_REF_P (instance)
  && integer_zerop (TREE_OPERAND (instance, 0)))
return get_target_expr_sfinae (init, complain);
  init = build2 (INIT_EXPR, TREE_TYPE (instance), instance, init);
Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 201902)
+++ cp/cp-tree.h(working copy)
@@ -2975,7 +2975,7 @@ extern void decl_shadowed_for_var_insert (tree, tr
 
 /* True if NODE is an implicit INDIRECT_EXPR from convert_from_reference.  */
 #define REFERENCE_REF_P(NODE)  \
-  (TREE_CODE (NODE) == INDIRECT_REF\
+  (INDIRECT_REF_P (NODE)   \
&& TREE_TYPE (TREE_OPERAND (NODE, 0))   \
&& (TREE_CODE (TREE_TYPE (TREE_OPERAND ((NODE), 0)))\
== REFERENCE_TYPE))
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 201902)
+++ cp/semantics.c  (working copy)
@@ -3691,7 +3693,7 @@ finish_offsetof (tree expr)
   || TREE_CODE (TREE_TYPE (expr)) == METHOD_TYPE
   || TREE_TYPE (expr) == unknown_type_node)
 {
-  if (TREE_CODE (expr) == INDIRECT_REF)
+  if (INDIRECT_REF_P (expr))
error ("second operand of %<offsetof%> is neither a single "
   "identifier nor a sequence of member accesses and "
   "array references");


Re: [C++ Patch] PR 56130

2013-08-21 Thread Jason Merrill

OK.

Jason


Re: [PATCH] Rerun df_analyze after delete_unmarked_insns during DCE

2013-08-21 Thread Jeff Law

On 08/21/2013 11:32 AM, Eric Botcazou wrote:

> (I can't find the original mail either in my mailbox or in the archives).
>
> It's PR rtl-optimization/57940.

It was on the mailing list too.  If I'd had the reference to 57940, 
I wouldn't have approved the patch given your comment from July 20.


jeff



Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C

2013-08-21 Thread Jeff Law

On 08/21/2013 09:31 AM, Aldy Hernandez wrote:


> May I stress again the importance of tests-- which are especially
> critical for new language features.  You don't want cilk silently
> breaking thus rendering all your hard work moot, do you? :))

Agreed.  While we don't have a strict policy for testing new features, 
adding tests for this kind of stuff is highly encouraged.  Not everyone 
doing GCC development is going to be familiar enough with Cilk+ and how 
their patch might interact with the Cilk+ support.


Having tests in the testsuite makes it much less likely someone will 
break the Cilk+ support accidentally.



> > +  if (flag_enable_cilkplus && lang_hooks.cilkplus.cilk_valid_spawn (expr_p))
> > +{
> > +  /* If there are errors, there is no point in expanding the
> > + _Cilk_spawn.  Just gimplify like a normal call expr.  */
> > +  if (!seen_error ())
> > +{
> > +  ret = (enum gimplify_status)
> > +lang_hooks.cilkplus.gimplify_cilk_spawn (expr_p, pre_p, post_p);
> > +  if (ret != GS_UNHANDLED)
> > +continue;
> > +}
> > +}
> > +


> Oh, hell no!  You do realize you are drilling down and walking every
> single expression being passed to the gimplifier to find your spawn?
> That's not cool.  You need to find some way to annotate expressions or
> do this more efficiently.  It may help to bootstrap with -fcilkplus and
> do performance analysis, to make sure you're not making the compiler
> slower on the non cilkplus code path.

Yea, that would definitely be a problem (walking every expression 
looking for cilk spawns inside).  Having gimple nodes for these things 
would seem to make sense to me as well.




> Could you not let the gimplifier do its thing and add a case for
> CILK_SPAWN_STMT where you do the unwrapping and everything else?  I do
> realize that cilk_valid_spawn() is doing all sorts of type checking, and
> validation, but the gimplifier is really not the place to do this.  When
> possible, you should do type checking as close to the source as
> possible, thus-- at the parser.  See how c_finish_omp_for() is called
> from the FE to do type checking, build the OMP_FOR tree node, *and* do
> the add_stmt().  Perhaps you need a corresponding
> c_finish_cilk_{spawn,sync}.  Definitely worth looking into.  But I can
> tell you now, drilling down into every expression being gimplified is a
> no-go.

Yea, we *really* want the gimplification & checking separated.   If we 
think about where Andrew's proposals take us, then the checking needs to 
move into the front-end.  gimplification should be relegated, to the 
fullest extent possible, to converting the IL down to gimple.





Jeff


Re: [PATCH] Enable non-complex math builtins from C99 for Bionic

2013-08-21 Thread Alexander Ivchenko
Hi, there are still a couple of problems with my patch:

The build is broken for the following targets:
1) *linux* targets that do not include config/linux.h in their tm.h
(e.g. alpha-linux, ppc64-linux, etc.).  For them we have:

../../../../gcc/gcc/config/linux-android.c: In function ‘bool
linux_android_libc_has_function(function_class)’:
../../../../gcc/gcc/config/linux-android.c:40:7: error:
‘OPTION_BIONIC’ was not declared in this scope
   if (OPTION_BIONIC)
   ^
make[2]: *** [linux-android.o] Error 1

This is addressed by the changes to config/linux-android.c: linux_libc,
LIBC_GLIBC and LIBC_BIONIC seem to be declared for all *linux*
targets.

2) *uclinux* targets that include config/linux.h. For *uclinux* we do
not use linux-protos.h, and therefore linux_android_libc_has_function
is not declared there.
I don't want to add additional tmake_file, tm_p_file and extra_objs, so
I added an explicit define of TARGET_LIBC_HAS_FUNCTION as
no_c99_libc_has_function for those targets.
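
To make the two fixes above concrete, here is a minimal stand-alone sketch of the hook logic.  It is a model only, not the GCC sources: the enum definitions, the file-scope linux_libc variable and the return statements are stand-ins assumed for illustration, and libc_has_function_sketch is a hypothetical name.  Only the comparisons against LIBC_GLIBC / LIBC_BIONIC and the three function classes come from the patch below.

```cpp
// Stand-in declarations (assumed for this sketch; GCC has its own).
enum LibcKind { LIBC_GLIBC, LIBC_UCLIBC, LIBC_BIONIC };
enum FunctionClass {
  function_c94,
  function_c99_misc,
  function_c99_math_complex,
  function_sincos
};

// Models the libc selection that every *linux* target is said to declare.
static LibcKind linux_libc = LIBC_GLIBC;

// Hypothetical model of linux_android_libc_has_function after the fix:
// it tests linux_libc directly instead of the OPTION_* macros that only
// exist when config/linux.h is part of tm.h.
bool libc_has_function_sketch(FunctionClass fn_class) {
  if (linux_libc == LIBC_GLIBC)
    return true;  // glibc: every class is assumed to be available
  if (linux_libc == LIBC_BIONIC)
    return fn_class == function_c94
           || fn_class == function_c99_misc
           || fn_class == function_sincos;
  return false;   // anything else: assume the function is missing
}

// Stand-in for the hook the uclinux targets are switched to; assumed to
// report that no C99 functions are present.
bool no_c99_hook_sketch(FunctionClass) { return false; }
```

With linux_libc set to LIBC_BIONIC, the sketch accepts function_c99_misc and rejects function_c99_math_complex, which is the behaviour the patch subject ("non-complex math builtins") describes.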

I'm sorry for that. The following patch cured my build of those
targets; it also preserves the initial presence of c99. There were
plenty of targets that were changed by my patch; I hope this time I
didn't miss anything.

Is it ok?

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6e27be2..02679f3 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2013-08-21  Alexander Ivchenko  
+
+ * config/linux-android.c (linux_android_libc_has_function): Fix
+ checks for libc.
+ * config/bfin/uclinux.h: Define TARGET_LIBC_HAS_FUNCTION as
+ no_c99_libc_has_function.
+ * config/c6x/uclinux-elf.h: Ditto.
+ * config/lm32/uclinux-elf.h: Ditto.
+ * config/m68k/uclinux.h: Ditto.
+ * config/moxie/uclinux.h: Ditto.
+
 2013-08-21  Joern Rennecke  

  * reload.h (struct reg_equivs): Rename to ..
diff --git a/gcc/config/bfin/uclinux.h b/gcc/config/bfin/uclinux.h
index ca0f4ee..63cba99 100644
--- a/gcc/config/bfin/uclinux.h
+++ b/gcc/config/bfin/uclinux.h
@@ -44,3 +44,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_SUPPORTS_SYNC_CALLS 1

 #define SUBTARGET_FDPIC_NOT_SUPPORTED
+
+#undef TARGET_LIBC_HAS_FUNCTION
+#define TARGET_LIBC_HAS_FUNCTION no_c99_libc_has_function
diff --git a/gcc/config/c6x/uclinux-elf.h b/gcc/config/c6x/uclinux-elf.h
index 5d61f4d..fa0937e 100644
--- a/gcc/config/c6x/uclinux-elf.h
+++ b/gcc/config/c6x/uclinux-elf.h
@@ -62,3 +62,5 @@
 : "0" (_beg), "b" (_end), "b" (_scno)); \
 }

+#undef TARGET_LIBC_HAS_FUNCTION
+#define TARGET_LIBC_HAS_FUNCTION no_c99_libc_has_function
diff --git a/gcc/config/linux-android.c b/gcc/config/linux-android.c
index 4a4b48d..e9d9e9a 100644
--- a/gcc/config/linux-android.c
+++ b/gcc/config/linux-android.c
@@ -35,9 +35,9 @@ linux_android_has_ifunc_p (void)
 bool
 linux_android_libc_has_function (enum function_class fn_class)
 {
-  if (OPTION_GLIBC)
+  if (linux_libc == LIBC_GLIBC)
 return true;
-  if (OPTION_BIONIC)
+  if (linux_libc == LIBC_BIONIC)
 if (fn_class == function_c94
  || fn_class == function_c99_misc
  || fn_class == function_sincos)
diff --git a/gcc/config/lm32/uclinux-elf.h b/gcc/config/lm32/uclinux-elf.h
index 3a556d7..a5e8163 100644
--- a/gcc/config/lm32/uclinux-elf.h
+++ b/gcc/config/lm32/uclinux-elf.h
@@ -77,3 +77,5 @@
 #undef  CC1_SPEC
 #define CC1_SPEC "%{G*} %{!fno-PIC:-fPIC}"

+#undef TARGET_LIBC_HAS_FUNCTION
+#define TARGET_LIBC_HAS_FUNCTION no_c99_libc_has_function
diff --git a/gcc/config/m68k/uclinux.h b/gcc/config/m68k/uclinux.h
index 8d74312..b1af7d2 100644
--- a/gcc/config/m68k/uclinux.h
+++ b/gcc/config/m68k/uclinux.h
@@ -67,3 +67,6 @@ along with GCC; see the file COPYING3.  If not see sections.  */
 #undef M68K_OFFSETS_MUST_BE_WITHIN_SECTIONS_P
 #define M68K_OFFSETS_MUST_BE_WITHIN_SECTIONS_P 1
+
+#undef TARGET_LIBC_HAS_FUNCTION
+#define TARGET_LIBC_HAS_FUNCTION no_c99_libc_has_function
diff --git a/gcc/config/moxie/uclinux.h b/gcc/config/moxie/uclinux.h
index 498037e..85c65f2 100644
--- a/gcc/config/moxie/uclinux.h
+++ b/gcc/config/moxie/uclinux.h
@@ -37,3 +37,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
  --wrap=mmap --wrap=munmap --wrap=alloca\
  %{fmudflapth: --wrap=pthread_create\
 }} %{fmudflap|fmudflapth: --wrap=main}"
+
+#undef TARGET_LIBC_HAS_FUNCTION
+#define TARGET_LIBC_HAS_FUNCTION no_c99_libc_has_function

thanks,
Alexander

2013/8/21 Rainer Orth :
> Alexander Ivchenko  writes:
>
>> Hi Joseph, thanks for your comments.
>>
>> I updated the patch:
>>
>> 1) The function name as a second argument in libc_has_function target
>> hook was removed - was not useful so far.
>> 2) By using contrib/config-list.mk (thanks for the hint - great tool!)
>> and analysing tm.h files and what is included in them I have checked
>> 197 targets. That analysis includes all issues that you raised in your
>> comments - everything is fixed now. I don't like that sometimes we
>> have to redefine the version of the hook back to the default one due
>> to a poisoning of including elfos.h, but I

Change API for register_jump_thread to pass in an entire path

2013-08-21 Thread Jeff Law


Right now, the API to register a requested jump thread passes three 
edges: the incoming edge we traverse, the outgoing edge we know will be 
taken as a result of traversing the incoming edge, and an optional edge 
to allow us to find the joiner block when we thread through a join node.


Note that incoming_edge->dest does not have to reference the same block 
as outgoing_edge->src as we allow the threader to thread across multiple 
blocks.  When we thread through multiple blocks, the latter blocks must 
be empty (no side effects other than control transfer).


The limitations we place when threading around empty blocks have kept 
the updating code relatively simple as we do not need to clone those 
empty blocks -- we just need to know where they're going to transfer 
control to.


As a result, the current code to update the SSA and CFG representations 
after threading a jump just needs to clone a single block and its side 
effects and perform minimal PHI node updates.



The general form of the FSA optimization changes things a bit in that we 
need to clone two blocks.  This entails additional PHI node updates.  To 
enable proper updating of the PHI nodes we need more pieces of the 
threaded path.  Rather than add another special argument to the 
register_jump_thread API, I'm changing the API to record the entire 
threaded path.


This patch just changes the API so the full path is available.  The 
SSA/CFG updating code (right now) just converts the full path into the 
3-edge form.  This does not change the generated code in any way shape 
or form.
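
As an illustration of the path-to-3-edge conversion described above, here is a small stand-alone sketch.  It is a model under assumptions, not the code in tree-ssa-threadupdate.c: Edge, ThreeEdgeForm and to_three_edge_form are hypothetical names, and std::vector stands in for GCC's vec.  The mapping itself (first edge, second edge, and the last edge only when the path is longer than two) is inferred from the call sites in the diff below, where e and taken_edge are pushed first and any further edges follow.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a CFG edge.
struct Edge {
  int src;
  int dest;
};

// The three edges the updating code still consumes.
struct ThreeEdgeForm {
  const Edge *incoming;  // edge we traverse into the threaded region
  const Edge *outgoing;  // edge known to be taken after the incoming one
  const Edge *third;     // optional final edge; nullptr for a 2-edge path
};

// Collapse a full jump-thread path into the legacy 3-edge form.
ThreeEdgeForm to_three_edge_form(const std::vector<const Edge *> &path) {
  // A registered path always holds at least the incoming and taken edge.
  assert(path.size() >= 2);
  ThreeEdgeForm form;
  form.incoming = path[0];
  form.outgoing = path[1];
  form.third = path.size() > 2 ? path.back() : nullptr;
  return form;
}
```

Because the extra edges are simply dropped by the conversion, callers can start recording complete paths without any change in the generated code, which matches the claim made for this patch.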



Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  Installed.


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7162f34..ba9c7c9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,12 @@
 2013-08-21  Jeff Law  
 
+   * tree-flow.h (register_jump_thread): Pass vector of edges
+   instead of each important edge.
+   * tree-ssa-threadedge.c (thread_across_edge): Build the jump
+   thread path into a vector and pass that to register_jump_thread.
+   * tree-ssa-threadupdate.c (register_jump_thread): Convert the
+   passed in edge vector to the current 3-edge form.
+
Revert:
2013-08-20  Alexey Makhalov  
 
diff --git a/gcc/tree-flow.h b/gcc/tree-flow.h
index caa8d74..01e6562 100644
--- a/gcc/tree-flow.h
+++ b/gcc/tree-flow.h
@@ -750,7 +750,7 @@ bool may_be_nonaddressable_p (tree expr);
 
 /* In tree-ssa-threadupdate.c.  */
 extern bool thread_through_all_blocks (bool);
-extern void register_jump_thread (edge, edge, edge);
+extern void register_jump_thread (vec<edge>);
 
 /* In gimplify.c  */
 tree force_gimple_operand_1 (tree, gimple_seq *, gimple_predicate, tree);
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 357b671..320dec5 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -937,10 +937,15 @@ thread_across_edge (gimple dummy_cond,
}
 
  remove_temporary_equivalences (stack);
- if (!taken_edge)
-   return;
- propagate_threaded_block_debug_into (taken_edge->dest, e->dest);
- register_jump_thread (e, taken_edge, NULL);
+ if (taken_edge)
+   {
+ vec<edge> path = vNULL;
+ propagate_threaded_block_debug_into (taken_edge->dest, e->dest);
+ path.safe_push (e);
+ path.safe_push (taken_edge);
+ register_jump_thread (path);
+ path.release ();
+   }
  return;
}
 }
@@ -969,9 +974,12 @@ thread_across_edge (gimple dummy_cond,
bitmap_clear (visited);
bitmap_set_bit (visited, taken_edge->dest->index);
bitmap_set_bit (visited, e->dest->index);
+vec<edge> path = vNULL;
 
/* Record whether or not we were able to thread through a successor
   of E->dest.  */
+   path.safe_push (e);
+   path.safe_push (taken_edge);
found = false;
e3 = taken_edge;
do
@@ -988,6 +996,7 @@ thread_across_edge (gimple dummy_cond,
 
if (e2)
  {
+   path.safe_push (e2);
e3 = e2;
found = true;
  }
@@ -1008,10 +1017,11 @@ thread_across_edge (gimple dummy_cond,
  {
propagate_threaded_block_debug_into (e3->dest,
 taken_edge->dest);
-   register_jump_thread (e, taken_edge, e3);
+   register_jump_thread (path);
  }
  }
 
+path.release();
   }
 BITMAP_FREE (visited);
   }
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 0e4cbc9..e84542c 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -1264,8 +1264,19 @@ thread_through_all_blocks (bool may_peel_loop_headers)
after fixing the SSA graph.  */
 
 void
-register_jump_thread (edge e, edge e2, edge e3)
+register_jump_thread (vec<edge> path)
 {
+  /* Convert PATH int

Wrong patch version in last message

2013-08-21 Thread Jeff Law


I attached the wrong version of the patch in my last message.  It was 
missing the declaration of "e3".  Fixed obviously.


Sorry,
Jeff


RE: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C

2013-08-21 Thread Iyer, Balaji V


> -Original Message-
> From: Aldy Hernandez [mailto:al...@redhat.com]
> Sent: Wednesday, August 21, 2013 11:31 AM
> To: Iyer, Balaji V
> Cc: r...@redhat.com; Jeff Law; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C
> 
> Even more review stuff.  Are you keeping track of all this Balaji? :)
> 

Yes I am. Please keep an eye out for a fixed patch soon.

> > +  if (warn)
> > +warning (0, "suspicious use of _Cilk_spawn");
> 
> First, as I've mentioned, this error message is very ambiguous.  You should
> strive to provide better error messages.  See my previous comment on this
> same line of code.
> 
> However... for all the checking you do in cilk_valid_spawn, I don't see a
> single corresponding test.
> 

Well, the name of the function is misleading.  I will fix that.  I think I 
should call it "detect_cilk_spawn" instead.

What the function does is NOT to find whether there are syntax or other issues 
in the spawned statement, but to check whether spawn is used in an appropriate
location.
Here are some cases where you can use spawn (I am sure I am missing something):

X = _Cilk_spawn foo ();

_Cilk_spawn foo ()

operator=(x, _Cilk_spawn foo ())

and these things can be kept in different kinds of trees, so adding this in 
each individual tree's case statement can be a lot of code-addition and is 
error prone.

The warning you see is more like a "heads up."  I can take it out if you like. 
If you notice, when I see an error, I don't bother gimplifying the spawned 
function (but just let the compiler go ahead as a regular function call) 
thereby not creating a new nested function etc.

> May I stress again the importance of tests-- which are especially critical
> for new language features.  You don't want cilk silently breaking thus
> rendering all your hard work moot, do you? :))
> 
> You particularly need tests for all quirks described in the Cilk Plus language
> specification around here:
> 
> "A program is considered ill formed if the _Cilk_spawn form of this expression
> appears other than in one of the following contexts: [blah blah blah]".
>

I have several of those already (e.g. using spawn outside a function, spawning 
something that is not a function, etc)
 
> 
> > +  /* Strip off any conversion to void.  It does not affect whether spawn
> > + is supported here.  */
> > +  if (TREE_CODE (exp) == CONVERT_EXPR && VOID_TYPE_P (TREE_TYPE (exp)))
> > +exp = TREE_OPERAND (exp, 0);
> 
> Don't you need to strip off various levels here with a loop?  Also, could any
> of the following do the job? STRIP_NOPS, STRIP_TYPE_NOPS,
> STRIP_USELESS_TYPE_CONVERSION.
> 
> > @@ -7086,6 +7087,19 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
> >else if (ret != GS_UNHANDLED)
> > break;
> >
> > +  if (flag_enable_cilkplus && lang_hooks.cilkplus.cilk_valid_spawn (expr_p))
> > +   {
> > + /* If there are errors, there is no point in expanding the
> > +_Cilk_spawn.  Just gimplify like a normal call expr.  */
> > + if (!seen_error ())
> > +   {
> > + ret = (enum gimplify_status)
> > +   lang_hooks.cilkplus.gimplify_cilk_spawn (expr_p, pre_p, post_p);
> > + if (ret != GS_UNHANDLED)
> > +   continue;
> > +   }
> > +   }
> > +
> 
> Oh, hell no!  You do realize you are drilling down and walking every single
> expression being passed to the gimplifier to find your spawn?
> That's not cool.  You need to find some way to annotate expressions or do this
> more efficiently.  It may help to bootstrap with -fcilkplus and do performance
> analysis, to make sure you're not making the compiler slower on the non
> cilkplus code path.
> 
> Could you not let the gimplifier do its thing and add a case for
> CILK_SPAWN_STMT where you do the unwrapping and everything else?  I do
> realize that cilk_valid_spawn() is doing all sorts of type checking, and
> validation, but the gimplifier is really not the place to do this.  When
> possible, you should do type checking as close to the source as possible,
> thus-- at the parser.  See how c_finish_omp_for() is called from the FE to do
> type checking, build the OMP_FOR tree node, *and* do the add_stmt().  Perhaps
> you need a corresponding c_finish_cilk_{spawn,sync}.  Definitely worth looking
> into.  But I can tell you now, drilling down into every expression being
> gimplified is a no-go.
> 

Well, I think the name of the function is what is misleading here.

I am not recursing through the entire tree to find the spawn keyword here. What 
I am trying to see is if "*expr_p" is an INIT_EXPR, or TARGET_EXPR or a 
CALL_EXPR etc.

I do agree with you about one thing. I should first check to see if the 
function has a _Cilk_spawn before I go through and check for individual trees. 
That can be done easily by looking at cfun->cilk_frame_decl != NULL_TREE. That 
change I will make in the next patch. 
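
The guard proposed in the last paragraph can be sketched as follows.  This is a stand-alone model, not the gimplifier: FunctionState, Expr, detailed_spawn_check and maybe_spawn are hypothetical names, with has_cilk_frame standing in for cfun->cilk_frame_decl != NULL_TREE and the counter added only to show how often the detailed check runs.

```cpp
// Hypothetical per-function state; has_cilk_frame models
// "cfun->cilk_frame_decl != NULL_TREE".
struct FunctionState {
  bool has_cilk_frame;
};

// Hypothetical expression; the flag models "*expr_p has one of the
// shapes (INIT_EXPR, TARGET_EXPR, CALL_EXPR, ...) that can hold a spawn".
struct Expr {
  bool is_spawn_shape;
};

// Counts how often the detailed per-expression check actually runs.
static int detailed_checks = 0;

// Stand-in for the per-expression detection done by the language hook.
bool detailed_spawn_check(const Expr &e) {
  ++detailed_checks;
  return e.is_spawn_shape;
}

// Cheap tests first: skip the detailed check entirely unless Cilk Plus
// is enabled and the current function is known to contain a spawn.
bool maybe_spawn(bool cilk_enabled, const FunctionState &fn, const Expr &e) {
  if (!cilk_enabled || !fn.has_cilk_frame)
    return false;
  return detailed_spawn_check(e);
}
```

In this model a function without a spawn never reaches the detailed check, so code that does not use Cilk Plus pays only for two flag tests per expression.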

> Also, do yo

Re: [Patch, fortran] PR57798 uninitialized loop bound with sum and array-returning function.

2013-08-21 Thread Thomas Koenig
Hi Mikael,

> Regression tested on x86_64-unknown-linux-gnu.  OK for trunk/4.8?

OK for both. Thanks for the patch!

Thomas



Re: [PATCH i386 1/8] [AVX512] Adjust register classes.

2013-08-21 Thread Richard Henderson
On 08/21/2013 11:28 AM, Kirill Yukhin wrote:
>>> + && (mode == XImode
>>> + || VALID_AVX512F_REG_MODE (mode)
>>> + || VALID_AVX512F_SCALAR_MODE (mode)))
>>> +   return true;
>>> +
>>> +  /* In xmm16-xmm31 we can store only 512 bit modes.  */
>>> +  if (EXT_REX_SSE_REGNO_P (regno))
>>> +   return false;
>>
>> You're rejecting scalar modes here.  Not what you wanted, surely.
> Actually, I believe comment for AVX-512 part is confusing.
> We're not rejecting scalar modes, VALID_AVX512F_SCALAR_MODE allows that.
> We are rejecting all extra SSE registers, when there is no AVX-512 or
> mode is not fit for it.

Yes, I did mis-read the test.  What you have now is correct, but
I still think it could be improved.  We'll do that with followups.

>  (define_insn "*movdi_internal"
>[(set (match_operand:DI 0 "nonimmediate_operand"
> -"=r  ,o  ,r,r  ,r,m ,*y,*y,?*y,?m,?r ,?*Ym,*x,*x,*x,m ,?r ,?r,?*Yi,?*Ym,?*Yi")
> +"=r  ,o  ,r,r  ,r,m ,*y,*y,?*y,?m,?r ,?*Ym,*v,*v,*v,m ,?r ,?r,?*Yi,?*Ym,?*Yi")
>   (match_operand:DI 1 "general_operand"
> -"riFo,riF,Z,rem,i,re,C ,*y,m  ,*y,*Yn,r   ,C ,*x,m ,*x,*Yj,*x,r   ,*Yj ,*Yn"))]
> +"riFo,riF,Z,rem,i,re,C ,*y,m  ,*y,*Yn,r   ,C ,*v,m ,*v,*Yj,*v,r   ,*Yj ,*Yn"))]
>"!(MEM_P (operands[0]) && MEM_P (operands[1]))"
>  {
>switch (get_attr_type (insn))
> @@ -1896,6 +1964,8 @@
> return "%vmovq\t{%1, %0|%0, %1}";
>   case MODE_TI:
> return "%vmovdqa\t{%1, %0|%0, %1}";
> + case MODE_XI:
> +   return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
>  
>   case MODE_V2SF:
> gcc_assert (!TARGET_AVX);
> @@ -1989,7 +2059,11 @@
>   (cond [(eq_attr "alternative" "2")
> (const_string "SI")
>   (eq_attr "alternative" "12,13")
> -   (cond [(ior (not (match_test "TARGET_SSE2"))
> +   (cond [(ior (match_test "EXT_REX_SSE_REGNO_P (REGNO (operands[0]))")
> +   (and (match_test "REG_P (operands[1])")
> +(match_test "EXT_REX_SSE_REGNO_P (REGNO (operands[1]))")))
> +(const_string "XI")
> +  (ior (not (match_test "TARGET_SSE2"))
> (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
>  (const_string "V4SF")
>(match_test "TARGET_AVX")

Better.  And while it produces the correct results, using match_operand would
be better than embedding a reference to operands within a match_test.

But since I don't want to see another 2000 line patch, I'd like you to address
this with a followup as well.

The patch is ok to commit.


r~


v2 of GDB hooks for debugging GCC

2013-08-21 Thread David Malcolm
On Mon, 2013-08-05 at 08:26 -0600, Tom Tromey wrote:
> > "David" == David Malcolm  writes:
> 
> David> GDB 7.0 onwards supports hooks written in Python to improve the
> David> quality-of-life within the debugger.  The best known are the
> David> pretty-printing hooks [1], which we already use within libstdc++ for
> David> printing better representations of STL containers.
> 
> Nice!
Thanks.

> A few suggestions.
> 
> David>   (note how the rtx_def* is printed inline.  This last one is actually a
> David> kludge; all the other pretty-printers work inside gdb, whereas this one
> David> injects a call into print-rtl.c into the inferior).
> 
> Naughty.
We chatted about this at Cauldron; I haven't yet had a chance to
implement the magic bullet approach we discussed there.  In the
meantime, is there an API I can call to determine how safe this kludge
is?

> David>   * it hardcoded values in a few places rather than correctly looking up
> David> enums
> 
> If you have a new-enough gdb (I don't recall the exact version -- I can
> look if you want, but recall that gcc changes mean that gcc developers
> generally have to use a very recent gdb) you can use
> gdb.types.make_enum_dict to get this very easily.

Thanks, I've rewritten to use these; works great on this box (with
gdb-7.4.50.20120120-54.fc17.x86_64 fwiw).


> David> You may see a message from gdb of the form:
> David>   cc1-gdb.py auto-loading has been declined by your `auto-load safe-path'
> David> as a protection against untrustworthy python scripts.  See
> David>   http://sourceware.org/gdb/onlinedocs/gdb/Auto_002dloading-safe-path.html
> 
> I think you could set up the safe-path in the gcc .gdbinit.

Interesting idea - but .gdbinit itself seems to get declined, so I don't
think this can help.

> David> Note that you can suppress pretty-printers using /r (for "raw"):
> David>   (gdb) p /r pass
> David>   $3 = (opt_pass *) 0x188b600
> David> and dereference the pointer in the normal way:
> David>   (gdb) p *pass
> David>   $4 = {type = RTL_PASS, name = 0x120a312 "expand",
> David>   [etc, ...snipped...]
> 
> I just wanted to clarify here that you can "p *pass" *without* first
> using "p/r".  Pretty-printing applies just to printing -- it does not
> affect what is in the value history.  The values there still have the
> correct type and such.

I've reversed the order of these in the docstring to make this more
clear.


> David> def pretty_printer_lookup(gdbval):
> [...]
> 
> David> def register (obj):
> David> if obj is None:
> David> obj = gdb
> 
> David> # Wire up the pretty-printer
> David> obj.pretty_printers.append(pretty_printer_lookup)
> 
> It's better to use the gdb.printing facilities now.  These let users
> disable pretty-printers if they prefer.  The libstdc++ printers go out
> of their way to use gdb.printing only if available; but you can probably
> just assume it exists.

I initially tried using gdb.printing.RegexpCollectionPrettyPrinter, with
code like this:

pp.add_printer('rtx_def', r'^rtx_def \*$', RtxPrinter)

but it didn't work.  On debugging, I note that an "rtx_def*" has a
pointer type, and hence this code in
RegexpCollectionPrettyPrinter.__call__ fails:

# Get the type name.
typename = gdb.types.get_basic_type(val.type).tag
if not typename:
return None

since the basic_type has a None tag.

So I implemented a similar scheme, with all the prettyprinters in a
top-level "gcc" holder, but doing precise string matching on the
"unqualified" type, like in the original patch.

This works as before, and presumably works with the pretty-printer
management facilities; running this gives sane-looking output:

(gdb) info pretty-printer 
  objfile /home/david/coding/gcc-python/gcc-git-prettyprinters/build/gcc/cc1 pretty-printers:
  gcc
basic_block
cgraph_node
edge
gimple
opt_pass
rtx_def
tree
  objfile /lib64/libstdc++.so.6 pretty-printers:
  libstdc++-v6
__gnu_cxx::_Slist_iterator
__gnu_cxx::__7::_Slist_iterator
---Type <return> to continue, or q <return> to quit---

How would one go about toggling the enabledness of a prettyprinter?  Is
this something that can only be done from python?

> David> print('Successfully loaded GDB hooks for GCC')
> 
> I wonder whether gdb ought to do this.

FWIW given all of the different gdb and python builds, the vagaries of
getting the hooks into the correct location for autoload to work,
setting up auto-load safe-path, and ensuring that the script actually
parses and runs to completion, I've found it very useful to have an
explicit "yes I'm working" message like this at the end of such scripts,
especially when dealing with bug reports: it's very useful to know
whether or not this message was printed when diagnosing problems
reported by 3rd parties.

I did see references to gdb.parameter("verbose") in gdb.printing - how
would one go about setting this?  (then again, the final print is a good
sanity check, for the reasons

[wide-int] Fix rtl-checking build error

2013-08-21 Thread Richard Sandiford
Hit this building with --enable-checking=yes,rtl,df.  Seemed obvious
enough so I went ahead and applied it.  The definition in rtl.c was
already OK.

Richard


gcc/
* rtl.h (hwivec_check_failed_bounds): Fix prototype.

Index: gcc/rtl.h
===
--- gcc/rtl.h   (revision 201904)
+++ gcc/rtl.h   (working copy)
@@ -711,7 +711,7 @@
 ATTRIBUTE_NORETURN;
 extern void rtl_check_failed_block_symbol (const char *, int, const char *)
 ATTRIBUTE_NORETURN;
-extern void hwivec_check_failed_bounds (const_rtvec, int, const char *, int,
+extern void hwivec_check_failed_bounds (const_hwivec, int, const char *, int,
const char *)
 ATTRIBUTE_NORETURN;
 extern void rtvec_check_failed_bounds (const_rtvec, int, const char *, int,




Re: [PATCH, vtv update] Change fixed size array to a vector; fix diagnostic messages.

2013-08-21 Thread Caroline Tice
Ping? Ping?

On Wed, Aug 14, 2013 at 12:14 PM, Caroline Tice  wrote:
> Ping?
>
> On Thu, Aug 8, 2013 at 3:16 PM, Caroline Tice  wrote:
>> This patch replaces the fixed sized array that was holding vtable
>> pointers for a particular class hierarchy with a vector, allowing for
>> dynamic resizing.  It also fixes issues with the warning diagnostics.
>> I am in the process of running regression tests with this patch;
>> assuming they all pass, is this patch OK to commit?
>>
>> -- Caroline Tice
>> cmt...@google.com
>>
>> 2013-08-08  Caroline Tice  
>>
>> * vtable-class-hierarchy.c: Remove unnecessary include statements.
>> (MAX_SET_SIZE): Remove unnecessary constant.
>> (register_construction_vtables):  Make vtable_ptr_array parameter
>> into a vector; remove num_args parameter. Change array accesses to
>> vector accesses.
>> (register_other_binfo_vtables): Ditto.
>> (insert_call_to_register_set): Ditto.
>> (insert_call_to_register_pair): Ditto.
>> (output_set_info):  Ditto.  Also change warning calls to warning_at
>> calls, and fix format of warning messages.
>> (register_all_pairs): Change vtbl_ptr_array from an array into a
>> vector.  Remove num_vtable_args (replace with calls to vector 
>> length).
>> Change array stores & accesses to vector functions. Change calls to
>> register_construction_vtables, register_other_binfo_vtables,
>> insert_call_to_register_set, insert_call_to_register_pair and
>> output_set_info to match their new signatures.  Change warning to
>> warning_at and fix the format of the warning message.


[wide-int] Fix signed vs. unsigned warning/error in mips.c

2013-08-21 Thread Richard Sandiford
In trunk this is a comparison between signed values, but uhwi probably
does make more sense here.

Tested on mips64-linux-gnu, where it fixes the build, and applied.

Richard


gcc/
* config/mips/mips.c (r10k_safe_mem_expr_p): Fixed signed vs.
unsigned warning.

Index: gcc/config/mips/mips.c
===
--- gcc/config/mips/mips.c  (revision 201909)
+++ gcc/config/mips/mips.c  (working copy)
@@ -14902,7 +14902,7 @@
a link-time-constant address.  */
 
 static bool
-r10k_safe_mem_expr_p (tree expr, HOST_WIDE_INT offset)
+r10k_safe_mem_expr_p (tree expr, unsigned HOST_WIDE_INT offset)
 {
   HOST_WIDE_INT bitoffset, bitsize;
   tree inner, var_offset;
@@ -14915,7 +14915,7 @@
 return false;
 
   offset += bitoffset / BITS_PER_UNIT;
-  return offset >= 0 && offset < tree_to_uhwi (DECL_SIZE_UNIT (inner));
+  return offset < tree_to_uhwi (DECL_SIZE_UNIT (inner));
 }
 
 /* A for_each_rtx callback for which DATA points to the instruction
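
The mips.c hunk above drops the "offset >= 0" test once the offset is unsigned.  The sketch below shows why that is safe, using fixed-width standard types instead of HOST_WIDE_INT; in_bounds_signed and in_bounds_unsigned are hypothetical names for the old and new forms of the check, and the 64-bit width is an assumption.

```cpp
#include <cstdint>

// Old shape: signed offset, so negative values need an explicit test.
bool in_bounds_signed(std::int64_t offset, std::uint64_t size) {
  return offset >= 0 && static_cast<std::uint64_t>(offset) < size;
}

// New shape: unsigned offset.  A value that would have been negative
// wraps to a very large number, so "offset < size" rejects it for any
// realistic object size and "offset >= 0" would be trivially true.
bool in_bounds_unsigned(std::uint64_t offset, std::uint64_t size) {
  return offset < size;
}
```

For any size below 2^63, both functions give the same answer when the same bit pattern is passed as the offset, which is why the single unsigned comparison is enough and the signed-vs-unsigned warning goes away.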


Re: v2 of GDB hooks for debugging GCC

2013-08-21 Thread Tom Tromey
> "David" == David Malcolm  writes:

Tom> Naughty.

David> We chatted about this at Cauldron; I haven't yet had a chance to
David> implement the magic bullet approach we discussed there.  In the
David> meantime, is there a API I can call to determine how safe this kludge
David> is?

Not right now.  You can just call the function and catch the exception
that occurs if it can't be done.

I think you can still run into trouble sometimes.  For example if the
user puts a breakpoint in one of the functions used by the
pretty-printer, and then does "bt", hitting the breakpoint while
printing the backtrace... not sure what happens then, maybe a crash.

Tom> I think you could set up the safe-path in the gcc .gdbinit.

David> Interesting idea - but .gdbinit itself seems to get declined, so I don't
David> think this can help.

Haha, I didn't think of that :-)

David> So I implemented a similar scheme, with all the prettyprinters in a
David> top-level "gcc" holder, but doing precise string matching on the
David> "unqualified" type, like in the original patch.

David> This works as before, and presumably works with the pretty-printer
David> management facilities; running this gives sane-looking output:
[...]

Nice.

David> How would one go about toggling the enabledness of a prettyprinter?  Is
David> this something that can only be done from python?

You can use "enable pretty-printer" and "disable pretty-printer".

David> I did see references to gdb.parameter("verbose") in gdb.printing
David> - how would one go about setting this?

"set verbose on"

I think few people use this setting though; probably best to do what
you're doing now.

David> +# Convert "enum tree_code" (tree.def and tree.h) to a dict:
David> +tree_code_dict = gdb.types.make_enum_dict(gdb.lookup_type('enum tree_code'))

One other subtlety is that this doesn't interact well with all kinds of
uses of gdb.  For example if you have a running gdb, then modify enum
tree_code and rebuild, then the pretty-printers won't notice this
change.

I guess it would be nice if we had pre-built caches for this kind of
thing available upstream.  But meanwhile, if you care, you can roll your
own using events to notice when to invalidate data.
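
A minimal sketch of what "roll your own" could look like, assuming gdb's
Python API (gdb.events.new_objfile, gdb.types.make_enum_dict and
gdb.lookup_type).  The EnumCache class and the tree_code_cache name are
hypothetical and not part of the patch.  I'd expect new_objfile to fire when
gdb re-reads symbols after a rebuild, but that is an assumption worth
checking.

```python
try:
    import gdb
    import gdb.types
except ImportError:
    gdb = None  # not running inside gdb; the cache logic below still works


class EnumCache(object):
    """Lazily built enum dict that can be thrown away on demand.

    Hypothetical helper for illustration; not part of the patch.
    """

    def __init__(self, enum_name, loader=None):
        self.enum_name = enum_name
        # The loader is injectable so the class can be exercised outside gdb.
        self._loader = loader or self._load_from_gdb
        self._dict = None

    def _load_from_gdb(self, enum_name):
        # Same call as in the patch, just deferred until first use.
        return gdb.types.make_enum_dict(gdb.lookup_type(enum_name))

    def get(self):
        if self._dict is None:
            self._dict = self._loader(self.enum_name)
        return self._dict

    def invalidate(self, event=None):
        # Accepts the event argument so it can be used as a gdb event handler.
        self._dict = None


tree_code_cache = EnumCache('enum tree_code')

if gdb is not None:
    # Drop the cached dict whenever gdb loads (or reloads) an objfile.
    gdb.events.new_objfile.connect(tree_code_cache.invalidate)
```

The pretty-printers would then call tree_code_cache.get() instead of reading
a module-level dict that was built once at import time.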

David> +def __call__(self, gdbval):
David> +type_ = gdbval.type.unqualified()
David> +str_type_ = str(type_)

FWIW I think for RegexpCollectionPrettyPrinter you could write a
subclass whose __call__ first dereferenced a pointer, then called
super's __call__.  But your approach is just fine.

Tom


Re: [Patch, Fortran, OOP] PR 58185: [4.8/4.9 Regression] ICE when selector in SELECT TYPE is non-polymorphic

2013-08-21 Thread Mikael Morin
Le 19/08/2013 13:38, Janus Weil a écrit :
> Hi all,
> 
> here is a small patch which does some cleanup to avoid an ICE on
> invalid SELECT TYPE statements.
> 
> The first three hunks are just cosmetics, and the fourth one also
> contains minor refactoring, where I pull some common code out of the
> two branches of an if statement. The important part, however, is that
> I prevent gfc_build_class_symbol from being called if no type symbol
> is available.
> 
> Regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.8?
> 

> +  else if (selector->ts.u.derived)

Hum, accessing ts.u.derived is correct only if selector->ts.type is
BT_DERIVED or BT_CLASS, isn't it?  Thus the condition should probably be
else if (selector->ts.type == BT_DERIVED) as the BT_CLASS was handled
before?  OK with that change (if it works).  Thanks.

Mikael


Re: RFC - Refactor tree.h

2013-08-21 Thread Mike Stump
On Aug 21, 2013, at 10:23 AM, Andrew MacLeod  wrote:
> On 08/10/2013 06:03 AM, Richard Biener wrote:
>> Mike Stump  wrote:
>>> On Aug 9, 2013, at 3:36 PM, Diego Novillo  wrote:
>>>> This patch is still WIP.  It builds stage1, but I'm getting ICEs
>>>> during stage 2.
>>>>
>>>> The patch splits tree.h into three files:
>>>>
>>>> - tree-core.h: All data structures, enums and typedefs from
>>>>   tree.h
>>>>
>>>> - tree-api.h: All extern function definitions from tree.h
>>>>
>>>> - tree-macros.h: All macro accessors, tree checks and other
>>>>   inline functions.
>>> I don't like this split.  You focus in on the details and sort code by
>>> detail.  I think this is wrong.  I want code sorted by the features and
>>> functions it provides, and all like this, go into explainable
>>> functional bins.  One day, a function might be implemented by a data
>>> structure, the next day, by a routine, or a macro, or an inline
>>> function, the form of it doesn't matter.
>> I mostly agree - tree-macros.h is a red herring. It should be tree-core.h 
>> and tree.h only. What does not belong there should go to other appropriate 
>> places, like fold-const.h for example.
>> 
> 
> the reason for the tri-split is because many of the "appropriate" places for 
> function prototypes don't exist yet...  they were thrown in tree.h or 
> tree-flow.h because no one wanted to create a small header file as an 
> appropriate place, or was too lazy to figure it out.

Ok.  But creating a new bad place for them to live, just to eject them from 
tree.h, is wasteful.  Design a better spot that is good, then make it happen.  
tree-api.h isn't good.  It is just another random stop-gap; let's live with the 
tree.h random stop-gap.  When you are done with stop gaps, then move them.

> The idea here is to get all those into a file of their own so they can then 
> be dealt with later, but not impact the code base much. They don't need any 
> of the tree accessor macros, nor even the tree structural content, just the 
> "struct tree *" from core-types. . This means the tree-api.h file can be 
> included by tree.h for untouched files, and can also be included from 
> gimple.h for those which have been converted and no longer include tree.h.
> Leaving them in tree.h defeats the purpose of the split since tree.h would 
> have to be included by files using gimple.h in order to see the prototypes.. 
> and that would then bring in all the tree macros again.

No.  tree-core.h has the api and the data structures.  gimple.h uses this, as 
it wants all in it.  gimple.h doesn't use tree.h, as it doesn't want the macro 
and any other content you want to exclude.  This is two use cases, so, two 
files.

> So really, the content of tree-macro.h should be called tree.h,

Yes.

> that should include the tree-core.h file as well as the tree-api.h file.

No…  I don't see a client for tree-api.  Logically, you can segregate them down 
at the bottom of tree-core.h.

> We are then left with this tree-api.h file which will be included from 2 
> places.. tree.h and gimple.h.   As files are converted to the new gimple 
> form, any functions which use to have 'tree' in the prototype are going to be 
> converted to GimpleType or whatever, and the tree prototype(s) will be 
> removed from tree-api.h.  At that point, the prototype(s) will be put in an 
> appropriate header file, creating one if need be, and included as needed.

One can move them from tree-core.h or to any other home as appropriate.

Re: [patch] Adjust DECL_NAME of virtual clones

2013-08-21 Thread Jan Hubicka
> > We already support the .blahblah syntax in the demangler in d_print_comp.
> > (I think it deserve to be extended to special case some newly introduced
> > mess, like .ltopriv and .local_alias)
> > I wonder if we don't want to get the mangling here to be consistent, so
> > users are facing the same type of obscurity?
> 
> IOW you'd prefer using the period as concatenation character here?  Fine with 
> me as well, I only care about the uniform side.

I think the demangler will output BLAH.xyz as BLAH [clone xyz]
> 
> > Otherwise the patch seems reasonable to me.   With LTO we will still get the
> > ugly assembler names right?
> 
> I'm afraid so, but that's a very minor caveat.

Yep, I guess we will care about this later.

Honza
> 
> -- 
> Eric Botcazou


Re: [RFC] Old school parallelization of WPA streaming

2013-08-21 Thread Jan Hubicka
> >
> >One risk is if someone streams to a spinning disk it may add more seeks
> >for 
> >the parallel IO. But I think it's a reasonable tradeoffs.
> 
> It'll also wreck all WPA dump files.

We do not dump anything during the main streaming.  If we now stream 2GB for
firefox, I think we can hope to mostly fit in cache with the whole machinery.

We will need to flush the cgraph file prior to forking and close it in the
forked process.  It is the only one that remains open across the fork
boundary, IMO.
> 
> >We should also use a faster compressor
> 
> And we should avoid uncompressing the function sections...

Yep, we also need to avoid carrying the whole tree stream of the original source
unit whenever we stream out a function from it.  I think function sections should
have two parts - the references to global trees, which are uncompressed and
translated during WPA streaming, plus a compressed binary blob with the body that
is copied over.
> 
> That said, the patch is enough of a hack that I don't ever want to debug a 
> bug in it
> 
> I also fail to see why threads should not work here.  Maybe simply annotate 
> gcc with openmp?

It means pushing the global state of lto-streamer into a context variable +
moving it out of GGC or making GGC thread safe.  I would hope that David Malcolm
would be interested in doing this, but it is a bit more than I have time for
right now during the labs conference.

To be honest I fail to see how a bug in an openmp annotated program would be
easier to debug than the fork variant.

Honza


Re: [RFC] Old school parallelization of WPA streaming

2013-08-21 Thread Jan Hubicka
> 
> We should also use a faster compressor

Yep, at least once it arrives higher in profiles. So far other stuff is a lot 
slower.
> 
> > For -flto=jobserver I simply fork all 32 processes.  It may not be a disaster,
> > but perhaps we should figure out how to communicate with jobserver.  At first
> > glance on document on how it works, it seems easy to add. Perhaps we can even
> > convince GNU Make folks to put simple helpers to libiberty?
> 
> lto=jobserver is still broken and confuses tokens on large builds (ends
> with a 0 read) I did some debugging recently, and I suspect a Linux kernel
> bug now. Still haven't tracked it down.
> 
> Any workarounds would need make changes unfortunately.
> 
> > 
> > We also may figure out number of CPUs (is it available i.e. from libgomp)
> 
> sysconf(_SC_NPROCESSORS_ONLN) ? 

OK, thanks :)
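
For reference, the same query expressed as a small Python sketch (the patch
itself would use the C call suggested above; the online_cpus helper name is
hypothetical).  It assumes a POSIX host where os.sysconf knows
SC_NPROCESSORS_ONLN and falls back to a default otherwise.

```python
import os


def online_cpus(default=1):
    """Return the number of CPUs currently online, or DEFAULT if unknown."""
    try:
        # Python spelling of sysconf(_SC_NPROCESSORS_ONLN).
        count = os.sysconf('SC_NPROCESSORS_ONLN')
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing or the name is not known on this host.
        return default
    return count if count > 0 else default
```

A value like this could serve as the default degree of parallelism when the
user does not pass an explicit number.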

> >  
> > +fparallelism=
> > +LTO Joined
> > +Run the link-time optimizer in whole program analysis (WPA) mode.
> 
> The description does not make sense

Yup, a psto.
> 
> Rest of patch looks good from a quick read, although I would prefer to 
> do the waiting for children in the "parent", not the "last one"

The parent process does all the forking + waiting.  Only the last section
is streamed by the parent process since I do not see reason for forking for
it.
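
The scheme described above can be sketched as follows.  This is only an
illustration of the fork/wait pattern in Python, not the code from the patch:
stream_partitions and stream_one are hypothetical names, it is POSIX-only, and
it does not cap the number of children the way a parallelism limit would.

```python
import os


def stream_partitions(partitions, stream_one):
    """Stream every partition; return the number of failed children.

    The parent forks one child per partition except the last, streams
    the last partition itself, then waits for all children.
    """
    pids = []
    for part in partitions[:-1]:
        pid = os.fork()
        if pid == 0:
            # Child: stream exactly one partition, then exit without
            # running the parent's cleanup handlers or flushing its buffers.
            status = 0
            try:
                stream_one(part)
            except Exception:
                status = 1
            os._exit(status)
        pids.append(pid)

    # No reason to fork for the final partition; the parent handles it.
    if partitions:
        stream_one(partitions[-1])

    # All waiting is done by the parent.
    failures = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            failures += 1
    return failures
```

Any file that is open in the parent before the loop should be flushed first,
otherwise buffered data may be written once per child.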

Honza


ipa-devirt.c TLC

2013-08-21 Thread Jan Hubicka
Hi,
this patch fixes some issues with ipa-devirt I noticed since the initial
commit.

 1) I added fixes and comments Doji suggested (thanks!)

 2) During the callgraph construction new external nodes may be created that
    need to be added into the inheritance graph to get more complete answers
    from possible_polymorphic_call_targets.

    While the answer is never complete for external functions, it is better
    to have more of them than fewer so speculation is not trying to work
    too hard.

    I added a function that collects all new nodes into the graph.

 3) I merged in the flag for anonymous_namespace types, but forgot to
    initialize it (it is used to decide if the set of targets is complete and
    I am not using that flag in mainline ATM)

 4) Added possible_polymorphic_call_target_p that can be used for sanity
    checking of the devirtualization machinery.  The testing now passes firefox
    and a couple of other testcases, but I will submit the verification bits
    separately from this change.

 5) -fdump-ipa-type-inheritance ICEs when there are no polymorphic types.

 6) I managed to get the call target cache wrong in a way that it is never
    freed on node removal as it is supposed to be.

Bootstrapped/regtested ppc64-linux, will commit it tomorrow if there are no
complaints and x86_64-linux testing completes.

Honza

* cgraphunit.c (analyze_functions): Use update_type_inheritance_graph.
* ipa-utils.h (update_type_inheritance_graph): Declare.
(possible_polymorphic_call_target_p): Declare.
(possible_polymorphic_call_target_p): New.
* ipa-devirt.c: Update toplevel comments.
(cached_polymorphic_call_targets): Move up.
(odr_type_d): Move ID down.
(polymorphic_type_binfo_p): Update comment.
(odr_hasher::remove): Likewise.
(get_odr_type): Set anonymous_namespace.
(dump_odr_type): Dump it.
(dump_type_inheritance_graph): Do not ICE when there are no ODR types.
(maybe_record_node): Record node in cached_polymorphic_call_targets.
(record_binfo): Add comment.
(free_polymorphic_call_targets_hash): Do not ICE when cache is not built.
(devirt_node_removal_hook): Do not ICE when cache is freed.
(possible_polymorphic_call_target_p): New predicate.
(update_type_inheritance_graph): New function.
Index: cgraphunit.c
===
--- cgraphunit.c(revision 201899)
+++ cgraphunit.c(working copy)
@@ -976,6 +976,8 @@ analyze_functions (void)
   cgraph_process_new_functions ();
}
 }
+  if (optimize && flag_devirtualize)
+update_type_inheritance_graph ();
 
   /* Collect entry points to the unit.  */
   if (cgraph_dump_file)
Index: ipa-utils.h
===
--- ipa-utils.h (revision 201899)
+++ ipa-utils.h (working copy)
@@ -50,12 +50,15 @@ tree get_base_var (tree);
 struct odr_type_d;
 typedef odr_type_d *odr_type;
 void build_type_inheritance_graph (void);
+void update_type_inheritance_graph (void);
 vec 
 possible_polymorphic_call_targets (tree, HOST_WIDE_INT,
   bool *final = NULL,
   void **cache_token = NULL);
 odr_type get_odr_type (tree, bool insert = false);
 void dump_possible_polymorphic_call_targets (FILE *, tree, HOST_WIDE_INT);
+bool possible_polymorphic_call_target_p (tree, HOST_WIDE_INT,
+struct cgraph_node *n);
 
 /* Return vector containing possible targets of polymorphic call E.
If FINALP is non-NULL, store true if the list is complette. 
@@ -87,6 +90,17 @@ dump_possible_polymorphic_call_targets (
   dump_possible_polymorphic_call_targets (f, e->indirect_info->otr_type,
  e->indirect_info->otr_token);
 }
+
+/* Return true if N can be possibly target of a polymorphic call of
+   E.  */
+
+inline bool
+possible_polymorphic_call_target_p (struct cgraph_edge *e,
+   struct cgraph_node *n)
+{
+  return possible_polymorphic_call_target_p (e->indirect_info->otr_type,
+e->indirect_info->otr_token, n);
+}
 #endif  /* GCC_IPA_UTILS_H  */
 
 
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 201899)
+++ ipa-devirt.c(working copy)
@@ -37,10 +37,10 @@ along with GCC; see the file COPYING3.
  names and types are the same.
 
  OTR = OBJ_TYPE_REF
-   This is Gimple representation of type information of a polymorphic call.
+   This is the Gimple representation of type information of a polymorphic 
call.
It contains two parameters:
 otr_type is a type of class whose method is called.
-otr_token is index into virtual table where address is taken.
+otr_token is the index into virtual table where 

Re: Symtab cleanup 10/17 remove unnecesary DECL_ARGUMENTS and DECL_RESULT

2013-08-21 Thread Mike Stump
On Aug 1, 2013, at 8:09 AM, Jan Hubicka  wrote:
> Now when we have abstract origins tracked, this patch makes DECL_ARGUMENTS and
> DECL_RESULT to be removed from FUNCTION_DECLs that are never passed to symbol
> table.  This reduces LTO streaming effort (by about 1/3rd of PARM_DECls)

So, I was tracking down an lto failure in the C++ test suite, 
g++.dg/ipa/pr46984.C, and it appears to be caused by 
4df870fdeec85907db3dcabf1992cf8b63e1d562 aka 
svn+ssh://gcc.gnu.org/svn/gcc/trunk@201468.  I was trying to find the 
gcc-patches email for this work, but could not.  :-(  Anyway, the above is the 
closest work to that, that I can find.

The problem is that DECL_ARGUMENTS of the thunk (aka _ZThn528_N1D3fooEv) is 
used during thunk code-generation, and thunk code-generation happens during the 
output of D::foo.

My port is a targetm.asm_out.can_output_mi_thunk false port.

RESULT_DECL is synthesized on the fly like this:

  /* Build the return declaration for the function.  */
  restype = TREE_TYPE (TREE_TYPE (thunk_fndecl));
  if (DECL_RESULT (thunk_fndecl) == NULL_TREE)
{
  resdecl = build_decl (input_location, RESULT_DECL, 0, restype);
  DECL_ARTIFICIAL (resdecl) = 1;
  DECL_IGNORED_P (resdecl) = 1;
  DECL_RESULT (thunk_fndecl) = resdecl;
}
  else
resdecl = DECL_RESULT (thunk_fndecl);

in expand_thunk, but DECL_ARGUMENTS is not.  Either it needs to be, or it has 
to be saved and restored in some fashion.

input_function is never called for the thunk.  If it had been, then it would 
have worked I think.  I think this translates into output_function not being 
called on the thunk.

Anyway, the core dump looks like:

In member function 'non-virtual thunk to D::foo()':
lto1: internal compiler error: Segmentation fault
0x995d3f crash_signal
   ../../gcc/gcc/toplev.c:335
0x6a5828 contains_struct_check
   ../../gcc/gcc/tree.h:3804
0x6a5828 fold_build_pointer_plus_hwi_loc
   ../../gcc/gcc/tree.h:5871
0x6a5828 thunk_adjust
   ../../gcc/gcc/cgraphunit.c:1240
0x6a6ffd expand_thunk(cgraph_node*)
   ../../gcc/gcc/cgraphunit.c:1440
0x6a799d assemble_thunks_and_aliases
   ../../gcc/gcc/cgraphunit.c:1549
0x6a7a32 assemble_thunks_and_aliases
   ../../gcc/gcc/cgraphunit.c:1565
0x6a7bce expand_function
   ../../gcc/gcc/cgraphunit.c:1675
0x6a983c expand_all_functions
   ../../gcc/gcc/cgraphunit.c:1717
0x6a983c compile()
   ../../gcc/gcc/cgraphunit.c:2054
0x62ccd7 lto_main()
   ../../gcc/gcc/lto/lto.c:3843

It appears that 

> 
> Bootstrapped/regtested ppc64-linux, will commit it after further testing on 
> x86_64-linux.
> 
> Honza
> 
>   * cgraph.h (release_function_body): Declare.
>   * tree.c (free_lang_data_in_decl): Free parameters and return values
>   of unused declarations.
> Index: cgraph.h
> ===
> --- cgraph.h  (revision 201408)
> +++ cgraph.h  (working copy)
> @@ -606,6 +606,7 @@ void debug_cgraph_node (struct cgraph_no
> void cgraph_remove_edge (struct cgraph_edge *);
> void cgraph_remove_node (struct cgraph_node *);
> void cgraph_release_function_body (struct cgraph_node *);
> +void release_function_body (tree);
> void cgraph_node_remove_callees (struct cgraph_node *node);
> struct cgraph_edge *cgraph_create_edge (struct cgraph_node *,
>   struct cgraph_node *,
> Index: tree.c
> ===
> --- tree.c(revision 201367)
> +++ tree.c(working copy)
> @@ -4886,6 +4886,20 @@ free_lang_data_in_decl (tree decl)
> 
>  if (TREE_CODE (decl) == FUNCTION_DECL)
> {
> +  struct cgraph_node *node;
> +  if (!(node = cgraph_get_node (decl))
> +   || (!node->symbol.definition && !node->clones))
> + {
> +   if (node)
> + cgraph_release_function_body (node);
> +   else
> + {
> +   release_function_body (decl);
> +   DECL_ARGUMENTS (decl) = NULL;
> +   DECL_RESULT (decl) = NULL;
> +   DECL_INITIAL (decl) = error_mark_node;
> + }
> + }
>   if (gimple_has_body_p (decl))
>   {
> tree t;



Re: [Patch] Fix empty grouping problem in regex

2013-08-21 Thread Tim Shen
Change vector::at() call to vector::operator[]().

Booted and tested with -m32, -m64 and -m64 debug-mode under x86_64
GNU/Linux and committed (r201914).


-- 
Tim Shen


powerpc64le multilibs and multiarch dir

2013-08-21 Thread Alan Modra
This patch corrects the powerpc64le-linux multiarch directory, adds
opposite-endian multilibs, and chooses non-multiarch os dirs for the
new multilibs.

For multiarch, powerpc64le-linux now will use powerpc64le-linux-gnu.
Given a typical big-endian native toolchain with os dirs /lib and
/lib64, we'll use /lible and /lib64le if supporting little-endian as
well.  If you happen to use /lib and /lib32, then the little-endian
variants are /lible and /lib32le.  For completeness I also support
building big-endian multilibs on a little-endian host.

All of this is done with a dose of "make" string manipulation
functions guaranteed to make your eyes glaze over, but there are just
too many combinations of different configurations to simply
enumerate them all.  I use ':=' assignment for the multilib make
variables because you can't have a recursively evaluated variable
(ie. one assigned with '=') reference itself, as is done with
MULTILIB_OSDIRNAMES, and we may as well use the same for all the
multilib vars.

I also remove fPIC from MULTILIB_EXTRA_OPTS because -fPIC is already
added where necessary by the library Makefiles.  Also,
MULTILIB_EXTRA_OPTS is only for options that apply to the non-default
multilibs.  It isn't used for the default multilib, so this isn't the
place to put options common to all multilibs.  (I've been building
gcc with this particular change for years.)

Bootstrapped and regression tested powerpc64-linux.  OK for mainline
and 4.8?

* config.gcc (powerpc*-*-linux*): Add support for little-endian
multilibs to big-endian target and vice versa.
* config/rs6000/t-linux64: Use := assignment on all vars.
(MULTILIB_EXTRA_OPTS): Remove fPIC.
(MULTILIB_OSDIRNAMES): Specify using mapping from multilib_options.
* config/rs6000/t-linux64le: New file.
* config/rs6000/t-linux64bele: New file.
* config/rs6000/t-linux64lebe: New file.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 201834)
+++ gcc/config.gcc  (working copy)
@@ -2139,7 +2139,7 @@
tmake_file="rs6000/t-fprules rs6000/t-ppcos ${tmake_file} 
rs6000/t-ppccomm"
case ${target} in
powerpc*le-*-*)
-   tm_file="${tm_file} rs6000/sysv4le.h" ;;
+   tm_file="${tm_file} rs6000/sysv4le.h" ;;
esac
maybe_biarch=yes
case ${target} in
@@ -2162,6 +2162,19 @@
fi
tm_file="rs6000/biarch64.h ${tm_file} rs6000/linux64.h 
glibc-stdint.h"
tmake_file="$tmake_file rs6000/t-linux64"
+   case ${target} in
+   powerpc*le-*-*)
+   tmake_file="$tmake_file rs6000/t-linux64le"
+   case ${enable_targets} in
+   all | *powerpc64-* | *powerpc-*)
+   tmake_file="$tmake_file rs6000/t-linux64lebe" ;;
+   esac ;;
+   *)
+   case ${enable_targets} in
+   all | *powerpc64le-* | *powerpcle-*)
+   tmake_file="$tmake_file rs6000/t-linux64bele" ;;
+   esac ;;
+   esac
extra_options="${extra_options} rs6000/linux64.opt"
;;
*)
Index: gcc/config/rs6000/t-linux64
===
--- gcc/config/rs6000/t-linux64 (revision 201834)
+++ gcc/config/rs6000/t-linux64 (working copy)
@@ -25,8 +25,8 @@
 # it doesn't tell anything about the 32bit libraries on those systems.  Set
 # MULTILIB_OSDIRNAMES according to what is found on the target.
 
-MULTILIB_OPTIONS= m64/m32
-MULTILIB_DIRNAMES   = 64 32
-MULTILIB_EXTRA_OPTS = fPIC
-MULTILIB_OSDIRNAMES= ../lib64$(call if_multiarch,:powerpc64-linux-gnu)
-MULTILIB_OSDIRNAMES+= $(if $(wildcard $(shell echo 
$(SYSTEM_HEADER_DIR))/../../usr/lib32),../lib32,../lib)$(call 
if_multiarch,:powerpc-linux-gnu)
+MULTILIB_OPTIONS:= m64/m32
+MULTILIB_DIRNAMES   := 64 32
+MULTILIB_EXTRA_OPTS := 
+MULTILIB_OSDIRNAMES := m64=../lib64$(call if_multiarch,:powerpc64-linux-gnu)
+MULTILIB_OSDIRNAMES += m32=$(if $(wildcard $(shell echo 
$(SYSTEM_HEADER_DIR))/../../usr/lib32),../lib32,../lib)$(call 
if_multiarch,:powerpc-linux-gnu)
Index: gcc/config/rs6000/t-linux64le
===
--- gcc/config/rs6000/t-linux64le   (revision 0)
+++ gcc/config/rs6000/t-linux64le   (revision 0)
@@ -0,0 +1,3 @@
+#rs6000/t-linux64le
+
+MULTILIB_OSDIRNAMES = $(subst -linux,le-linux,$(MULTILIB_OSDIRNAMES))
Index: gcc/config/rs6000/t-linux64bele
===
--- gcc/config/rs6000/t-linux64bele (revision 0)
+++ gcc/config/rs6000/t-linux64bele (revision 0)
@@ -0,0 +1,7 @@
+#rs6000/t-linux64end
+
+MULTILIB_OPTIONS+= mlittle
+MULTILIB_DIRNAMES   += le

Re: [PATCH i386 1/8] [AVX512] Adjust register classes.

2013-08-21 Thread Kirill Yukhin
Hello,

> The patch is ok to commit.

Thanks a lot! Checked in to main trunk: 
http://gcc.gnu.org/ml/gcc-cvs/2013-08/msg00524.html

--
K


Re: Symtab cleanup 10/17 remove unnecesary DECL_ARGUMENTS and DECL_RESULT

2013-08-21 Thread Jan Hubicka
> So, I was tracking down an lto failure in the C++ test suite, 
> g++.dg/ipa/pr46984.C, and it appears to be caused by 
> 4df870fdeec85907db3dcabf1992cf8b63e1d562 aka 
> svn+ssh://gcc.gnu.org/svn/gcc/trunk@201468.  I was trying to find the 
> gcc-patches email for this work, but could not.  :-(  Anyway, the above is 
> the closest work to that, that I can find.

I definitely wrote an email with the description and I even remember Richard's
reply to it.  I will dig it out.

> 
> The problem is that DECL_ARGUMENTS of the thunk (aka _ZThn528_N1D3fooEv) is 
> used during thunk code-generation, and thunk code-generation happens during 
> the output of D::foo.
> 
> My port is a targetm.asm_out.can_output_mi_thunk false port.
> 
> RESULT_DECL is synthesized on the fly like this:
> 
>   /* Build the return declaration for the function.  */
>   restype = TREE_TYPE (TREE_TYPE (thunk_fndecl));
>   if (DECL_RESULT (thunk_fndecl) == NULL_TREE)
> {
>   resdecl = build_decl (input_location, RESULT_DECL, 0, restype);
>   DECL_ARTIFICIAL (resdecl) = 1;
>   DECL_IGNORED_P (resdecl) = 1;
>   DECL_RESULT (thunk_fndecl) = resdecl;
> }
>   else
> resdecl = DECL_RESULT (thunk_fndecl);
> 
> in expand_thunk, but DECL_ARGUMENTS is not.  Either it needs to be, or it has 
> to be saved and restored in some fashion.
> 
> input_function is never called for the thunk.  If it had been, then it would 
> have worked I think.  I think this translates into output_function not being 
> called on the thunk.

I see, I will try to modify i386 backend to not output thunks.  The problem
indeed is that thunks' arguments are built by the front-end and they are no
longer streamed.  I am surprised i386 survives this given that it also
produces some gimple thunks.

I guess the easiest way around is to have them streamed the same way as we
stream functions that are used as abstract origins.  I have different plans in
this direction - I want to lower thunks to gimple form early so they go through
the usual channel and get e.g. the profile read correctly.

However, I recall some nasty issues with the linker on Solaris triggering when
the relative order of a thunk and the function it is associated with in the
assembly file is changed.  So this probably should go in independently and with
care.

Honza