Re: [PATCH PR41488] Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Tue, Dec 10, 2013 at 11:59:16PM -0700, Jeff Law wrote: > On 12/10/13 23:35, Bin.Cheng wrote: > >I know little about GC, so when ggc_collect may be called (between two > >passes)? If so, I have to call free_affine_expand_cache just after the > >calling to tree_to_affine_combination_expand in SCEV because it's an > >analyzer and as you pointed out, the analyzing result is used by > >different optimizers. The pointer map would be maintained between > >different passes if it's not instantly freed. > The garbage collector only runs between passes. If you have roots > in static storage, then you'll need to decorate them so the garbage > collector scans them. > > There's a fairly extensive discussion of the garbage collector in > the GCC internals manual. It isn't that easy, pointer_map isn't a data structure recognized by the garbage collector at all (plus, as I've said before, the code is inserting XNEW allocated structures into the pointer_map, which isn't GC friendly either, but making them GC allocated might increase garbage unnecessarily, as all the structs are short lived). I've looked through scev_{initialize,finalize} callers, and almost all passes initialize it and finalize within the same pass, it is only the loop optimizations: NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_dce_loop); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_record_bounds); NEXT_PASS (pass_check_data_deps); NEXT_PASS (pass_loop_distribution); NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_graphite); PUSH_INSERT_PASSES_WITHIN (pass_graphite) NEXT_PASS (pass_graphite_transforms); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_dce_loop); POP_INSERT_PASSES () NEXT_PASS (pass_iv_canon); NEXT_PASS (pass_parallelize_loops); NEXT_PASS (pass_if_conversion); /* pass_vectorize must immediately follow pass_if_conversion. Please do not add any other passes in between. */ NEXT_PASS (pass_vectorize); PUSH_INSERT_PASSES_WITHIN (pass_vectorize) NEXT_PASS (pass_dce_loop); POP_INSERT_PASSES () NEXT_PASS (pass_predcom); NEXT_PASS (pass_complete_unroll); NEXT_PASS (pass_slp_vectorize); NEXT_PASS (pass_loop_prefetch); NEXT_PASS (pass_iv_optimize); NEXT_PASS (pass_lim); NEXT_PASS (pass_tree_loop_done); where scev is live in between the passes, and unfortunately there is no call that each pass calls that you could easily add the free_affine.* call to (except for execute_function_todo but you'd need to invent another TODO_* flag for it and the question is how you'd ensure it will be handled for the non-loop specific passes that are just run in between these passes (say pass_copy_prop, would we need pass_copy_prop_loop?). Perhaps you could replace in tree-affine.c and all it's users {,struct} pointer_map_t * with struct GTY((user)) affine_expand_cache { pointer_map_t *cache; }; and write custom gt_ggc_mx (affine_expand_cache *) function that would would either pointer_map_traverse the cache and for each cache entry call gt_ggc_mx on the embedded trees in the structures, or drop the cache (dunno if the cache is really just a cache and it doesn't matter if it is dropped any time, or if it can affect code generation). There are also PCH related functions that need to be defined, though I guess you could just assert there that the cache is NULL and not walk anything. Jakub
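To make the suggestion a bit more concrete, here is a rough sketch of what the GTY((user)) wrapper and its marker could look like. It assumes the cached values are tree-affine.c's XNEWed expansion records (an aff_tree plus a flag) and that a gt_ggc_mx overload for plain trees is available to user markers; none of this is verified against the tree, it only illustrates the shape of the idea.

/* Sketch only -- the wrapper described above; entry layout and helper
   names are assumptions, not taken from the tree.  */
struct GTY((user)) affine_expand_cache {
  pointer_map_t *cache;
};

/* pointer_map_traverse callback: mark the trees embedded in one cached
   expansion record so the collector keeps them alive.  */
static bool
mark_expansion_entry (const void *key ATTRIBUTE_UNUSED, void **value,
		      void *data ATTRIBUTE_UNUSED)
{
  struct name_expansion *exp = (struct name_expansion *) *value;
  for (unsigned i = 0; i < exp->expansion.n; i++)
    gt_ggc_mx (exp->expansion.elts[i].val);
  if (exp->expansion.rest)
    gt_ggc_mx (exp->expansion.rest);
  if (exp->expansion.type)
    gt_ggc_mx (exp->expansion.type);
  return true;
}

/* Custom marker required for the GTY((user)) type: either walk the cache
   as below, or simply drop it if it really is only a cache.  */
void
gt_ggc_mx (affine_expand_cache *c)
{
  if (c->cache)
    pointer_map_traverse (c->cache, mark_expansion_entry, NULL);
}

/* PCH walker: as suggested, just insist the cache is empty there.  */
void
gt_pch_nx (affine_expand_cache *c)
{
  gcc_assert (c->cache == NULL);
}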
[C++ PATCH] Fix GC related issues in C++ FE (PR c++/58627)
Hi! fn_type_unification does: /* We can't free this if a pending_template entry or * last_error_tinst_level is pointing at it. */ if (last_pending_template == old_last_pend && last_error_tinst_level == old_error_tinst) ggc_free (tinst); because sometimes tinst (and targs) is added to pending templates or tinst level and thus still GC reachable, but often it isn't and it is called often enough that it creates too much garbage. But, the resolve_address_of_overloaded_function caller also calls ggc_free (targs); for similar reasons, but without checking if it might be stored somewhere. The options I see: 1) drop the ggc_free (targs); call from resolve_address_of_overloaded_function altogether 2) remember and check last_pending_template/last_error_tinst_level also in resolve_address_of_overloaded_function for the same purposes as fn_type_unification does; the problem with that is that those vars are private to pt.c 3) what the patch does, add another parameter to fn_type_unification which will tell the caller whether it is safe to ggc_free targs or not The following patch implements 3), and adds a bunch of ggc_free calls to other callers of fn_type_unification when it has that info already. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-12-11 Jakub Jelinek PR c++/58627 * cp-tree.h (fn_type_unification): Add bool * argument. * pt.c (fn_type_unification): Add targs_unused_p argument, set what it points to to true if targs argument isn't stored anywhere. (get_bindings): Adjust fn_type_unification caller. If targs_unused_p, ggc_free targs. * call.c (add_template_candidate_real): Likewise. (print_z_candidate): Likewise. * class.c (resolve_address_of_overloaded_function): Don't ggc_free targs if targs_unused_p is false. --- gcc/cp/cp-tree.h.jj 2013-11-25 10:20:12.0 +0100 +++ gcc/cp/cp-tree.h2013-12-10 09:13:55.202703361 +0100 @@ -5495,7 +5495,7 @@ extern tree instantiate_template (tree, extern tree fn_type_unification(tree, tree, tree, const tree *, unsigned int, tree, unification_kind_t, int, -bool, bool); +bool, bool, bool *); extern void mark_decl_instantiated (tree, int); extern int more_specialized_fn (tree, tree, int); extern void do_decl_instantiation (tree, tree); --- gcc/cp/call.c.jj2013-11-25 10:20:12.0 +0100 +++ gcc/cp/call.c 2013-12-10 09:19:20.034025608 +0100 @@ -2919,22 +2919,25 @@ add_template_candidate_real (struct z_ca gcc_assert (ia == nargs_without_in_chrg); errs = errorcount+sorrycount; + bool targs_unused_p; fn = fn_type_unification (tmpl, explicit_targs, targs, args_without_in_chrg, nargs_without_in_chrg, return_type, strict, flags, false, - complain & tf_decltype); + complain & tf_decltype, &targs_unused_p); if (fn == error_mark_node) { /* Don't repeat unification later if it already resulted in errors. */ - if (errorcount+sorrycount == errs) + if (errorcount + sorrycount == errs) reason = template_unification_rejection (tmpl, explicit_targs, targs, args_without_in_chrg, nargs_without_in_chrg, return_type, strict, flags); else reason = template_unification_error_rejection (); + if (targs_unused_p) + ggc_free (targs); goto fail; } @@ -3230,16 +3233,19 @@ print_z_candidate (location_t loc, const } /* Re-run template unification with diagnostics. */ inform (cloc, " template argument deduction/substitution failed:"); + bool targs_unused_p; + tree targs; + targs = make_tree_vec (r->u.template_unification.num_targs); fn_type_unification (r->u.template_unification.tmpl, - r->u.template_unification.explicit_targs, - (make_tree_vec -
[PATCH] Fix invalid tree sharing caused by the inliner (PR tree-optimization/59386)
Hi! If id->retvar isn't a decl (can happen for DECL_BY_REFERENCE returns), then we can end up with invalid tree sharing because we reuse it more than once (on the following testcase id->retvar is a COMPONENT_REF). Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-12-11 Jakub Jelinek PR tree-optimization/59386 * tree-inline.c (remap_gimple_stmt): If not id->do_not_unshare, unshare_expr (id->retval) before passing it to gimple_build_assign. * gcc.c-torture/compile/pr59386.c: New test. --- gcc/tree-inline.c.jj2013-12-10 08:52:13.0 +0100 +++ gcc/tree-inline.c 2013-12-10 17:04:09.022069945 +0100 @@ -1273,7 +1273,9 @@ remap_gimple_stmt (gimple stmt, copy_bod || ! SSA_NAME_VAR (retval) || TREE_CODE (SSA_NAME_VAR (retval)) != RESULT_DECL))) { - copy = gimple_build_assign (id->retvar, retval); + copy = gimple_build_assign (id->do_not_unshare + ? id->retvar : unshare_expr (id->retvar), + retval); /* id->retvar is already substituted. Skip it on later remapping. */ skip_first = true; } --- gcc/testsuite/gcc.c-torture/compile/pr59386.c.jj2013-12-10 17:06:39.389291052 +0100 +++ gcc/testsuite/gcc.c-torture/compile/pr59386.c 2013-12-10 17:05:37.0 +0100 @@ -0,0 +1,24 @@ +/* PR tree-optimization/59386 */ + +struct S { int s; }; +struct T { int t; struct S u; } c; +int b; + +struct S +foo () +{ + struct T d; + if (b) +while (c.t) + ; + else +return d.u; +} + +struct S +bar () +{ + struct T a; + a.u = foo (); + return a.u; +} Jakub
Re: [PATCH PR41488] Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Wed, Dec 11, 2013 at 4:17 PM, Jakub Jelinek wrote: > On Tue, Dec 10, 2013 at 11:59:16PM -0700, Jeff Law wrote: >> On 12/10/13 23:35, Bin.Cheng wrote: >> >I know little about GC, so when ggc_collect may be called (between two >> >passes)? If so, I have to call free_affine_expand_cache just after the >> >calling to tree_to_affine_combination_expand in SCEV because it's an >> >analyzer and as you pointed out, the analyzing result is used by >> >different optimizers. The pointer map would be maintained between >> >different passes if it's not instantly freed. >> The garbage collector only runs between passes. If you have roots >> in static storage, then you'll need to decorate them so the garbage >> collector scans them. >> >> There's a fairly extensive discussion of the garbage collector in >> the GCC internals manual. > > It isn't that easy, pointer_map isn't a data structure recognized by the > garbage collector at all (plus, as I've said before, the code is inserting > XNEW allocated structures into the pointer_map, which isn't GC friendly > either, but making them GC allocated might increase garbage unnecessarily, > as all the structs are short lived). > > I've looked through scev_{initialize,finalize} callers, and almost all > passes initialize it and finalize within the same pass, it is only the loop > optimizations: > NEXT_PASS (pass_tree_loop_init); > NEXT_PASS (pass_lim); > NEXT_PASS (pass_copy_prop); > NEXT_PASS (pass_dce_loop); > NEXT_PASS (pass_tree_unswitch); > NEXT_PASS (pass_scev_cprop); > NEXT_PASS (pass_record_bounds); > NEXT_PASS (pass_check_data_deps); > NEXT_PASS (pass_loop_distribution); > NEXT_PASS (pass_copy_prop); > NEXT_PASS (pass_graphite); > PUSH_INSERT_PASSES_WITHIN (pass_graphite) > NEXT_PASS (pass_graphite_transforms); > NEXT_PASS (pass_lim); > NEXT_PASS (pass_copy_prop); > NEXT_PASS (pass_dce_loop); > POP_INSERT_PASSES () > NEXT_PASS (pass_iv_canon); > NEXT_PASS (pass_parallelize_loops); > NEXT_PASS (pass_if_conversion); > /* pass_vectorize must immediately follow pass_if_conversion. > Please do not add any other passes in between. */ > NEXT_PASS (pass_vectorize); > PUSH_INSERT_PASSES_WITHIN (pass_vectorize) > NEXT_PASS (pass_dce_loop); > POP_INSERT_PASSES () > NEXT_PASS (pass_predcom); > NEXT_PASS (pass_complete_unroll); > NEXT_PASS (pass_slp_vectorize); > NEXT_PASS (pass_loop_prefetch); > NEXT_PASS (pass_iv_optimize); > NEXT_PASS (pass_lim); > NEXT_PASS (pass_tree_loop_done); > where scev is live in between the passes, and unfortunately there > is no call that each pass calls that you could easily add the free_affine.* > call to (except for execute_function_todo but you'd need to invent another > TODO_* flag for it and the question is how you'd ensure it will be handled > for the non-loop specific passes that are just run in between these > passes (say pass_copy_prop, would we need pass_copy_prop_loop?). > > Perhaps you could replace in tree-affine.c and all it's users > {,struct} pointer_map_t * with > struct GTY((user)) affine_expand_cache { > pointer_map_t *cache; > }; > and write custom gt_ggc_mx (affine_expand_cache *) function that would > would either pointer_map_traverse the cache and for each cache entry > call gt_ggc_mx on the embedded trees in the structures, or drop the cache > (dunno if the cache is really just a cache and it doesn't matter if it is > dropped any time, or if it can affect code generation). 
> There are also PCH related functions that need to be defined, though I guess you could just assert there that the cache is NULL and not walk anything. Thanks to both of you for being patient with this patch. I went through the documentation quickly and realized that I would have to modify the pointer-map structure to make it recognized by GC (plus maybe the further work suggested by Jakub). It seems I shouldn't take on that task in this patch during stage 3, so I am thinking of just calling the free_affine* function at the place the cache is created for SCEV. Of course, that makes it more expensive. I also just found that the patch fixes PR58296 in ivopts, and I do want to include it in 4.9 if we are OK with it. So what's your opinion? Thanks, bin > > Jakub -- Best Regards.
[PATCH] Range info handling fixes (PR tree-optimization/59417)
Hi! As discussed in the PR, range info as well as alignment info for pointers can be position dependent, can be e.g. computed from VRP ASSERT_EXPRs, so we can't blindly propagate it to SSA_NAMEs that the SSA_NAME with range info has been set to. The following patch limits that to SSA_NAMEs defined within the same bb, in that case it should be safe. Also, the patch makes determine_value_range more robust, insert of asserting it will just punt and use solely range info of var, rather than also considering ranges of PHI results. If VR_UNDEFINED comes into play, range info can be weird (of course on undefined behavior only code, but we shouldn't ICE on it). Either of these changes fixes the ICE on it's own. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-12-11 Jakub Jelinek PR tree-optimization/59417 * tree-ssa-copy.c (fini_copy_prop): If copy_of[i].value is defined in a different bb rhan var, only duplicate points-to info and not alignment info and don't duplicate range info. * tree-ssa-loop-niter.c (determine_value_range): Instead of assertion failure handle inconsistencies in range info by only using var's range info and not PHI result range infos. * gcc.c-torture/compile/pr59417.c: New test. --- gcc/tree-ssa-copy.c.jj 2013-12-10 08:52:13.0 +0100 +++ gcc/tree-ssa-copy.c 2013-12-10 17:55:16.819293842 +0100 @@ -567,14 +567,28 @@ fini_copy_prop (void) if (copy_of[i].value != var && TREE_CODE (copy_of[i].value) == SSA_NAME) { + basic_block copy_of_bb + = gimple_bb (SSA_NAME_DEF_STMT (copy_of[i].value)); + basic_block var_bb = gimple_bb (SSA_NAME_DEF_STMT (var)); if (POINTER_TYPE_P (TREE_TYPE (var)) && SSA_NAME_PTR_INFO (var) && !SSA_NAME_PTR_INFO (copy_of[i].value)) - duplicate_ssa_name_ptr_info (copy_of[i].value, -SSA_NAME_PTR_INFO (var)); + { + duplicate_ssa_name_ptr_info (copy_of[i].value, + SSA_NAME_PTR_INFO (var)); + /* Points-to information is cfg insensitive, +but alignment info might be cfg sensitive, if it +e.g. is derived from VRP derived non-zero bits. +So, do not copy alignment info if the two SSA_NAMEs +aren't defined in the same basic block. */ + if (var_bb != copy_of_bb) + mark_ptr_info_alignment_unknown + (SSA_NAME_PTR_INFO (copy_of[i].value)); + } else if (!POINTER_TYPE_P (TREE_TYPE (var)) && SSA_NAME_RANGE_INFO (var) - && !SSA_NAME_RANGE_INFO (copy_of[i].value)) + && !SSA_NAME_RANGE_INFO (copy_of[i].value) + && var_bb == copy_of_bb) duplicate_ssa_name_range_info (copy_of[i].value, SSA_NAME_RANGE_TYPE (var), SSA_NAME_RANGE_INFO (var)); --- gcc/tree-ssa-loop-niter.c.jj2013-12-02 23:34:02.0 +0100 +++ gcc/tree-ssa-loop-niter.c 2013-12-10 17:59:07.744105878 +0100 @@ -173,7 +173,15 @@ determine_value_range (struct loop *loop { minv = minv.max (minc, TYPE_UNSIGNED (type)); maxv = maxv.min (maxc, TYPE_UNSIGNED (type)); - gcc_assert (minv.cmp (maxv, TYPE_UNSIGNED (type)) <= 0); + /* If the PHI result range are inconsistent with +the VAR range, give up on looking at the PHI +results. This can happen if VR_UNDEFINED is +involved. 
*/ + if (minv.cmp (maxv, TYPE_UNSIGNED (type)) > 0) + { + rtype = get_range_info (var, &minv, &maxv); + break; + } } } } --- gcc/testsuite/gcc.c-torture/compile/pr59417.c.jj2013-12-10 17:59:29.235996171 +0100 +++ gcc/testsuite/gcc.c-torture/compile/pr59417.c 2013-12-09 16:26:53.0 +0100 @@ -0,0 +1,39 @@ +/* PR tree-optimization/59417 */ + +int a, b, d; +short c; + +void +f (void) +{ + if (b) +{ + int *e; + + if (d) + { + for (; b; a++) + lbl1: + d = 0; + + for (; d <= 1; d++) + { + int **q = &e; + for (**q = 0; **q <= 0; **q++) + d = 0; + } + } +} + + else +{ + int t; + for (c = 0; c < 77; c++) + for (c = 0; c < 46; c++); + for (; t <= 0; t++) + lbl2: + ; + goto lbl1; +} + goto lbl2; +} Jakub
Re: [PATCH PR41488] Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Wed, Dec 11, 2013 at 04:31:55PM +0800, Bin.Cheng wrote: > Thank both of you for being patient on this patch. I went through the documentation quickly and realized that I have to modify the pointer-map structure to make it recognized by GC (maybe more work suggested by Jakub). Changing pointer_map at this point is IMHO not appropriate. > It seems I shouldn't include that task in this patch at this stage 3, I am thinking just call free_affine* function in the place it is created for SCEV. Of course, that makes it more expensive. Perhaps you could call free_affine_expand_cache say from analyze_scalar_evolution, which is the innermost enclosing exported API that indirectly calls your new code, but it seems it is also called recursively from analyze_scalar_evolution_1 or the functions it calls. So maybe you'd need to rename analyze_scalar_evolution to analyze_scalar_evolution_2, adjust some calls to analyze_scalar_evolution to the latter in tree-scalar-evolution.c, and add an analyze_scalar_evolution wrapper that calls analyze_scalar_evolution_2 and then free_affine_expand_cache. Whether this would work depends on a detailed analysis of the tree-scalar-evolution.c callgraph: there are tons of functions, most of them static, and the question is whether there is a single non-static function (or at most a few of them) that covers all calls to your new code in the static functions it (indirectly) calls, i.e. whether there isn't any other external entry point that might call your new code without doing free_affine_expand_cache afterwards. If you can find that, I'd say it would be the safest thing to do. But let's see what Richard thinks about it too. Jakub
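A rough sketch of the wrapper arrangement suggested here, purely illustrative: peeled_chrec_map is assumed to be the pointer map introduced by Bin's patch, and the real work is auditing which internal callers must switch to the renamed worker.

/* The existing implementation, renamed; its recursive uses and the other
   internal callers in tree-scalar-evolution.c would call this directly.  */
static tree analyze_scalar_evolution_2 (struct loop *loop, tree var);

/* New exported entry point: run the analysis, then immediately drop the
   affine expansion cache so no pointer_map is live at a GC point.  */
tree
analyze_scalar_evolution (struct loop *loop, tree var)
{
  tree res = analyze_scalar_evolution_2 (loop, var);
  free_affine_expand_cache (&peeled_chrec_map);
  return res;
}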
[PATCH, nds32] Missing target_cpu_default in TARGET_DEFAULT_TARGET_FLAGS.
Hi, I recently configured an nds32 GCC with --target=nds32be-elf, and it seems that big endian is not set as the default. In config.gcc, I noticed that target_cpu_default is defined as: nds32le-*-*) target_cpu_default="0" nds32be-*-*) target_cpu_default="0|MASK_BIG_ENDIAN" But TARGET_CPU_DEFAULT is not added into TARGET_DEFAULT_TARGET_FLAGS, so MASK_BIG_ENDIAN is not enabled for nds32be-elf. The following patch fixes this issue. Tested on nds32be-elf. OK to apply? Index: common/config/nds32/nds32-common.c === --- common/config/nds32/nds32-common.c (revision 205880) +++ common/config/nds32/nds32-common.c (working copy) @@ -93,7 +93,8 @@ TARGET_CMOV : Generate conditional move instruction. */ #undef TARGET_DEFAULT_TARGET_FLAGS #define TARGET_DEFAULT_TARGET_FLAGS\ - (MASK_GP_DIRECT \ + (TARGET_CPU_DEFAULT \ + | MASK_GP_DIRECT\ | MASK_16_BIT \ | MASK_PERF_EXT \ | MASK_CMOV) Index: ChangeLog === --- ChangeLog (revision 205880) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2013-12-11 Monk Chiang + + * common/config/nds32/nds32-common.c (TARGET_DEFAULT_TARGET_FLAGS): + Redefine. + 2013-12-11 Bin Cheng Reverted: Monk
Re: [PATCH] Fix invalid tree sharing caused by the inliner (PR tree-optimization/59386)
On Wed, 11 Dec 2013, Jakub Jelinek wrote: > Hi! > > If id->retvar isn't a decl (can happen for DECL_BY_REFERENCE returns), > then we can end up with invalid tree sharing because we reuse it more than > once (on the following testcase id->retvar is a COMPONENT_REF). > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? Ok. Thanks, Richard. > 2013-12-11 Jakub Jelinek > > PR tree-optimization/59386 > * tree-inline.c (remap_gimple_stmt): If not id->do_not_unshare, > unshare_expr (id->retval) before passing it to gimple_build_assign. > > * gcc.c-torture/compile/pr59386.c: New test. > > --- gcc/tree-inline.c.jj 2013-12-10 08:52:13.0 +0100 > +++ gcc/tree-inline.c 2013-12-10 17:04:09.022069945 +0100 > @@ -1273,7 +1273,9 @@ remap_gimple_stmt (gimple stmt, copy_bod > || ! SSA_NAME_VAR (retval) > || TREE_CODE (SSA_NAME_VAR (retval)) != RESULT_DECL))) > { > - copy = gimple_build_assign (id->retvar, retval); > + copy = gimple_build_assign (id->do_not_unshare > + ? id->retvar : unshare_expr (id->retvar), > + retval); > /* id->retvar is already substituted. Skip it on later remapping. */ > skip_first = true; > } > --- gcc/testsuite/gcc.c-torture/compile/pr59386.c.jj 2013-12-10 > 17:06:39.389291052 +0100 > +++ gcc/testsuite/gcc.c-torture/compile/pr59386.c 2013-12-10 > 17:05:37.0 +0100 > @@ -0,0 +1,24 @@ > +/* PR tree-optimization/59386 */ > + > +struct S { int s; }; > +struct T { int t; struct S u; } c; > +int b; > + > +struct S > +foo () > +{ > + struct T d; > + if (b) > +while (c.t) > + ; > + else > +return d.u; > +} > + > +struct S > +bar () > +{ > + struct T a; > + a.u = foo (); > + return a.u; > +} > > Jakub > > -- Richard Biener SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
Re: [PATCH] Range info handling fixes (PR tree-optimization/59417)
On Wed, 11 Dec 2013, Jakub Jelinek wrote: > Hi! > > As discussed in the PR, range info as well as alignment info for pointers > can be position dependent, can be e.g. computed from VRP ASSERT_EXPRs, > so we can't blindly propagate it to SSA_NAMEs that the SSA_NAME with range > info has been set to. The following patch limits that to SSA_NAMEs defined > within the same bb, in that case it should be safe. > > Also, the patch makes determine_value_range more robust, insert of asserting > it will just punt and use solely range info of var, rather than also > considering ranges of PHI results. If VR_UNDEFINED comes into play, range > info can be weird (of course on undefined behavior only code, but we > shouldn't ICE on it). > > Either of these changes fixes the ICE on it's own. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. > 2013-12-11 Jakub Jelinek > > PR tree-optimization/59417 > * tree-ssa-copy.c (fini_copy_prop): If copy_of[i].value is defined > in a different bb rhan var, only duplicate points-to info and > not alignment info and don't duplicate range info. > * tree-ssa-loop-niter.c (determine_value_range): Instead of > assertion failure handle inconsistencies in range info by only > using var's range info and not PHI result range infos. > > * gcc.c-torture/compile/pr59417.c: New test. > > --- gcc/tree-ssa-copy.c.jj2013-12-10 08:52:13.0 +0100 > +++ gcc/tree-ssa-copy.c 2013-12-10 17:55:16.819293842 +0100 > @@ -567,14 +567,28 @@ fini_copy_prop (void) >if (copy_of[i].value != var > && TREE_CODE (copy_of[i].value) == SSA_NAME) > { > + basic_block copy_of_bb > + = gimple_bb (SSA_NAME_DEF_STMT (copy_of[i].value)); > + basic_block var_bb = gimple_bb (SSA_NAME_DEF_STMT (var)); > if (POINTER_TYPE_P (TREE_TYPE (var)) > && SSA_NAME_PTR_INFO (var) > && !SSA_NAME_PTR_INFO (copy_of[i].value)) > - duplicate_ssa_name_ptr_info (copy_of[i].value, > - SSA_NAME_PTR_INFO (var)); > + { > + duplicate_ssa_name_ptr_info (copy_of[i].value, > +SSA_NAME_PTR_INFO (var)); > + /* Points-to information is cfg insensitive, > + but alignment info might be cfg sensitive, if it > + e.g. is derived from VRP derived non-zero bits. > + So, do not copy alignment info if the two SSA_NAMEs > + aren't defined in the same basic block. */ > + if (var_bb != copy_of_bb) > + mark_ptr_info_alignment_unknown > + (SSA_NAME_PTR_INFO (copy_of[i].value)); > + } > else if (!POINTER_TYPE_P (TREE_TYPE (var)) > && SSA_NAME_RANGE_INFO (var) > -&& !SSA_NAME_RANGE_INFO (copy_of[i].value)) > +&& !SSA_NAME_RANGE_INFO (copy_of[i].value) > +&& var_bb == copy_of_bb) > duplicate_ssa_name_range_info (copy_of[i].value, > SSA_NAME_RANGE_TYPE (var), > SSA_NAME_RANGE_INFO (var)); > --- gcc/tree-ssa-loop-niter.c.jj 2013-12-02 23:34:02.0 +0100 > +++ gcc/tree-ssa-loop-niter.c 2013-12-10 17:59:07.744105878 +0100 > @@ -173,7 +173,15 @@ determine_value_range (struct loop *loop > { > minv = minv.max (minc, TYPE_UNSIGNED (type)); > maxv = maxv.min (maxc, TYPE_UNSIGNED (type)); > - gcc_assert (minv.cmp (maxv, TYPE_UNSIGNED (type)) <= 0); > + /* If the PHI result range are inconsistent with > + the VAR range, give up on looking at the PHI > + results. This can happen if VR_UNDEFINED is > + involved. 
*/ > + if (minv.cmp (maxv, TYPE_UNSIGNED (type)) > 0) > + { > + rtype = get_range_info (var, &minv, &maxv); > + break; > + } > } > } > } > --- gcc/testsuite/gcc.c-torture/compile/pr59417.c.jj 2013-12-10 > 17:59:29.235996171 +0100 > +++ gcc/testsuite/gcc.c-torture/compile/pr59417.c 2013-12-09 > 16:26:53.0 +0100 > @@ -0,0 +1,39 @@ > +/* PR tree-optimization/59417 */ > + > +int a, b, d; > +short c; > + > +void > +f (void) > +{ > + if (b) > +{ > + int *e; > + > + if (d) > + { > + for (; b; a++) > + lbl1: > + d = 0; > + > + for (; d <= 1; d++) > + { > + int **q = &e; > + for (**q = 0; **q <= 0; **q++) > + d = 0; > + } > + } > +} > + > + else > +{ > + int t; > + for (c = 0; c < 77; c++) > + for (c = 0; c < 46; c++); > + for (; t <= 0; t++) > + lbl2: > + ; > + goto lbl1; > +} > + goto lbl2; > +} > > Jakub
Re: [PATCH] S/390: Function hotpatching option and function attribute
Backporting this patch to 4.8, 4.4 and 4.1 brought some issues to light. The new version of the patch cleans up some code and warnings and adapts the test cases so that they run without modification on 4.8 and earlier. Original message: On Thu, Dec 05, 2013 at 09:06:31AM +0100, Dominik Vogt wrote: > Andreas Krebbel and I have ported the function hotpatching feature > from i386 (aka the ms_hook_prologue attribute) to S/390. Andreas has > already internally approved the attached patch and will commit it > soon (for legal reasons he has to do that himself). > > The attached patch introduces command line options as well as a > function attribute to control the function hotpatching prologue: > > -m[no-]hotpatch > Enable or disable hotpatching prologues for all functions in > the compilation unit (with the default prologue size). > > -mhotpatch=n > Same as -mhotpatch, but the size of the prologue is n > halfwords (i.e. the trampoline area is n halfwords). > > __attribute__ ((hotpatch)) and > __attribute__ ((hotpatch(n))) > Work like the option, but only for the specific function. If > the attribute and one of the options is given, the attribute > always wins. > > The n above is limited to values 0 through 100 (default 12). > Functions that are explicitly inline cannot be hotpatched, and > hotpatched functions are never implicitly inlined. > > The layout of a hotpatchable function prologue is > > nopr %r7 (n times, 2 bytes each, aka the "trampoline area") > function_label: > nop 0 (four bytes) > > The function label alignment is changed to eight bytes to make > sure that the eight bytes directly in front of the label reside > in the same cacheline. > > To hotpatch a function, the program first writes the address of > the new version of the hotpatched function to the eight bytes > before the label (four bytes on 32-bit), then writes code to read > that address and jump to it into the trampoline area, and then > replaces the nop behind the label with a relative backwards jump > into the trampoline area. (Some maintenance of the Icache is > necessary in this procedure.) > > If the program can make sure that the patched functions are always > close enough to the original in the memory map, the trampoline > area can be omitted (n = 0), and a short relative jump to the > patched function can be written over the four-byte nop. > > The patch includes a number of test cases that all run without > trouble here. The tests validate the syntax of the options and the > attribute and check for the correct number of nops. I have > verified the correct layout of the trampoline manually (i.e. the > correct placement of the nops). 
Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany >From a307cee52819be37cb44d8f2a972200b132697e8 Mon Sep 17 00:00:00 2001 From: Dominik Vogt Date: Mon, 18 Nov 2013 15:41:46 + Subject: [PATCH 1/2] S/390: Function hotpatching option and function attribute --- gcc/config/s390/s390-protos.h | 1 + gcc/config/s390/s390.c | 201 + gcc/config/s390/s390.h | 5 +- gcc/config/s390/s390.opt | 8 + gcc/doc/extend.texi| 11 ++ gcc/doc/invoke.texi| 18 +- gcc/testsuite/gcc.target/s390/hotpatch-1.c | 20 ++ gcc/testsuite/gcc.target/s390/hotpatch-10.c| 21 +++ gcc/testsuite/gcc.target/s390/hotpatch-11.c| 20 ++ gcc/testsuite/gcc.target/s390/hotpatch-12.c| 20 ++ gcc/testsuite/gcc.target/s390/hotpatch-2.c | 20 ++ gcc/testsuite/gcc.target/s390/hotpatch-3.c | 20 ++ gcc/testsuite/gcc.target/s390/hotpatch-4.c | 26 +++ gcc/testsuite/gcc.target/s390/hotpatch-5.c | 21 +++ gcc/testsuite/gcc.target/s390/hotpatch-6.c | 21 +++ gcc/testsuite/gcc.target/s390/hotpatch-7.c | 21 +++ gcc/testsuite/gcc.target/s390/hotpatch-8.c | 28 +++ gcc/testsuite/gcc.target/s390/hotpatch-9.c | 21 +++ gcc/testsuite/gcc.target/s390/hotpatch-compile-1.c | 27 +++ gcc/testsuite/gcc.target/s390/hotpatch-compile-2.c | 27 +++ gcc/testsuite/gcc.target/s390/hotpatch-compile-3.c | 27 +++ gcc/testsuite/gcc.target/s390/hotpatch-compile-4.c | 11 ++ gcc/testsuite/gcc.target/s390/hotpatch-compile-5.c | 28 +++ gcc/testsuite/gcc.target/s390/hotpatch-compile-6.c | 11 ++ gcc/testsuite/gcc.target/s390/hotpatch-compile-7.c | 68 +++ 25 files changed, 700 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-1.c create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-10.c create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-11.c create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-12.c create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-2.c create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-3.c create mode 100644 gcc/testsuite/g
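For reference, a hypothetical usage sketch of the attribute forms described above; the function names are invented, only the attribute syntax and the size limits come from the patch description.

/* Hotpatch prologue with the default 12-halfword trampoline area.  */
void service_request (void) __attribute__ ((hotpatch));

/* Explicit trampoline size of 16 halfwords (allowed range 0..100).  */
void handle_event (void) __attribute__ ((hotpatch (16)));

/* n = 0: no trampoline area; only the 4-byte nop after the label remains,
   for patch targets reachable with a short relative jump.  */
void fast_path (void) __attribute__ ((hotpatch (0)));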
Re: [Patch, RTL] Eliminate redundant vec_select moves.
Richard Henderson writes: > On 12/10/2013 10:44 AM, Richard Sandiford wrote: >> Sorry, I don't understand. I never said it was invalid. I said >> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents >> a single register. On a little-endian target, the offset cannot be >> anything other than 0 in that case. >> >> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for >> something that is always invalid, regardless of the target. That kind >> of situation should be rejected by target-independent code instead. > > But, we want to disable the subreg before we know whether or not (reg:V4SF X) > will be allocated to a single hard register. That is something that we can't > know in target-independent code before register allocation. I was thinking that if we've got a class, we've also got things like CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. But even in the padding cases an offset-based check in C_C_M_C could be derived from other information. subreg_get_info handles padding with: nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); if (GET_MODE_INNER (xmode) == VOIDmode) xmode_unit = xmode; else xmode_unit = GET_MODE_INNER (xmode); gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); gcc_assert (nregs_xmode == (GET_MODE_NUNITS (xmode) * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit))); gcc_assert (hard_regno_nregs[xregno][xmode] == (hard_regno_nregs[xregno][xmode_unit] * GET_MODE_NUNITS (xmode))); /* You can only ask for a SUBREG of a value with holes in the middle if you don't cross the holes. (Such a SUBREG should be done by picking a different register class, or doing it in memory if necessary.) An example of a value with holes is XCmode on 32-bit x86 with -m128bit-long-double; it's represented in 6 32-bit registers, 3 for each part, but in memory it's two 128-bit parts. Padding is assumed to be at the end (not necessarily the 'high part') of each unit. */ if ((offset / GET_MODE_SIZE (xmode_unit) + 1 < GET_MODE_NUNITS (xmode)) && (offset / GET_MODE_SIZE (xmode_unit) != ((offset + GET_MODE_SIZE (ymode) - 1) / GET_MODE_SIZE (xmode_unit { info->representable_p = false; rknown = true; } and I wouldn't really want to force targets to individually reproduce that kind of logic at the class level. If the worst comes to the worst we could cache the difficult cases. Thanks, Richard
Re: [PATCH] Fix PR 59390
On 8 December 2013 16:53, Uros Bizjak wrote: > On Fri, Dec 6, 2013 at 8:33 PM, Sriraman Tallam wrote: >> Patch updated with two more tests to check if the vfmadd insn is being >> produced when possible. >> >> Thanks >> Sri >> >> On Fri, Dec 6, 2013 at 11:12 AM, Sriraman Tallam wrote: >>> Hi, >>> >>> I have attached a patch to fix >>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59390. Please review. >>> >>> Here is the problem. GCC adds target-specific builtins on demand. The >>> FMA target-specific builtin __builtin_ia32_vfmaddpd gets added via >>> this declaration: >>> >>> void fun() __attribute__((target("fma"))); >>> >>> Specifically, the builtin __builtin_ia32_vfmaddpd gets added when >>> ix86_add_new_builtins is called from ix86_valid_target_attribute_tree >>> when processing this target attribute. >>> >>> Now, when the vectorizer is processing the builtin "__builtin_fma" in >>> function other_fn(), it checks to see if this function is vectorizable >>> and calls ix86_builtin_vectorized_function in i386.c. That returns the >>> builtin stored here: >>> >>> >>> case BUILT_IN_FMA: >>> if (out_mode == DFmode && in_mode == DFmode) >>> { >>> if (out_n == 2 && in_n == 2) >>>return ix86_builtins[IX86_BUILTIN_VFMADDPD]; >>> >>> >>> ix86_builtins[IX86_BUILTIN_VFMADDPD] would have contained NULL_TREE >>> had the builtin not been added by the previous target attribute. That >>> is why the code works if we remove the previous declaration. >>> >>> The fix is to not just return the builtin but to also check if the >>> current function's isa allows the use of the builtin. For instance, >>> this patch would solve the problem: >>> >>> @@ -33977,7 +33977,13 @@ ix86_builtin_vectorized_function (tree fndecl, tre >>>if (out_mode == DFmode && in_mode == DFmode) >>> { >>>if (out_n == 2 && in_n == 2) >>> -return ix86_builtins[IX86_BUILTIN_VFMADDPD]; >>> +{ >>> + if (ix86_builtins_isa[IX86_BUILTIN_VFMADDPD].isa >>> + & global_options.x_ix86_isa_flags) >>> +return ix86_builtins[IX86_BUILTIN_VFMADDPD]; >>> + else >>> + return NULL_TREE; >>> +} >>> >>> >>> but there are many instances of this usage in >>> ix86_builtin_vectorized_function. This patch covers all the cases. > >> PR target/59390 >> * gcc.target/i386/pr59390.c: New test. >> * gcc.target/i386/pr59390_1.c: New test. >> * gcc.target/i386/pr59390_2.c: New test. >> * config/i386/i386.c (get_builtin): New function. >> (ix86_builtin_vectorized_function): Replace all instances of >> ix86_builtins[...] with get_builtin(...). >> (ix86_builtin_reciprocal): Ditto. > > OK, with a couple of nits: > > +static tree get_builtin (enum ix86_builtins code) > > Please name this function ix86_get_builtin. > > + if (current_function_decl) > +target_tree = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl); > + if (target_tree) > +opts = TREE_TARGET_OPTION (target_tree); > + else > +opts = TREE_TARGET_OPTION (target_option_default_node); > + > > opts = TREE_TARGET_OPTION (target_tree ? target_tree : > target_option_default_node); Just curious: > +static tree get_builtin (enum ix86_builtins code) > +{ [] > +> [] > @@ -33677,9 +33701,9 @@ ix86_builtin_vectorized_function (tree fndecl, tre >if (out_mode == DFmode && in_mode == DFmode) > { > if (out_n == 2 && in_n == 2) > -return ix86_builtins[IX86_BUILTIN_SQRTPD]; > +get_builtin (IX86_BUILTIN_SQRTPD); > else if (out_n == 4 && in_n == 4) > -return ix86_builtins[IX86_BUILTIN_SQRTPD256]; > +get_builtin (IX86_BUILTIN_SQRTPD256); > } >break; I must be missing something? Don't you have to return ix86_get_builtin(...) ? thanks,
Re: [PATCH] Fix PR 59390
On Wed, 11 Dec 2013, Bernhard Reutner-Fischer wrote: > On 8 December 2013 16:53, Uros Bizjak wrote: > > On Fri, Dec 6, 2013 at 8:33 PM, Sriraman Tallam wrote: > >> Patch updated with two more tests to check if the vfmadd insn is being > >> produced when possible. > >> > >> Thanks > >> Sri > >> > >> On Fri, Dec 6, 2013 at 11:12 AM, Sriraman Tallam > >> wrote: > >>> Hi, > >>> > >>> I have attached a patch to fix > >>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59390. Please review. > >>> > >>> Here is the problem. GCC adds target-specific builtins on demand. The > >>> FMA target-specific builtin __builtin_ia32_vfmaddpd gets added via > >>> this declaration: > >>> > >>> void fun() __attribute__((target("fma"))); > >>> > >>> Specifically, the builtin __builtin_ia32_vfmaddpd gets added when > >>> ix86_add_new_builtins is called from ix86_valid_target_attribute_tree > >>> when processing this target attribute. > >>> > >>> Now, when the vectorizer is processing the builtin "__builtin_fma" in > >>> function other_fn(), it checks to see if this function is vectorizable > >>> and calls ix86_builtin_vectorized_function in i386.c. That returns the > >>> builtin stored here: > >>> > >>> > >>> case BUILT_IN_FMA: > >>> if (out_mode == DFmode && in_mode == DFmode) > >>> { > >>> if (out_n == 2 && in_n == 2) > >>>return ix86_builtins[IX86_BUILTIN_VFMADDPD]; > >>> > >>> > >>> ix86_builtins[IX86_BUILTIN_VFMADDPD] would have contained NULL_TREE > >>> had the builtin not been added by the previous target attribute. That > >>> is why the code works if we remove the previous declaration. > >>> > >>> The fix is to not just return the builtin but to also check if the > >>> current function's isa allows the use of the builtin. For instance, > >>> this patch would solve the problem: > >>> > >>> @@ -33977,7 +33977,13 @@ ix86_builtin_vectorized_function (tree fndecl, > >>> tre > >>>if (out_mode == DFmode && in_mode == DFmode) > >>> { > >>>if (out_n == 2 && in_n == 2) > >>> -return ix86_builtins[IX86_BUILTIN_VFMADDPD]; > >>> +{ > >>> + if (ix86_builtins_isa[IX86_BUILTIN_VFMADDPD].isa > >>> + & global_options.x_ix86_isa_flags) > >>> +return ix86_builtins[IX86_BUILTIN_VFMADDPD]; > >>> + else > >>> + return NULL_TREE; > >>> +} > >>> > >>> > >>> but there are many instances of this usage in > >>> ix86_builtin_vectorized_function. This patch covers all the cases. > > > >> PR target/59390 > >> * gcc.target/i386/pr59390.c: New test. > >> * gcc.target/i386/pr59390_1.c: New test. > >> * gcc.target/i386/pr59390_2.c: New test. > >> * config/i386/i386.c (get_builtin): New function. > >> (ix86_builtin_vectorized_function): Replace all instances of > >> ix86_builtins[...] with get_builtin(...). > >> (ix86_builtin_reciprocal): Ditto. > > > > OK, with a couple of nits: > > > > +static tree get_builtin (enum ix86_builtins code) > > > > Please name this function ix86_get_builtin. > > > > + if (current_function_decl) > > +target_tree = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl); > > + if (target_tree) > > +opts = TREE_TARGET_OPTION (target_tree); > > + else > > +opts = TREE_TARGET_OPTION (target_option_default_node); > > + > > > > opts = TREE_TARGET_OPTION (target_tree ? 
target_tree : > > target_option_default_node); > > Just curious: > > +static tree get_builtin (enum ix86_builtins code) > > +{ > [] > > +> > [] > > @@ -33677,9 +33701,9 @@ ix86_builtin_vectorized_function (tree fndecl, tre > >if (out_mode == DFmode && in_mode == DFmode) > > { > > if (out_n == 2 && in_n == 2) > > -return ix86_builtins[IX86_BUILTIN_SQRTPD]; > > +get_builtin (IX86_BUILTIN_SQRTPD); > > else if (out_n == 4 && in_n == 4) > > -return ix86_builtins[IX86_BUILTIN_SQRTPD256]; > > +get_builtin (IX86_BUILTIN_SQRTPD256); > > } > >break; > > I must be missing something? > Don't you have to return ix86_get_builtin(...) ? Of course you have to. Richard.
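Right -- for reference, the hunk Bernhard quotes needs to keep the returns, i.e. something along these lines (with the helper renamed to ix86_get_builtin as requested earlier; this is a sketch of the intended form, not the committed code):

      if (out_mode == DFmode && in_mode == DFmode)
	{
	  if (out_n == 2 && in_n == 2)
	    return ix86_get_builtin (IX86_BUILTIN_SQRTPD);
	  else if (out_n == 4 && in_n == 4)
	    return ix86_get_builtin (IX86_BUILTIN_SQRTPD256);
	}
      break;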
Re: [RFA][PATCH][PR tree-optimization/45685]
On Wed, Dec 11, 2013 at 7:54 AM, Jeff Law wrote: > > So for this source, compiled for x86_64 with -O3: > > typedef unsigned long int uint64_t; > typedef long int int64_t; > int summation_helper_1(int64_t* products, uint64_t count) > { > int s = 0; > uint64_t i; > for(i=0; i { > int64_t val = (products[i]>0) ? 1 : -1; > products[i] *= val; > if(products[i] != i) > val = -val; > products[i] = val; > s += val; > } > return s; > } > > > int summation_helper_2(int64_t* products, uint64_t count) > { > int s = 0; > uint64_t i; > for(i=0; i { > int val = (products[i]>0) ? 1 : -1; > products[i] *= val; > if(products[i] != i) > val = -val; > products[i] = val; > s += val; > } > return s; > } > > > The loops we generate are pretty bad and have regressed relative to older > versions of GCC. > > For the first loop, we have the following .optimized output for the loop: > > : > # s_28 = PHI > # i_27 = PHI > _11 = MEM[base: products_9(D), index: i_27, step: 8, offset: 0B]; > val_4 = _11 > 0 ? 1 : -1; > prephitmp_38 = _11 > 0 ? -1 : 1; > prephitmp_39 = _11 > 0 ? 4294967295 : 1; > prephitmp_41 = _11 > 0 ? 1 : 4294967295; > _12 = val_4 * _11; > _14 = (long unsigned int) _12; > val_3 = _14 != i_27 ? prephitmp_38 : val_4; > prephitmp_44 = _14 != i_27 ? prephitmp_39 : prephitmp_41; > MEM[base: products_9(D), index: i_27, step: 8, offset: 0B] = val_3; > s.1_18 = (unsigned int) s_28; > _19 = prephitmp_44 + s.1_18; > s_20 = (int) _19; > i_21 = i_27 + 1; > if (i_21 != count_7(D)) > goto ; > else > goto ; > > : > goto ; > > > Note the series of COND_EXPRs. A couple are just conditional negation which > can be implemented with a straight-line code sequence. Using that > straight-line sequence results in: > > : > # s_31 = PHI > # i_32 = PHI > _11 = MEM[base: products_9(D), index: i_32, step: 8, offset: 0B]; > val_4 = _11 > 0 ? 1 : -1; > _12 = val_4 * _11; > _14 = (long unsigned int) _12; > _24 = _14 != i_32; > _25 = (int64_t) _24; > _29 = -_25; > _28 = _29 ^ val_4; > _27 = _28 + _25; > MEM[base: products_9(D), index: i_32, step: 8, offset: 0B] = _27; > _17 = (unsigned int) _27; > s.1_18 = (unsigned int) s_31; > _19 = _17 + s.1_18; > s_20 = (int) _19; > i_21 = i_32 + 1; > if (i_21 != count_7(D)) > goto ; > else > goto ; > > : > goto ; > > Which *appears* worse. However, that code can much more easily be handled > by the RTL optimizers.When we look at what the trunk generates at the > assembly level we have: > > .L3: > movq(%rdi,%rcx,8), %rdx > testq %rdx, %rdx > setg%r8b > movzbl %r8b, %r10d > movzbl %r8b, %r8d > leaq-1(%r10,%r10), %r10 > leal-1(%r8,%r8), %r8d > movq%r10, %r11 > imulq %rdx, %r11 > testq %rdx, %rdx > setle %dl > movzbl %dl, %r9d > movzbl %dl, %edx > leaq-1(%r9,%r9), %r9 > leal-1(%rdx,%rdx), %edx > cmpq%rcx, %r11 > cmove %r10, %r9 > cmove %r8d, %edx > movq%r9, (%rdi,%rcx,8) > addq$1, %rcx > addl%edx, %eax > cmpq%rsi, %rcx > jne .L3 > (Ick) > > With the conditional negation patch that turns into: > > L3: > movq(%rdi,%rcx,8), %r8 > xorl%edx, %edx > testq %r8, %r8 > setg%dl > leaq-1(%rdx,%rdx), %rdx > imulq %rdx, %r8 > cmpq%rcx, %r8 > setne %r8b > movzbl %r8b, %r8d > movq%r8, %r9 > negq%r9 > xorq%r9, %rdx > addq%r8, %rdx > movq%rdx, (%rdi,%rcx,8) > addq$1, %rcx > addl%edx, %eax > cmpq%rsi, %rcx > jne .L3 > > No branches within the loop, no conditional moves either. In all it's 5 > instructions shorter. > > > The second loop shows similar effects, though they're not as dramatic. 
> > Before: > : > # s_27 = PHI > # i_26 = PHI > _11 = MEM[base: products_9(D), index: i_26, step: 8, offset: 0B]; > val_4 = _11 > 0 ? 1 : -1; > prephitmp_32 = _11 > 0 ? 1 : -1; > prephitmp_33 = _11 > 0 ? -1 : 1; > prephitmp_34 = _11 > 0 ? -1 : 1; > _13 = _11 * prephitmp_32; > _15 = (long unsigned int) _13; > val_3 = _15 != i_26 ? prephitmp_33 : val_4; > prephitmp_36 = _15 != i_26 ? prephitmp_34 : prephitmp_32; > MEM[base: products_9(D), index: i_26, step: 8, offset: 0B] = prephitmp_36; > s_19 = val_3 + s_27; > i_20 = i_26 + 1; > if (i_20 != count_7(D)) > goto ; > else > goto ; > > : > goto ; > > > Which results in the following assembly: > >
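The straight-line sequence Jeff describes replaces a conditional negation with branch-free arithmetic. As a stand-alone illustration of the identity the expanded gimple relies on (plain C, not GCC-internal code):

#include <assert.h>
#include <stdint.h>

/* Branchless form of "if (cond) val = -val;": with c in {0,1},
   (val ^ -c) + c is val when c == 0 and -val when c == 1, because
   two's complement negation is bitwise-not plus one.  */
static int64_t
cond_neg (int64_t val, int cond)
{
  int64_t c = (int64_t) (cond != 0);
  return (val ^ -c) + c;
}

int
main (void)
{
  assert (cond_neg (5, 0) == 5);
  assert (cond_neg (5, 1) == -5);
  assert (cond_neg (-7, 1) == 7);
  return 0;
}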
Re: [PATCH PR41488] Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Wed, Dec 11, 2013 at 9:50 AM, Jakub Jelinek wrote: > On Wed, Dec 11, 2013 at 04:31:55PM +0800, Bin.Cheng wrote: >> Thank both of you for being patient on this patch. >> I went through the documentation quickly and realized that I have to >> modify pointer-map structure to make it recognized by GC (maybe more > > Changing pointer_map at this point is IMHO not appropriate. > >> work suggested by Jakub). It seems I shouldn't include that task in >> this patch at this stage 3, I am thinking just call free_affine* >> function in the place it is created for SCEV. Of course, that makes >> it more expensive. > > Perhaps you could call free_affine_expand_cache say from > analyze_scalar_evolution, that is the innermost enclosing exported > API that indirectly calls your new code, but it seems it is also called > recursively from analyze_scalar_evolution_1 or functions it calls. > So maybe you'd need to rename analyze_scalar_evolution to > analyze_scalar_evolution_2 and adjust some calls to analyze_scalar_evolution > to the latter in tree-scalar-evolution.c, and add analyze_scalar_evolution > wrapper that calls analyze_scalar_evolution_2 and free_affine_expand_cache. > Whether this would work depends on detailed analysis of the > tree-scalar-evolution.c callgraph, there are tons of functions, most of > them static, and the question is if there is a single non-static (or at most > a few of them) function that cover all calls to your new code in the > static functions it (indirectly) calls, i.e. if there isn't any other > external entry point that might call your new code without doing > free_affine_expand_cache afterwards. > > If you can find that, I'd say it would be the safest thing to do. > But let's see what say Richard thinks about it too. I think keeping the SCEV cache live over multiple passes is fragile (we do re-use SSA names and passes do not appropriately call scev_reset[_htab] in all cases). With fixing that the easiest approach is to associate a affine-expand cache with the scev info. Without that there isn't a very good solution other than discarding the cache after each expansion at the point you use it. The SCEV cache should effectively make most of the affine expansion caching moot (but you can try instrumenting that to verify it). That is, I suggest to try freeing the cache right after the second tree_to_aff_combination_expand. Btw, + /* Try harder to check if they are equal. */ + tree_to_aff_combination_expand (left, type, &aff1, &peeled_chrec_map); + tree_to_aff_combination_expand (step_val, type, &aff2, &peeled_chrec_map); + aff_combination_scale (&aff2, double_int_minus_one); + aff_combination_add (&aff1, &aff2); + left = fold_convert (type, aff_combination_to_tree (&aff1)); + + /* Transform (init, {left, right}_LOOP)_LOOP to {init, right}_LOOP + if "left" equals to "init + right". */ + if (operand_equal_p (left, integer_zero_node, 0)) you don't need aff_combination_to_tree, simply check aff1.n == 0 && aff1.offset.is_zero () (you may want to add aff_combination_zero_p or even aff_combination_equal_p) Thanks, Richard. > Jakub
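A possible shape for the helper suggested above, matching the aff1.n == 0 && aff1.offset.is_zero () check; this is only a sketch against the pre-wide-int aff_tree layout, and where it would live (tree-affine.h, presumably) is left open.

/* Return true if AFF is a constant zero affine combination, i.e. it has
   no elements and a zero offset.  */
static inline bool
aff_combination_zero_p (aff_tree *aff)
{
  if (!aff)
    return false;
  return aff->n == 0 && aff->offset.is_zero ();
}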
New tsan tests.
Hi all, I've added new tests for tsan from LLVM. Tested on x86_64. Ok to commit? -Maxim diff --git a/gcc/testsuite/c-c++-common/tsan/free_race2.c b/gcc/testsuite/c-c++-common/tsan/free_race2.c new file mode 100644 index 000..3c15d2d --- /dev/null +++ b/gcc/testsuite/c-c++-common/tsan/free_race2.c @@ -0,0 +1,29 @@ +/* { dg-do run } */ +/* { dg-shouldfail "tsan" } */ + +#include + +void __attribute__((noinline)) foo(int *mem) { + free(mem); +} + +void __attribute__((noinline)) bar(int *mem) { + mem[0] = 42; +} + +int main() { + int *mem = (int*)malloc(100); + foo(mem); + bar(mem); + return 0; +} + +/* { dg-output "WARNING: ThreadSanitizer: heap-use-after-free.*(\n|\r\n|\r)" } */ +/* { dg-output " Write of size 4.* by main thread:(\n|\r\n|\r)" } */ +/* { dg-output "#0 bar.*" } */ +/* { dg-output "#1 main .*" } */ +/* { dg-output " Previous write of size 8 at .* by main thread:(\n|\r\n|\r)" } */ +/* { dg-output "#0 free .*" } */ +/* { dg-output "#\(1|2\) foo.*(\n|\r\n|\r)" } */ +/* { dg-output "#\(2|3\) main .*" } */ + diff --git a/gcc/testsuite/c-c++-common/tsan/race_on_barrier2.c b/gcc/testsuite/c-c++-common/tsan/race_on_barrier2.c new file mode 100644 index 000..9576c67 --- /dev/null +++ b/gcc/testsuite/c-c++-common/tsan/race_on_barrier2.c @@ -0,0 +1,33 @@ +/* { dg-do run } */ +/* { dg-shouldfail "tsan" } */ + +#include +#include +#include +#include + +pthread_barrier_t B; +int Global; + +void *Thread1(void *x) { + if (pthread_barrier_wait(&B) == PTHREAD_BARRIER_SERIAL_THREAD) +pthread_barrier_destroy(&B); + return NULL; +} + +void *Thread2(void *x) { + if (pthread_barrier_wait(&B) == PTHREAD_BARRIER_SERIAL_THREAD) +pthread_barrier_destroy(&B); + return NULL; +} + +int main() { + pthread_barrier_init(&B, 0, 2); + pthread_t t; + pthread_create(&t, NULL, Thread1, NULL); + Thread2(0); + pthread_join(t, NULL); + return 0; +} + +/* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */ diff --git a/gcc/testsuite/c-c++-common/tsan/race_on_mutex.c b/gcc/testsuite/c-c++-common/tsan/race_on_mutex.c new file mode 100644 index 000..f112d09 --- /dev/null +++ b/gcc/testsuite/c-c++-common/tsan/race_on_mutex.c @@ -0,0 +1,44 @@ +/* { dg-do run } */ +/* { dg-shouldfail "tsan" } */ + +#include +#include +#include +#include + +pthread_mutex_t Mtx; +int Global; + +void *Thread1(void *x) { + pthread_mutex_init(&Mtx, 0); + pthread_mutex_lock(&Mtx); + Global = 42; + pthread_mutex_unlock(&Mtx); + return NULL; +} + +void *Thread2(void *x) { + sleep(1); + pthread_mutex_lock(&Mtx); + Global = 43; + pthread_mutex_unlock(&Mtx); + return NULL; +} + +int main() { + pthread_t t[2]; + pthread_create(&t[0], NULL, Thread1, NULL); + pthread_create(&t[1], NULL, Thread2, NULL); + pthread_join(t[0], NULL); + pthread_join(t[1], NULL); + pthread_mutex_destroy(&Mtx); + return 0; +} + +/* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */ +/* { dg-output " Atomic read of size 1 at .* by thread T2:(\n|\r\n|\r)" } */ +/* { dg-output "#0 pthread_mutex_lock.*" } */ +/* { dg-output "#1 Thread2.* .*(race_on_mutex.c:22|\\?{2}:0) (.*)" } */ +/* { dg-output " Previous write of size 1 at .* by thread T1:(\n|\r\n|\r)" } */ +/* { dg-output "#0 pthread_mutex_init .* (.)*" } */ +/* { dg-output "#1 Thread1.* .*(race_on_mutex.c:13|\\?{2}:0) .*" } */ diff --git a/gcc/testsuite/c-c++-common/tsan/race_on_mutex2.c b/gcc/testsuite/c-c++-common/tsan/race_on_mutex2.c new file mode 100644 index 000..d8a6980 --- /dev/null +++ b/gcc/testsuite/c-c++-common/tsan/race_on_mutex2.c @@ -0,0 +1,26 @@ +/* { dg-do run } */ +/* { 
dg-shouldfail "tsan" } */ + +#include +#include +#include +#include + +void *Thread(void *x) { + pthread_mutex_lock((pthread_mutex_t*)x); + pthread_mutex_unlock((pthread_mutex_t*)x); + return 0; +} + +int main() { + pthread_mutex_t Mtx; + pthread_mutex_init(&Mtx, 0); + pthread_t t; + pthread_create(&t, 0, Thread, &Mtx); + sleep(1); + pthread_mutex_destroy(&Mtx); + pthread_join(t, 0); + return 0; +} + +/* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */ diff --git a/gcc/testsuite/c-c++-common/tsan/simple_race.c b/gcc/testsuite/c-c++-common/tsan/simple_race.c new file mode 100644 index 000..24b88e8 --- /dev/null +++ b/gcc/testsuite/c-c++-common/tsan/simple_race.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-shouldfail "tsan" } */ + +#include +#include + +int Global; + +void *Thread1(void *x) { + Global = 42; + return NULL; +} + +void *Thread2(void *x) { + Global = 43; + return NULL; +} + +int main() { + pthread_t t[2]; + pthread_create(&t[0], NULL, Thread1, NULL); + pthread_create(&t[1], NULL, Thread2, NULL); + pthread_join(t[0], NULL); + pthread_join(t[1], NULL); + return 0; +} + +/* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */ diff --git a/gcc/testsuite/c-c++-common/tsan/simple_stack.c b/gcc/testsuite/c-c++-common/tsan/simple_stack.c new file
Re: New tsan tests.
On Wed, Dec 11, 2013 at 02:12:35PM +0400, Maxim Ostapenko wrote: > I've added new tests for tsan from LLVM. > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tsan/aligned_vs_unaligned_race.C > @@ -0,0 +1,31 @@ > +/* { dg-do run } */ > + > +#include > +#include > +#include > + > +uint64_t Global[2]; > + > +void *Thread1(void *x) { > + Global[1]++; > + return NULL; > +} > + > +void *Thread2(void *x) { > + char *p1 = reinterpret_cast(&Global[0]); > + uint64_t *p4 = reinterpret_cast(p1 + 1); This test would fail on strict alignment targets. Right now tsan is supported only on x86_64-linux, so this isn't fatal right now, but something to consider for the future. Why is the test C++ only though? Just replace the reinterpret_cast stuff with normal C cast? > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tsan/benign_race.C > @@ -0,0 +1,40 @@ > +/* { dg-do run } */ > + > +#include > +#include > +#include > + > +int Global; > +int WTFGlobal; Again, why C++ only? Just guard the extern "C" { and } with #ifdef __cplusplus. > + > +extern "C" { > +void AnnotateBenignRaceSized(const char *f, int l, > + void *mem, unsigned int size, const char *desc); > +void WTFAnnotateBenignRaceSized(const char *f, int l, > +void *mem, unsigned int size, > +const char *desc); > +} > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tsan/default_options.C > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-shouldfail "tsan" } */ > + > +#include > +#include > + > +extern "C" const char *__tsan_default_options() { > + return "report_bugs=0"; Similarly. > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tsan/fd_close_norace.C > @@ -0,0 +1,32 @@ > +/* { dg-do run } */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +void *Thread1(void *x) { > + int f = open("/dev/random", O_RDONLY); > + close(f); > + return NULL; > +} > + > +void *Thread2(void *x) { > + sleep(1); > + int f = open("/dev/random", O_RDONLY); > + close(f); > + return NULL; > +} > + > +int main() { > + pthread_t t[2]; > + pthread_create(&t[0], NULL, Thread1, NULL); > + pthread_create(&t[1], NULL, Thread2, NULL); > + pthread_join(t[0], NULL); > + pthread_join(t[1], NULL); > + printf("OK\n"); > +} Ditto. Is the only non-C thing here the missing return 0; from main? > + > +/* { dg-prune-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */ > diff --git a/gcc/testsuite/g++.dg/tsan/fd_close_norace2.C > b/gcc/testsuite/g++.dg/tsan/fd_close_norace2.C > new file mode 100644 > index 000..f2d394c > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tsan/fd_close_norace2.C Likewise. Jakub
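The guard suggested above, for making these tests compile as both C and C++, would look roughly like this (sketched against the declarations quoted from the benign_race test):

/* Allow the file to live in c-c++-common: wrap only the extern "C"
   bracketing, keep the declarations themselves shared.  */
#ifdef __cplusplus
extern "C" {
#endif
void AnnotateBenignRaceSized (const char *f, int l,
                              void *mem, unsigned int size,
                              const char *desc);
void WTFAnnotateBenignRaceSized (const char *f, int l,
                                 void *mem, unsigned int size,
                                 const char *desc);
#ifdef __cplusplus
}
#endif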
RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
Hi Uros! Accommodated the changes that you mentioned. Completed the bootstrap testing too. Regards Ganesh -Original Message- From: Uros Bizjak [mailto:ubiz...@gmail.com] Sent: Wednesday, December 04, 2013 3:17 PM To: Gopalasubramanian, Ganesh Cc: gcc-patches@gcc.gnu.org; Richard Guenther (richard.guent...@gmail.com) Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4 On Wed, Dec 4, 2013 at 9:39 AM, Gopalasubramanian, Ganesh wrote: > Attached is the revised patch. > The target independent part has been already approved and added. > > This revision of the patch adds a x86 tune definition and checks it while > deciding the unroll factor. > > Accommodated the comments given by you except one. > >> *x will never be null for active insns. > Since every rtx in the insn is checked for memory references, the NULL_RTX > check is required. Yes you are correct. for_each_rtx also passes NULL_RTX, I was distracted by "There are no sub-expressions." comment. +if (NONDEBUG_INSN_P (insn) && INSN_CODE (insn) != -1) Do you need to check for INSN_CODE here? IIRC, checking for NONDEBUG_INSN_P is enough. +for_each_rtx (&insn, (rtx_function) ix86_loop_memcount, &mem_count); +} + free (bbs); + + if (mem_count <=32) +return 32/mem_count; Ouch... mem_count can be zero. Is there a reason to change this part from previous patch? Uros. unroll-adjust.patch Description: unroll-adjust.patch
Re: patch for elimination to SP when it is changed in RTL (PR57293)
Hi Vladimir, I've some regressions on ARM after this SP elimination patch, and they are execution failures. Here is the list: g++.dg/cilk-plus/AN/array_test_ND_tplt.cc -O3 -fcilkplus gcc.c-torture/execute/va-arg-22.c -O2 gcc.dg/atomic/c11-atomic-exec-5.c -O0 gfortran.dg/direct_io_12.f90 -O[23] gfortran.dg/elemental_dependency_1.f90 -O2 gfortran.dg/matmul_2.f90 -O2 gfortran.dg/matmul_6.f90 -O2 gfortran.dg/mvbits_7.f90 -O3 gfortran.dg/unlimited_polymorphic_1.f03 -O3 I reduced and looked at var-arg-22.c and the issue is that in lra_eliminate_regs_1 (called by get_equiv_with_elimination) we transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset. What we try to do here is to change the pseudo 195 of the insn 118 below : (insn 118 114 112 8 (set (reg:DI 195) (unspec:DI [ (mem:DI (plus:SI (reg/f:SI 215) (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12 + 64B]+8 S8 A8]) ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi} (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192) (const_int 8 [0x8])) [7 a35+8 S8 A32]) (nil))) with its equivalent (x arg of lra_eliminate_regs_1): (mem/c:DI (plus:SI (reg/f:SI 102 sfp) (const_int 76 [0x4c])) [7 a35+8 S8 A32]) lra_eliminate_regs_1 is called with full_p = true (it is not really clear for what it means), but in the PLUS switch case, we have offset = 0xb (given by ep->offset) and as lra_get_insn_recog_data (insn)->sp_offset value is 0, we will indeed add 0xb to the original 0x4c offset. So, here I don't get if it is the sp_offset value of the lra_insn_recog_data element which is not well updated or if lra_ eliminate_regs_1 has to be called with update_p and not full_p (which fixed the value in that particular case). Is it more obvious for you ? Thanks Yvan On 3 December 2013 16:39, Vladimir Makarov wrote: > On 12/3/2013, 6:54 AM, Marcus Shawcroft wrote: >> >> On 2 December 2013 23:44, Vladimir Makarov wrote: >> >>> If somebody with the rights approves, I can commit it tomorrow. >>> >>> 2013-12-02 Vladimir Makarov >>> >>> * config/aarch64/aarch64.c (aarch64_frame_pointer_required): >>> Check >>> LR_REGNUM. >>> (aarch64_can_eliminate): Don't check elimination source when >>> frame_pointer_requred is false. >>> >> >> >> This is fine with me, go ahead and commit it. Thanks /Marcus >> > Committed as rev. 205637 with changelog fix of a typo found by Jeff. >
RE: [REPOST] Invalid Code when reading from unaligned zero-sized array
Hi, On Tue, 10 Dec 2013 16:14:43, Richard Biener wrote: > > On Tue, Dec 10, 2013 at 4:02 PM, Richard Biener > wrote: >> On Tue, Dec 10, 2013 at 11:53 AM, Eric Botcazou >> wrote: What we support is out of bounds accesses for heap vars if the var's type has flexible array member or something we treat similarly and there is the possibility that there could be payload after the heap var that could be accessed from the flexible array members or similar arrays. >>> >>> My question was about the above similar arrays, i.e. whether we consider all >>> trailing arrays in structures as flexible-like or not. No strong opinion. >> >> Yes we do, even for struct { struct { int a; char a[1] } }; (note the not >> really >> "trailing" as there is padding after the trailing array). We do take >> size limitations from a DECL (if we see one) into account to limit the >> effect of this trailing-array-supporting, so it effectively only applies to >> indirect accesses (and the padding example above, you can use the whole >> padding if DECL_SIZE allows that). >> So, I don't see what is the big deal with BLKmode, because all the cases which actually could have flexible array member extra payloads (or similar) must necessarily live in memory, if it is the compiler that decides whether to put it into memory or keep in registers etc., then it can't be heap allocated. >>> >>> The invariant is that types for which objects can effectively have variable >>> size must have BLKmode, otherwise you need to add very ugly code in the RTL >>> expander to mask the lie. >> >> I wonder if we can make the expander more rely on the DECLs mode >> and optimize only the DECLs mode (if we can constrain its size via >> DECL_SIZE) to non-BLKmode instead of doing that for the TYPEs mode. >> Yes, you'd have DECL_MODE != TYPE_MODE that way. >> >> Or rather I wonder if the expander doesn't already work that way >> (looks at DECL_MODE). > > To speak in patches (completely untested) - sth like the following ontop > of making more types have BLKmode: > > Index: gcc/stor-layout.c > === > --- gcc/stor-layout.c (revision 205857) > +++ gcc/stor-layout.c (working copy) > @@ -569,8 +569,17 @@ layout_decl (tree decl, unsigned int kno > bitsize_unit_node)); > > if (code != FIELD_DECL) > - /* For non-fields, update the alignment from the type. */ > - do_type_align (type, decl); > + { > + /* For non-fields, update the alignment from the type. */ > + do_type_align (type, decl); > + /* Optimize the mode of the decl. > + ??? Ensure choosen mode alignment fits decl alignment. */ > + if (DECL_MODE (decl) == BLKmode > + && DECL_SIZE (decl) > + && compare_tree_int (DECL_SIZE (t), MAX_FIXED_MODE_SIZE) <= 0) > + DECL_MODE (decl) > + = mode_for_size (TREE_INT_CST_LOW (DECL_SIZE (decl)), MODE_INT, 1); > + } > else > /* For fields, it's a bit more complicated... */ > { > > >> Richard. >> >>> -- >>> Eric Botcazou Whatever the fix will be, it should contain at least the two test cases from my patch, and, maybe - if that is possible - it would be nice to test it with with SLOW_UNALIGNED_ACCESS defined like this: config/i386/i386.h: #define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) 1 Bernd.
Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
On Wed, Dec 11, 2013 at 11:27 AM, Gopalasubramanian, Ganesh wrote: > Accommodated the changes that you mentioned. > Completed the bootstrap testing too. Please provide updated ChangeLog. The function comment is missing. Maybe you should also describe magic number 32 here? +static unsigned +ix86_loop_unroll_adjust (unsigned nunroll, struct loop *loop) + if (!ix86_tune_features [X86_TUNE_ADJUST_UNROLL]) + return nunroll; Please add #define TARGET_ADJUST_UNROLL \ ix86_tune_features[X86_TUNE_ADJUST_UNROLL] to i386.h and use definition here. Otherwise, the patch looks OK. BTW: Please avoid top-posting, see e.g. [1] for reasons... [1] http://gcc.gnu.org/ml/gcc/2008-08/msg00133.html Uros.
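Spelled out, the requested macro would look like this (indentation and exact placement in i386.h among the other TARGET_* tune macros are guesses):

/* i386.h  */
#define TARGET_ADJUST_UNROLL \
  ix86_tune_features[X86_TUNE_ADJUST_UNROLL]

/* ... and the check in ix86_loop_unroll_adjust then reads:  */
  if (!TARGET_ADJUST_UNROLL)
    return nunroll;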
Re: [PATCH] Fix PR 59390
On Wed, Dec 11, 2013 at 10:15 AM, Bernhard Reutner-Fischer wrote: >>> Patch updated with two more tests to check if the vfmadd insn is being >>> produced when possible. >>> >>> Thanks >>> Sri >>> >>> On Fri, Dec 6, 2013 at 11:12 AM, Sriraman Tallam >>> wrote: Hi, I have attached a patch to fix http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59390. Please review. Here is the problem. GCC adds target-specific builtins on demand. The FMA target-specific builtin __builtin_ia32_vfmaddpd gets added via this declaration: void fun() __attribute__((target("fma"))); Specifically, the builtin __builtin_ia32_vfmaddpd gets added when ix86_add_new_builtins is called from ix86_valid_target_attribute_tree when processing this target attribute. Now, when the vectorizer is processing the builtin "__builtin_fma" in function other_fn(), it checks to see if this function is vectorizable and calls ix86_builtin_vectorized_function in i386.c. That returns the builtin stored here: case BUILT_IN_FMA: if (out_mode == DFmode && in_mode == DFmode) { if (out_n == 2 && in_n == 2) return ix86_builtins[IX86_BUILTIN_VFMADDPD]; ix86_builtins[IX86_BUILTIN_VFMADDPD] would have contained NULL_TREE had the builtin not been added by the previous target attribute. That is why the code works if we remove the previous declaration. The fix is to not just return the builtin but to also check if the current function's isa allows the use of the builtin. For instance, this patch would solve the problem: @@ -33977,7 +33977,13 @@ ix86_builtin_vectorized_function (tree fndecl, tre if (out_mode == DFmode && in_mode == DFmode) { if (out_n == 2 && in_n == 2) -return ix86_builtins[IX86_BUILTIN_VFMADDPD]; +{ + if (ix86_builtins_isa[IX86_BUILTIN_VFMADDPD].isa + & global_options.x_ix86_isa_flags) +return ix86_builtins[IX86_BUILTIN_VFMADDPD]; + else + return NULL_TREE; +} but there are many instances of this usage in ix86_builtin_vectorized_function. This patch covers all the cases. >> >>> PR target/59390 >>> * gcc.target/i386/pr59390.c: New test. >>> * gcc.target/i386/pr59390_1.c: New test. >>> * gcc.target/i386/pr59390_2.c: New test. >>> * config/i386/i386.c (get_builtin): New function. >>> (ix86_builtin_vectorized_function): Replace all instances of >>> ix86_builtins[...] with get_builtin(...). >>> (ix86_builtin_reciprocal): Ditto. >> >> OK, with a couple of nits: >> >> +static tree get_builtin (enum ix86_builtins code) >> >> Please name this function ix86_get_builtin. >> >> + if (current_function_decl) >> +target_tree = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl); >> + if (target_tree) >> +opts = TREE_TARGET_OPTION (target_tree); >> + else >> +opts = TREE_TARGET_OPTION (target_option_default_node); >> + >> >> opts = TREE_TARGET_OPTION (target_tree ? target_tree : >> target_option_default_node); > > Just curious: >> +static tree get_builtin (enum ix86_builtins code) >> +{ > [] >> +> > [] >> @@ -33677,9 +33701,9 @@ ix86_builtin_vectorized_function (tree fndecl, tre >>if (out_mode == DFmode && in_mode == DFmode) >> { >> if (out_n == 2 && in_n == 2) >> -return ix86_builtins[IX86_BUILTIN_SQRTPD]; >> +get_builtin (IX86_BUILTIN_SQRTPD); >> else if (out_n == 4 && in_n == 4) >> -return ix86_builtins[IX86_BUILTIN_SQRTPD256]; >> +get_builtin (IX86_BUILTIN_SQRTPD256); >> } >>break; > > I must be missing something? > Don't you have to return ix86_get_builtin(...) ? Huh, I didn't even notice this mistake. Uros.
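Putting the review comments together, the helper would end up looking roughly like this. This is only a sketch: the exact ISA test and the cl_target_option field name are assumptions based on the snippets quoted above, not the committed code:

static tree
ix86_get_builtin (enum ix86_builtins code)
{
  tree target_tree = NULL_TREE;
  struct cl_target_option *opts;

  /* Use the target options of the current function, if any, otherwise
     fall back to the default target options.  */
  if (current_function_decl)
    target_tree = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl);

  opts = TREE_TARGET_OPTION (target_tree ? target_tree
                             : target_option_default_node);

  /* Only hand out the builtin if the ISA it requires is enabled in this
     context (PR 59390); callers fall back to NULL_TREE otherwise.  */
  if (ix86_builtins_isa[(int) code].isa & opts->x_ix86_isa_flags)
    return ix86_builtins[(int) code];

  return NULL_TREE;
}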
Re: New tsan tests.
> This test would fail on strict alignment targets.

Ok, we'll throw in a dg-skip-if.

> Why is the test C++ only though?
> Again, why C++ only?
> Ditto.
> Likewise.

I think the main reason is having the same versions of the code as Clang. I had the impression that minimizing the changes would be preferred, and C vs. C++ should not make much difference for TSan anyway.

We can convert the tests if necessary though.

-Y
Re: New tsan tests.
On Wed, Dec 11, 2013 at 03:05:22PM +0400, Yury Gribov wrote: > > This test would fail on strict alignment targets. > > Ok, we'll throw in a dg-skip-if. > > > Why is the test C++ only though? > > Again, why C++ only? > > Ditto. > > Likewise. > > I think main reason is having same versions of code with Clang. I > had an impression that minimization of changes would be preferred > and C vs. C++ should not make much difference for TSan anyway. > > We can convert the tests if necessary though. I guess I can live with it as is, just was wondering why the tests were written so carelessly that when they don't really need to test any C++ features they are still written in C++. Jakub
Re: patch for elimination to SP when it is changed in RTL (PR57293)
Yvan, On Wed, Dec 11, 2013 at 10:35 AM, Yvan Roux wrote: > Hi Vladimir, > > I've some regressions on ARM after this SP elimination patch, and they > are execution failures. Here is the list: Pragmatically, I think it's time we turned LRA on by default now that we are in stage3 and that would help with getting more issues out of the auto-testers quicker than anything else. Given we are now well into stage3, we should make sure that the LRA support gets as much testing as it can get in the run-up to the release. Can you prepare a patch for this please ? regards Ramana > > g++.dg/cilk-plus/AN/array_test_ND_tplt.cc -O3 -fcilkplus > gcc.c-torture/execute/va-arg-22.c -O2 > gcc.dg/atomic/c11-atomic-exec-5.c -O0 > gfortran.dg/direct_io_12.f90 -O[23] > gfortran.dg/elemental_dependency_1.f90 -O2 > gfortran.dg/matmul_2.f90 -O2 > gfortran.dg/matmul_6.f90 -O2 > gfortran.dg/mvbits_7.f90 -O3 > gfortran.dg/unlimited_polymorphic_1.f03 -O3 > > I reduced and looked at var-arg-22.c and the issue is that in > lra_eliminate_regs_1 (called by get_equiv_with_elimination) we > transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset. What > we try to do here is to change the pseudo 195 of the insn 118 below : > > (insn 118 114 112 8 (set (reg:DI 195) > (unspec:DI [ > (mem:DI (plus:SI (reg/f:SI 215) > (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12 > + 64B]+8 S8 A8]) > ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi} > (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192) > (const_int 8 [0x8])) [7 a35+8 S8 A32]) > (nil))) > > with its equivalent (x arg of lra_eliminate_regs_1): > > (mem/c:DI (plus:SI (reg/f:SI 102 sfp) > (const_int 76 [0x4c])) [7 a35+8 S8 A32]) > > lra_eliminate_regs_1 is called with full_p = true (it is not really > clear for what it means), but in the PLUS switch case, we have offset > = 0xb (given by ep->offset) and as lra_get_insn_recog_data > (insn)->sp_offset value is 0, we will indeed add 0xb to the original > 0x4c offset. > > So, here I don't get if it is the sp_offset value of the > lra_insn_recog_data element which is not well updated or if lra_ > eliminate_regs_1 has to be called with update_p and not full_p (which > fixed the value in that particular case). Is it more obvious for you > ? > > Thanks > Yvan > > > On 3 December 2013 16:39, Vladimir Makarov wrote: >> On 12/3/2013, 6:54 AM, Marcus Shawcroft wrote: >>> >>> On 2 December 2013 23:44, Vladimir Makarov wrote: >>> If somebody with the rights approves, I can commit it tomorrow. 2013-12-02 Vladimir Makarov * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Check LR_REGNUM. (aarch64_can_eliminate): Don't check elimination source when frame_pointer_requred is false. >>> >>> >>> This is fine with me, go ahead and commit it. Thanks /Marcus >>> >> Committed as rev. 205637 with changelog fix of a typo found by Jeff. >>
Re: Two build != host fixes
Hi,

I'm having problems with that patch.

I start on x86_64-linux-gnu and want to get GCC running on arm-linux-gnueabihf. I grabbed the system headers and libraries from the target and put them in the prefix path.

In the first step I do

../gcc-4.9-20131208/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64 --target=arm-linux-gnueabihf --enable-languages=c,c++,fortran --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard

This GCC runs on the PC and generates arm-linux-gnueabihf executables.

Then I try this

../gcc-4.9-20131208/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf-cross --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --enable-languages=c,c++,fortran --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard

But it fails because auto-build.h contains nonsense. That is probably because almost every check has a fatal error: #include not found.

I personally prefer to have gmp, mpfr and mpc in-tree (using contrib/download_prerequisites).

I experimented a bit, and at least the attached patch improves the situation for me.

Maybe I never had any problems with GMP before because the in-tree configuration of GMP uses -DNO_ASM?

Regards
Bernd.

patch-configure.diff
Description: Binary data
Re: New tsan tests.
> I guess I can live with it as is, just was wondering why the tests > were written so carelessly that when they don't really need to test > any C++ features they are still written in C++. Adding Kostya to comment. -Y
Re: questions about COND_EXEC and SMS
On Thu, Dec 5, 2013 at 12:11 PM, dxq wrote:
> hi all,
>
> * We found that COND_EXEC is better than IF_THEN_ELSE for expressing
> conditional move insns, because in sched an IF_THEN_ELSE insn has a
> dependence on itself, and COND_EXEC has not.
> * Besides, IF_THEN_ELSE is not good for SMS. Some backends (frv) expand a
> conditional move as an IF_THEN_ELSE pattern before reload and split
> IF_THEN_ELSE into COND_EXEC at the split2 pass, which is a post-reload
> pass. However, in the SMS pass (pre-reload), we can't get accurate
> information about IF_THEN_ELSE insns.
> * However, as far as I know, COND_EXEC is not supported in pre-reload
> passes.
>
> So I'm asking for suggestions on supporting COND_EXEC in pre-reload
> passes; for me, maybe from split1 to the reload pass is good enough. What
> work needs to be done to support that, and has anyone done any work on
> that?

Supporting COND_EXEC before register allocation (RA) is in itself not a big
challenge. The problem is handling them *during* register allocation: how to
do spilling/reloading of registers used in a predicated instruction, and how
to spill/reload the predicate condition (condition code or BImode register).

If I had to add support for this myself, I'd do it as follows:

1. Make your port work with LRA instead of reload.

2. Make pre-RA passes handle COND_EXEC with the predicate (COND_EXEC_COND)
explicitly assigned to a hard register and only predicated instructions
(COND_EXEC_EXPR) that don't need reloads (e.g. only reg->reg and imm->reg
moves).

3. Try to get the simple test cases from (1) to get through IRA/LRA. I
wouldn't try doing it with reload. With LRA it's likely simpler to add
support for this because you can spill to pseudo-registers.

4. Gradually relax the constraints on the kind of COND_EXEC instructions you
accept before reload, and play whack-a-mole on all the bugs that undoubtedly
will pop up.

Supporting COND_EXEC before reload has been discussed before on this mailing
list, but I don't think anyone has ever started a serious effort to implement
it. There are so few targets that use COND_EXEC, and the ones that do (ia64)
are falling out of favor. I expect (or at least: hope) that LRA makes it a
lot easier to implement.

Ciao!
Steven
Re: New tsan tests.
What's wrong with C++? :) Upstream we have 16 .c tests and 106 .cc tests in compiler-rt/lib/tsan/lit_tests. We typically prefer .cc because imo C++ is a better language (even when using what looks like the C subset). But we need some .c tests to make sure that tsan still works w/o C++ run-time. --kcc On Wed, Dec 11, 2013 at 3:12 PM, Yury Gribov wrote: >> I guess I can live with it as is, just was wondering why the tests >> were written so carelessly that when they don't really need to test >> any C++ features they are still written in C++. > > Adding Kostya to comment. > > -Y
Re: New tsan tests.
On Wed, Dec 11, 2013 at 03:21:32PM +0400, Konstantin Serebryany wrote: > What's wrong with C++? :) > Upstream we have 16 .c tests and 106 .cc tests in > compiler-rt/lib/tsan/lit_tests. > We typically prefer .cc because imo C++ is a better language (even That is a matter of opinion. > when using what looks like the C subset). The question is why don't you limit to the subset of the two languages when it doesn't cost anything. As I've mentioned on the patch, some of the tests were C++ only (well, also C99) just because there wasn't return 0; in main, others just because they used reinterpret_cast instead of C cast where it didn't result in any advantage. Some just because they used new instead of malloc, though in this case I can understand that you might want to test how is operator new instrumented. Jakub
Re: patch for elimination to SP when it is changed in RTL (PR57293)
> Pragmatically, I think it's time we turned LRA on by default now that > we are in stage3 and that would help with getting more issues out of > the auto-testers quicker than anything else. Given we are now well > into stage3, we should make sure that the LRA support gets as much > testing as it can get in the run-up to the release. I agree. > Can you prepare a patch for this please ? I'll post the patch on the list. Thanks, Yvan
Re: New tsan tests.
On Wed, Dec 11, 2013 at 3:27 PM, Jakub Jelinek wrote: > On Wed, Dec 11, 2013 at 03:21:32PM +0400, Konstantin Serebryany wrote: >> What's wrong with C++? :) >> Upstream we have 16 .c tests and 106 .cc tests in >> compiler-rt/lib/tsan/lit_tests. >> We typically prefer .cc because imo C++ is a better language (even > > That is a matter of opinion. Of course! (I did say "imo") > >> when using what looks like the C subset). > > The question is why don't you limit to the subset of the two languages > when it doesn't cost anything. Mostly because of my (our) opinion above. :) GCC test suite may have more .c tests than .cc tests; I don't have an opinion here. We need some C++specific tests (e.g. with operator new) and we need at least one C test. --kcc > As I've mentioned on the patch, some of > the tests were C++ only (well, also C99) just because there wasn't return 0; > in main, others just because they used reinterpret_cast instead of C cast > where it didn't result in any advantage. Some just because they used > new instead of malloc, though in this case I can understand that you might > want to test how is operator new instrumented. > > Jakub
[PATCH, ARM, LRA] Switch on LRA on ARM.
Hi,

here is the patch to turn LRA on by default on ARM. There are still some regressions in the testsuite, as reported in the thread below, but as Ramana said in the same thread, we now need to find the remaining issues as fast as possible.

http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01074.html

Thanks
Yvan

2013-12-11  Yvan Roux

	* config/arm/arm.opt (mlra): Enable LRA by default.

Index: gcc/config/arm/arm.opt
===
--- gcc/config/arm/arm.opt	(revision 205885)
+++ gcc/config/arm/arm.opt	(working copy)
@@ -144,7 +144,7 @@
 Specify the name of the target floating point hardware/format
 
 mlra
-Target Report Var(arm_lra_flag) Init(0) Save
+Target Report Var(arm_lra_flag) Init(1) Save
 Use LRA instead of reload (transitional)
 
 mhard-float
Re: [PATCH, ARM, LRA] Switch on LRA on ARM.
On 11/12/13 11:47, Yvan Roux wrote: > Hi, > > here is the patch to turn LRA on by default on ARM, there is still > some regressions in the testsuite, as reported in the thread below, > but as Ramana said in the same thread, we now need to find the > remaining issue as fast as possible. > > http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01074.html > > Thanks > Yvan > > > 2013-12-11 Yvan Roux > > * config/arm/arm.opt (mlra): Enable LRA by default. > > > lra-on.diff > > > Index: gcc/config/arm/arm.opt > === > --- gcc/config/arm/arm.opt(revision 205885) > +++ gcc/config/arm/arm.opt(working copy) > @@ -144,7 +144,7 @@ > Specify the name of the target floating point hardware/format > > mlra > -Target Report Var(arm_lra_flag) Init(0) Save > +Target Report Var(arm_lra_flag) Init(1) Save > Use LRA instead of reload (transitional) > > mhard-float > OK. R.
Re: New tsan tests.
On Wed, Dec 11, 2013 at 03:35:37PM +0400, Konstantin Serebryany wrote: > >> when using what looks like the C subset). > > > > The question is why don't you limit to the subset of the two languages > > when it doesn't cost anything. > > Mostly because of my (our) opinion above. :) > GCC test suite may have more .c tests than .cc tests; I don't have an > opinion here. > We need some C++specific tests (e.g. with operator new) and we need at > least one C test. In the GCC testsuite, you can put tests that are written in the common subset of C and C++ that you want to test by both frontends into c-c++-common (they even can have different expected errors etc. through { target c } or { target c++ }), you can use #ifdef __cplusplus in the tests too. Anyway, let's keep the current tests as is, the patch is ok for trunk, and if in the future you wouldn't mind making more tests written in the common subset of C/C++, it would be certainly appreciated. Jakub
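As an illustration of the kind of shared test Jakub describes, something like the sketch below would compile with both front ends; the file placement under c-c++-common and the exact diagnostic wording are made up for the example, not taken from the tsan patch:

/* { dg-do compile } */

void
foo (void *p)
{
  /* Implicit void* -> int* conversion is valid C but an error in C++,
     so the diagnostic is only expected for the C++ front end.  */
  int *q = p;	/* { dg-error "invalid conversion" "" { target c++ } } */

#ifdef __cplusplus
  q = static_cast<int *> (p);	/* C++-only code can be guarded like this.  */
#endif
  (void) q;
}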
Re: [ARM] Fix register r3 wrongly used to save ip in nested APCS frame
> Revised patch attached, your testcase passes on arm-eabi with it. Does it > look OK to you? If so, I'll run a testing cycle on arm-vxworks and > arm-eabi. > > > * config/arm/arm.c (arm_expand_prologue): In a nested APCS frame with > arguments to push onto the stack and no varargs, save ip into the last > stack slot if r3 isn't available on entry. No problems detected for the patch on arm-vxworks or arm-eabi. -- Eric Botcazou
Re: AARCH64 configure check for gas -mabi support
On 10/12/13 20:23, Kugan wrote: gcc/ +2013-12-11 Kugan Vivekanandarajah + * configure.ac: Add check for aarch64 assembler -mabi support. + * configure: Regenerate. + * config.in: Regenerate. + * config/aarch64/aarch64-elf.h (ASM_MABI_SPEC): New define. + (ASM_SPEC): Update to substitute -mabi with ASM_MABI_SPEC. + * config/aarch64/aarch64.h (aarch64_override_options): Issue error if + assembler does not support -mabi and option ilp32 is selected. + * doc/install.texi: Added note that building gcc 4.9 and after with pre + 2.24 binutils will not support -mabi=ilp32. + Kugan, Thanks for sorting this out. OK to commit. /Marcus
Re: Two build != host fixes
On Wed, Dec 11, 2013 at 12:10:04PM +0100, Bernd Edlinger wrote: > Hi, > > I'm having problems with that patch. Sorry to hear that. > I try to start at X86_64-linux-gnu, and I want to get the GCC running on > arm-linux-gnueabihf. > I grabbed system headers and libraries from the target and put it in the > prefix path. > > In the first step I do > > ../gcc-4.9-20131208/configure > --prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64 > --target=arm-linux-gnueabihf --enable-languages=c,c++,fortran > --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 > --with-float=hard > > This GCC runs on PC and generates arm-linux-gnueabihf executables. > > Then I try this > > ../gcc-4.9-20131208/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf-cross > --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf > --enable-languages=c,c++,fortran --with-arch=armv7-a --with-tune=cortex-a9 > --with-fpu=vfpv3-d16 --with-float=hard > > But It fails because auto-build.h contains nonsense. That is probably because > almost every check > has a fatal error #include not found. > > I personally prefer to have gmp, mpfr, mpc in-tree (using > contrib/download_prerequisites). > > I experimented a bit and at least this attached patch improves the situation > for me. > > Maybe I never had any problems with GMP before, because the in-tree > configuration of GMP does -DNO_ASM ? GMPINC really shouldn't be used to find build headers, since it is used to find host headers. See the top level Makefile.in. When gmp has been installed, using GMPINC means you pull in a whole lot of host headers for the build compiler. Which might work in rare cases, but it's a lot more likely to fail. Even with in-tree gmp, how do you get things like GMP_LIMB_BITS correct if your build machine is 64-bit and your host is 32-bit? (Perhaps there is some build magic that allows this to work, I'll investigate when I get back from vacation.) Incidentally, we've been using a couple of other patches for build != host that I haven't posted because I wasn't sure who authored them. It's possible the first one might help you. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index aad927c..7995e64 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -747,7 +747,8 @@ BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS) # Native linker and preprocessor flags. For x-fragment overrides. BUILD_LDFLAGS=@BUILD_LDFLAGS@ -BUILD_CPPFLAGS=$(ALL_CPPFLAGS) +BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \ + -I$(srcdir)/../include $(CPPINC) # Actual name to use when installing a native compiler. 
GCC_INSTALL_NAME := $(shell echo gcc|sed '$(program_transform_name)') diff --git a/gcc/ada/gcc-interface/Make-lang.in b/gcc/ada/gcc-interface/Make-lang.in index 57f9009..e1d3ed6 100644 --- a/gcc/ada/gcc-interface/Make-lang.in +++ b/gcc/ada/gcc-interface/Make-lang.in @@ -625,7 +625,7 @@ ada.tags: force ada/doctools/xgnatugn$(build_exeext): ada/xgnatugn.adb -$(MKDIR) ada/doctools $(CP) $^ ada/doctools - cd ada/doctools && $(GNATMAKE) -q xgnatugn + cd ada/doctools && gnatmake -q xgnatugn # Note that doc/gnat_ugn.texi and doc/projects.texi do not depend on # xgnatugn being built so we can distribute a pregenerated doc/gnat_ugn.info diff --git a/gnattools/Makefile.in b/gnattools/Makefile.in index 794d374..6b0d5e8 100644 --- a/gnattools/Makefile.in +++ b/gnattools/Makefile.in @@ -23,6 +23,7 @@ SHELL = @SHELL@ srcdir = @srcdir@ libdir = @libdir@ build = @build@ +host = @host@ target = @target@ prefix = @prefix@ INSTALL = @INSTALL@ @@ -31,6 +32,7 @@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ # Nonstandard autoconf-set variables. LN_S=@LN_S@ +host_alias=@host_alias@ target_noncanonical=@target_noncanonical@ # Variables for the user (or the top level) to override. @@ -183,7 +185,11 @@ regnattools: $(GCC_DIR)/stamp-gnatlib-rts # put the host RTS dir first in the PATH to hide the default runtime # files that are among the sources # FIXME: This should be done in configure. +ifeq ($(host), $(build)) RTS_DIR:=$(strip $(subst \,/,$(shell gnatls -v | grep adalib ))) +else +RTS_DIR:=$(strip $(subst \,/,$(shell $(host_alias)-gnatls -v | grep adalib ))) +endif gnattools-cross: $(GCC_DIR)/stamp-tools # gnattools1-re $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ -- Alan Modra Australia Development Lab, IBM
Re: AARCH64 configure check for gas -mabi support
Committed on Kugan's behalf as rev 205891. On 11 December 2013 13:27, Marcus Shawcroft wrote: > On 10/12/13 20:23, Kugan wrote: > >> gcc/ >> >> +2013-12-11 Kugan Vivekanandarajah >> + * configure.ac: Add check for aarch64 assembler -mabi support. >> + * configure: Regenerate. >> + * config.in: Regenerate. >> + * config/aarch64/aarch64-elf.h (ASM_MABI_SPEC): New define. >> + (ASM_SPEC): Update to substitute -mabi with ASM_MABI_SPEC. >> + * config/aarch64/aarch64.h (aarch64_override_options): Issue >> error if >> + assembler does not support -mabi and option ilp32 is selected. >> + * doc/install.texi: Added note that building gcc 4.9 and after >> with pre >> + 2.24 binutils will not support -mabi=ilp32. >> + >> > > Kugan, Thanks for sorting this out. OK to commit. > > /Marcus >
[Patch, Fortran, F03] PR 58916: Allocation of scalar with array source not rejected
Hi all, attached is a small patch which fixes accepts-invalid and ICE-on-invalid problems on allocation with source. Regtested on x86_64-unknown-linux-gnu. Ok for trunk? Cheers, Janus 2013-12-11 Janus Weil PR fortran/58916 * resolve.c (conformable_arrays): Treat scalar 'e2'. (resolve_allocate_expr): Check rank also for unlimited-polymorphic variables. 2013-12-11 Janus Weil PR fortran/58916 * gfortran.dg/allocate_with_source_4.f90: New. Index: gcc/fortran/resolve.c === --- gcc/fortran/resolve.c (revision 205872) +++ gcc/fortran/resolve.c (working copy) @@ -6597,7 +6597,8 @@ conformable_arrays (gfc_expr *e1, gfc_expr *e2) for (tail = e2->ref; tail && tail->next; tail = tail->next); /* First compare rank. */ - if (tail && e1->rank != tail->u.ar.as->rank) + if ((tail && e1->rank != tail->u.ar.as->rank) + || (!tail && e1->rank != e2->rank)) { gfc_error ("Source-expr at %L must be scalar or have the " "same rank as the allocate-object at %L", @@ -6794,8 +6795,7 @@ resolve_allocate_expr (gfc_expr *e, gfc_code *code } /* Check F03:C632 and restriction following Note 6.18. */ - if (code->expr3->rank > 0 && !unlimited - && !conformable_arrays (code->expr3, e)) + if (code->expr3->rank > 0 && !conformable_arrays (code->expr3, e)) goto failure; /* Check F03:C633. */ ! { dg-do compile } ! ! PR 58916: [F03] Allocation of scalar with array source not rejected ! ! Contributed by Vladimir Fuka class(*), allocatable :: a1 real, allocatable :: a2 real b(1) allocate(a1, source=b) ! { dg-error "must be scalar or have the same rank" } allocate(a2, source=b) ! { dg-error "must be scalar or have the same rank" } end
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford wrote: > Richard Henderson writes: >> On 12/10/2013 10:44 AM, Richard Sandiford wrote: >>> Sorry, I don't understand. I never said it was invalid. I said >>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents >>> a single register. On a little-endian target, the offset cannot be >>> anything other than 0 in that case. >>> >>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for >>> something that is always invalid, regardless of the target. That kind >>> of situation should be rejected by target-independent code instead. >> >> But, we want to disable the subreg before we know whether or not (reg:V4SF X) >> will be allocated to a single hard register. That is something that we can't >> know in target-independent code before register allocation. > > I was thinking that if we've got a class, we've also got things like > CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. > But even in the padding cases an offset-based check in C_C_M_C could > be derived from other information. > > subreg_get_info handles padding with: > > nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); > if (GET_MODE_INNER (xmode) == VOIDmode) > xmode_unit = xmode; > else > xmode_unit = GET_MODE_INNER (xmode); > gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); > gcc_assert (nregs_xmode > == (GET_MODE_NUNITS (xmode) > * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit))); > gcc_assert (hard_regno_nregs[xregno][xmode] > == (hard_regno_nregs[xregno][xmode_unit] > * GET_MODE_NUNITS (xmode))); > > /* You can only ask for a SUBREG of a value with holes in the middle > if you don't cross the holes. (Such a SUBREG should be done by > picking a different register class, or doing it in memory if > necessary.) An example of a value with holes is XCmode on 32-bit > x86 with -m128bit-long-double; it's represented in 6 32-bit > registers, > 3 for each part, but in memory it's two 128-bit parts. > Padding is assumed to be at the end (not necessarily the 'high part') > of each unit. */ > if ((offset / GET_MODE_SIZE (xmode_unit) + 1 >< GET_MODE_NUNITS (xmode)) > && (offset / GET_MODE_SIZE (xmode_unit) > != ((offset + GET_MODE_SIZE (ymode) - 1) > / GET_MODE_SIZE (xmode_unit > { > info->representable_p = false; > rknown = true; > } > > and I wouldn't really want to force targets to individually reproduce > that kind of logic at the class level. If the worst comes to the worst > we could cache the difficult cases. > My case is x86 CANNOT_CHANGE_MODE_CLASS only needs to know if the subreg byte is zero or not. It doesn't care about mode padding. You are concerned about information passed to CANNOT_CHANGE_MODE_CLASS is too expensive for target to process. It isn't the case for x86. Am I correct that mode can't change if subreg byte is non-zero? A target can just check subreg byte != 0, like my patch does. Here is a patch to add SUBREG_BYTE to CANNOT_CHANGE_MODE_CLASS. Tested on Linux/x86-64. Does it look OK? Thanks. -- H.J. --- 2013-12-11 H.J. Lu * combine.c (subst): Pass subreg byte to REG_CANNOT_CHANGE_MODE_P. (simplify_set): Likewise. * emit-rtl.c (validate_subreg): Likewise. * recog.c (register_operand): Likewise. * rtlanal.c (simplify_subreg_regno): Likewise. * hard-reg-set.h (REG_CANNOT_CHANGE_MODE_P): Add SUBREG_BYTE and pass it to CANNOT_CHANGE_MODE_CLASS. * regcprop.c (mode_change_ok): Pass unknown subreg byte to REG_CANNOT_CHANGE_MODE_P. 
* reginfo.c (record_subregs_of_mode): Pass unknown subreg byte to CANNOT_CHANGE_MODE_CLASS. * postreload.c (reload_cse_simplify_set): Pass subreg byte to CANNOT_CHANGE_MODE_CLASS. (reload_cse_simplify_operands): Likewise. * reload.c (push_reload): Likewise. * reload1.c (choose_reload_regs): Pass subreg byte to REG_CANNOT_CHANGE_MODE_P. (inherit_piecemeal_p): Pass unknown subreg byte to REG_CANNOT_CHANGE_MODE_P. * config/aarch64/aarch64.h (CANNOT_CHANGE_MODE_CLASS): Add and ignore subreg byte. * config/alpha/alpha.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/ia64/ia64.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/m32c/m32c.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/mep/mep.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/mips/mips.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/msp430/msp430.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/pa/pa32-regs.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/pa/pa64-regs.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/pdp11/pdp11.h (CANNO
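The hard-reg-set.h part of that ChangeLog presumably boils down to something like the following sketch of the intent (not the actual hunk from the attached patch):

#ifdef CANNOT_CHANGE_MODE_CLASS
#define REG_CANNOT_CHANGE_MODE_P(REGNO, FROM, TO, BYTE) \
  CANNOT_CHANGE_MODE_CLASS (FROM, TO, REGNO_REG_CLASS (REGNO), BYTE)
#else
#define REG_CANNOT_CHANGE_MODE_P(REGNO, FROM, TO, BYTE) 0
#endif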
RE: Two build != host fixes
On Wed, 11 Dec 2013 23:11:46, Alan Modra wrote: > > On Wed, Dec 11, 2013 at 12:10:04PM +0100, Bernd Edlinger wrote: >> Hi, >> >> I'm having problems with that patch. > > Sorry to hear that. > Never mind. I have similar patches, but I did not >> I try to start at X86_64-linux-gnu, and I want to get the GCC running on >> arm-linux-gnueabihf. >> I grabbed system headers and libraries from the target and put it in the >> prefix path. >> >> In the first step I do >> >> ../gcc-4.9-20131208/configure >> --prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64 >> --target=arm-linux-gnueabihf --enable-languages=c,c++,fortran >> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 >> --with-float=hard >> >> This GCC runs on PC and generates arm-linux-gnueabihf executables. >> >> Then I try this >> >> ../gcc-4.9-20131208/configure >> --prefix=/home/ed/gnu/arm-linux-gnueabihf-cross --host=arm-linux-gnueabihf >> --target=arm-linux-gnueabihf --enable-languages=c,c++,fortran >> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 >> --with-float=hard >> >> But It fails because auto-build.h contains nonsense. That is probably >> because almost every check >> has a fatal error #include not found. >> >> I personally prefer to have gmp, mpfr, mpc in-tree (using >> contrib/download_prerequisites). >> >> I experimented a bit and at least this attached patch improves the situation >> for me. >> >> Maybe I never had any problems with GMP before, because the in-tree >> configuration of GMP does -DNO_ASM ? > > GMPINC really shouldn't be used to find build headers, since it is used > to find host headers. See the top level Makefile.in. When gmp has > been installed, using GMPINC means you pull in a whole lot of host > headers for the build compiler. Which might work in rare cases, but > it's a lot more likely to fail. Even with in-tree gmp, how do you get > things like GMP_LIMB_BITS correct if your build machine is 64-bit and > your host is 32-bit? (Perhaps there is some build magic that allows > this to work, I'll investigate when I get back from vacation.) > I do not know, but until last week the only problem was a missing SSIZE_MAX in gcc/config/host-linux.c (glimits.h does not define this, and fix-include replaced mine!) We need the auto-build only to build something that translates .md files to .c, so I would'nt care about GMP, but some other things, like the right prototype for printf make a difference. now the auto-build.h has #define HAVE_DECL_SBRK 0 last week that was #define HAVE_DECL_SBRK 1 I can give you my sys-root files, and you can play with it if you like. > Incidentally, we've been using a couple of other patches for > build != host that I haven't posted because I wasn't sure who authored > them. It's possible the first one might help you. > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > index aad927c..7995e64 100644 > --- a/gcc/Makefile.in > +++ b/gcc/Makefile.in > @@ -747,7 +747,8 @@ BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS) > > # Native linker and preprocessor flags. For x-fragment overrides. > BUILD_LDFLAGS=@BUILD_LDFLAGS@ > -BUILD_CPPFLAGS=$(ALL_CPPFLAGS) > +BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \ > + -I$(srcdir)/../include $(CPPINC) > I did not have this one. What is it good for? > # Actual name to use when installing a native compiler. 
> GCC_INSTALL_NAME := $(shell echo gcc|sed '$(program_transform_name)') > diff --git a/gcc/ada/gcc-interface/Make-lang.in > b/gcc/ada/gcc-interface/Make-lang.in > index 57f9009..e1d3ed6 100644 > --- a/gcc/ada/gcc-interface/Make-lang.in > +++ b/gcc/ada/gcc-interface/Make-lang.in > @@ -625,7 +625,7 @@ ada.tags: force > ada/doctools/xgnatugn$(build_exeext): ada/xgnatugn.adb > -$(MKDIR) ada/doctools > $(CP) $^ ada/doctools > - cd ada/doctools && $(GNATMAKE) -q xgnatugn > + cd ada/doctools && gnatmake -q xgnatugn > Yes, I also have that. It's a show-stopper for Ada. > # Note that doc/gnat_ugn.texi and doc/projects.texi do not depend on > # xgnatugn being built so we can distribute a pregenerated doc/gnat_ugn.info > diff --git a/gnattools/Makefile.in b/gnattools/Makefile.in > index 794d374..6b0d5e8 100644 > --- a/gnattools/Makefile.in > +++ b/gnattools/Makefile.in > @@ -23,6 +23,7 @@ SHELL = @SHELL@ > srcdir = @srcdir@ > libdir = @libdir@ > build = @build@ > +host = @host@ > target = @target@ > prefix = @prefix@ > INSTALL = @INSTALL@ > @@ -31,6 +32,7 @@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ > > # Nonstandard autoconf-set variables. > LN_S=@LN_S@ > +host_alias=@host_alias@ > target_noncanonical=@target_noncanonical@ > > # Variables for the user (or the top level) to override. > @@ -183,7 +185,11 @@ regnattools: $(GCC_DIR)/stamp-gnatlib-rts > # put the host RTS dir first in the PATH to hide the default runtime > # files that are among the sources > # FIXME: This should be done in configure. > +ifeq ($(host), $(build)) > RTS_DIR:=$(strip $(subst \,/,$(shell gnatls -v | grep adalib ))) > +else > +RTS_DIR:=$(strip $(subst \,
Re: Two build != host fixes
On 11 Dec 2013, at 13:11, Bernd Edlinger wrote: > I did not have this one. > What is it good for? > >> # Actual name to use when installing a native compiler. >> GCC_INSTALL_NAME := $(shell echo gcc|sed '$(program_transform_name)') >> diff --git a/gcc/ada/gcc-interface/Make-lang.in >> b/gcc/ada/gcc-interface/Make-lang.in >> index 57f9009..e1d3ed6 100644 >> --- a/gcc/ada/gcc-interface/Make-lang.in >> +++ b/gcc/ada/gcc-interface/Make-lang.in >> @@ -625,7 +625,7 @@ ada.tags: force >> ada/doctools/xgnatugn$(build_exeext): ada/xgnatugn.adb >> -$(MKDIR) ada/doctools >> $(CP) $^ ada/doctools >> - cd ada/doctools && $(GNATMAKE) -q xgnatugn >> + cd ada/doctools && gnatmake -q xgnatugn >> > > Yes, I also have that. It's a show-stopper for Ada. I have some more fixes for Ada cross-builds that Eric commented on but need a little more work - will try to re-test this evening and re-post tomorrow. Iain
Re: [Patch, Fortran, F03] PR 58916: Allocation of scalar with array source not rejected
Dear Janus, It looks good to me - OK for trunk. Thanks for the patch. Paul On 11 December 2013 14:02, Janus Weil wrote: > Hi all, > > attached is a small patch which fixes accepts-invalid and > ICE-on-invalid problems on allocation with source. Regtested on > x86_64-unknown-linux-gnu. Ok for trunk? > > Cheers, > Janus > > > 2013-12-11 Janus Weil > > PR fortran/58916 > * resolve.c (conformable_arrays): Treat scalar 'e2'. > (resolve_allocate_expr): Check rank also for unlimited-polymorphic > variables. > > > 2013-12-11 Janus Weil > > PR fortran/58916 > * gfortran.dg/allocate_with_source_4.f90: New. -- The knack of flying is learning how to throw yourself at the ground and miss. --Hitchhikers Guide to the Galaxy
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
> On 12/10/2013 04:48 AM, Jan Hubicka wrote: > >The case where I played with local comdats was the following. > >I made cgraph_body_availability to get context argument (i.e. instead of > >saying > >if something is available in current unit, it was saying if it is available > >in current function/variable). > > > >If two symbols are in the same comdat group and refering each other, they are > >available even though they may seem overwritable to others. I then started to > >produce local symbols for those local references that are equivalent to your > >comdat > >local. > > > >We probably want to get in this extension too. (I did not get data on how > >often > >it fires, but it does i.e. for recursive calls of C++ inlines). > > I wouldn't expect it to affect anything other than recursive calls, > since before my patch functions in the same COMDAT don't call each > other, and with my patch they only call functions that are already > local. > > Also, this optimization would seem to apply to all recursive > functions, not just those in comdat groups. Agreed, I already do the conversion for many functions (based on "inline" keyword implying that there is no overwrite changing semantic). So far the conversion does not happen for comdats, since it would lead to local comdat and I also ignored the conversion rule. I have patch for that that handles it post-inlining + inliner patch that takes advantage of function context to allow recursive inlining. > > Are you thinking to add this after my patch is in? Yes, lets do that incrementally. > > >>+ /* Don't clone decls local to a comdat group. */ > >>+ if (comdat_local_p (node)) > >>+return false; > > > >Why? It seems this should work if the clone becomes another comdat local? > > Right, I think the problem was that the clone wasn't becoming comdat > local. And for the specific case of decloning, there's no point in > cloning the decloned constructor. If it does not make sense, how we ended up cloning it? Can you show me some code sample of decloning? I assume that we basically turn original cloned constructors into wrappers around common "worker" that is now comdat local. I think ipa-cp may end up deciding on clonning the wrapper that will break because it will end up static calling the local comdat function. On the other hand it may decide to clone both the wrapper and the actual function and in that case bringing both static is fine. So to be on safe side, we probably want to consider functions calling local comdat unclonable but we do not need to worry about local comdats themselves. For good codegen, I think ipa-cp will need to understand it needs to either clone both or nothing. I think it is something Martin can look into? > > >On the other hand, I think you want to prevent ipa-cp propagating addresses > >of comdat > >locals out of the function. (i.e. make it to check can_refer predicate for > >its subtitutions) > > Right, I wasn't worrying about that because it can't come up with decloning. > > >So we should check here if both caller and callee are in the same group and > >allow > >inlining then? > > Makes sense. > > >Probably you want to get make_decl_local to preserve comdat group; then you > >won't need > >to care about it here and clonning could work. > > OK. > > >Probably also when declaring a comdat symbol local, we want to turn all > >associated > >comdats to local, right? (i.e. 
with LTO and hidden comdat) > > I don't think so; if we change a public comdat symbol to local, > that's a change to the ABI of the comdat, so it can't be the same > comdat anymore, and dissolving the comdat seems appropriate. Yep, this is what I had in mind. When we declare to turn comdat into non-comdat we need to make sure that the comdat locals are dissolved, too. > > >Also i think this change needs more work. FROM_DECL is not the function you > >are going to get the reference, it is variable whose constructor the value > >was > >take from. > >I think we need to rename it to can_refer_decl_in_symbol and add symbol > >argument > >to get proper check. I am not sure where we use it without being sure the > >symbol > >is current_function_decl. We definitly make use of it during devirt and we > >need to > >start using it in ipa-cp. > > Would it be OK for me to just drop this change, since it isn't > needed for decloning? OK, I will make followup patch for this. Thanks, Honza
Re: [Patch, Fortran, F03] PR 58916: Allocation of scalar with array source not rejected
> It looks good to me - OK for trunk. Thanks, Paul. Committed as r205894. Cheers, Janus > On 11 December 2013 14:02, Janus Weil wrote: >> Hi all, >> >> attached is a small patch which fixes accepts-invalid and >> ICE-on-invalid problems on allocation with source. Regtested on >> x86_64-unknown-linux-gnu. Ok for trunk? >> >> Cheers, >> Janus >> >> >> 2013-12-11 Janus Weil >> >> PR fortran/58916 >> * resolve.c (conformable_arrays): Treat scalar 'e2'. >> (resolve_allocate_expr): Check rank also for unlimited-polymorphic >> variables. >> >> >> 2013-12-11 Janus Weil >> >> PR fortran/58916 >> * gfortran.dg/allocate_with_source_4.f90: New. > > > > -- > The knack of flying is learning how to throw yourself at the ground and miss. >--Hitchhikers Guide to the Galaxy
Re: [PATCH, testsuite] Fix alignment in movapd tests
On 13-12-10 01:13 PM, Uros Bizjak wrote: Hello! 2013-12-10 Ryan Mansfield PR testsuite/59442 * gcc.target/i386/sse2-movapd-1.c: Fix alignment attributes. * gcc.target/i386/sse2-movapd-2.c: Likewise. * gcc.target/i386/avx-vmovapd-256-1.c: Likewise. * gcc.target/i386/avx-vmovapd-256-2.c: Likewise. OK for mainline and release branches. Thanks. Could someone please apply it for me? Regards, Ryan Mansfield
Re: Two build != host fixes
> I have some more fixes for Ada cross-builds that Eric commented on but need > a little more work - will try to re-test this evening and re-post tomorrow. It's also PR ada/55946. Would mind trying the attached patch? -- Eric BotcazouIndex: gnattools/Makefile.in === --- gnattools/Makefile.in (revision 205881) +++ gnattools/Makefile.in (working copy) @@ -24,6 +24,8 @@ srcdir = @srcdir@ libdir = @libdir@ build = @build@ target = @target@ +host = @host@ +host_alias= @host_alias@ prefix = @prefix@ INSTALL = @INSTALL@ INSTALL_DATA = @INSTALL_DATA@ @@ -92,6 +94,7 @@ TOOLS_FLAGS_TO_PASS_RE= \ "CC=../../xgcc -B../../" \ "CXX=../../xg++ -B../../ $(CXX_LFLAGS)" \ "CFLAGS=$(CFLAGS)" \ + "LDFLAGS=$(LDFLAGS)" \ "ADAFLAGS=$(ADAFLAGS)" \ "ADA_CFLAGS=$(ADA_CFLAGS)" \ "INCLUDES=$(INCLUDES_FOR_SUBDIR)" \ @@ -105,6 +108,22 @@ TOOLS_FLAGS_TO_PASS_RE= \ "TOOLSCASE=cross" # Variables for gnattools, cross +ifeq ($(build), $(host)) + GNATMAKE_FOR_HOST=gnatmake + GNATLINK_FOR_HOST=gnatlink + GNATBIND_FOR_HOST=gnatbind + GNATLS_FOR_HOST=gnatls +else + GNATMAKE_FOR_HOST=$(host_alias)-gnatmake + GNATLINK_FOR_HOST=$(host_alias)-gnatlink + GNATBIND_FOR_HOST=$(host_alias)-gnatbind + GNATLS_FOR_HOST=$(host_alias)-gnatls +endif + +# Put the host RTS dir first in the PATH to hide the default runtime +# files that are among the sources +RTS_DIR:=$(strip $(subst \,/,$(shell $(GNATLS_FOR_HOST) -v | grep adalib ))) + TOOLS_FLAGS_TO_PASS_CROSS= \ "CC=$(CC)" \ "CXX=$(CXX)" \ @@ -117,9 +136,9 @@ TOOLS_FLAGS_TO_PASS_CROSS= \ "exeext=$(exeext)" \ "fsrcdir=$(fsrcdir)" \ "srcdir=$(fsrcdir)" \ - "GNATMAKE=gnatmake" \ - "GNATLINK=gnatlink" \ - "GNATBIND=gnatbind" \ + "GNATMAKE=$(GNATMAKE_FOR_HOST)" \ + "GNATLINK=$(GNATLINK_FOR_HOST)" \ + "GNATBIND=$(GNATBIND_FOR_HOST)" \ "TOOLSCASE=cross" \ "LIBGNAT=" @@ -188,11 +207,6 @@ regnattools: $(GCC_DIR)/stamp-gnatlib-rt $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ $(TOOLS_FLAGS_TO_PASS_NATIVE) common-tools -# For cross builds of gnattools, -# put the host RTS dir first in the PATH to hide the default runtime -# files that are among the sources -# FIXME: This should be done in configure. -RTS_DIR:=$(strip $(subst \,/,$(shell gnatls -v | grep adalib ))) gnattools-cross: $(GCC_DIR)/stamp-tools # gnattools1-re $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ Index: gcc/ada/gcc-interface/Make-lang.in === --- gcc/ada/gcc-interface/Make-lang.in (revision 205881) +++ gcc/ada/gcc-interface/Make-lang.in (working copy) @@ -658,7 +658,7 @@ ada.tags: force ada/doctools/xgnatugn$(build_exeext): ada/xgnatugn.adb -$(MKDIR) ada/doctools $(CP) $^ ada/doctools - cd ada/doctools && $(GNATMAKE) -q xgnatugn + cd ada/doctools && gnatmake -q xgnatugn # Note that doc/gnat_ugn.texi and doc/projects.texi do not depend on # xgnatugn being built so we can distribute a pregenerated doc/gnat_ugn.info
Re: [Patch Ada/build] deal with some cross/native cross issues
Hello Eric, I had made the mods to this and done some light testing - then got side-tracked by other priorities. however, since the topic has come up on the list: On 6 Nov 2013, at 12:57, Eric Botcazou wrote: >> I've been trying to improve the building and testing of Darwin for crosses >> and native crosses. >> 1. xgnatugn needs to be run on the build system, so needs to be built with >> the build system's gnatmake. I haven't put a canonical prefix on this since >> this doesn't appear to be done elsewhere. Defined as GNATMAKE_FOR_BUILD and >> passed to sub-processes. > > Why do you need to pass it to ADA_TOOLS_FLAGS_TO_PASS though? Just replace > $(GNATMAKE) with gnatmake. done (FWIW, I think that GNATMAKE_FOR_BUILD would make it obvious for a future reader, but not a big deal) >> 2. Some builds might need to pass LDFLAGS to the gnat* builds. Appended >> LDFLAGS to GCC_LINK. Passed on in gnattools/Make. > > OK. > >> 3. In gnattools, the RTS dir must be for the host and not for the build; >> This actually only showed up when I tried a cross from a 64bit pointer >> machine to a 32bit pointer one (i.e it is easy for it to go unnoticed). > > OK, but don't you need to do the same for gnatmake/gnatbind/gnatlink here? > See gcc-interface/Make-lang.in, line 171 and below, for similar code. it did appear odd that one path had the test and the other did not, however the comment on the native-x case is somewhat misleading since it implies (at least to me) that the *intention* is to use the newly-built target(=host) lib? In the current patch this is changed to place the test and setting RTS_DIR to cover both native and canadian X cases, at least this should be safe. Note that I have *not* tested any canadian-crosses, just cross and native-cross, if you need someone to do a canadian X before this is applied, let me know, and I'll try to set something up. (unless it gets covered by Alan or Bernd's cases). I am re-testing the attached, rebased to tot, but that will take a while, given the machines I have available, OK to apply if [cross & native-cross] testing passes? (if the other folks doing cross-build stuff want to incorporate/take this on, that's OK too). Iain gcc/ada/gcc-interface/Make-lang.in | 9 + gcc/ada/gcc-interface/Makefile.in | 2 +- gnattools/Makefile.in | 6 ++ 3 files changed, 12 insertions(+), 5 deletions(-) diff --git a/gcc/ada/gcc-interface/Make-lang.in b/gcc/ada/gcc-interface/Make-lang.in index cd3676f..f7aafc0 100644 --- a/gcc/ada/gcc-interface/Make-lang.in +++ b/gcc/ada/gcc-interface/Make-lang.in @@ -178,6 +178,10 @@ else GNATLINK_FOR_HOST=$(host)-gnatlink GNATLS_FOR_HOST=$(host)-gnatls + ifneq ($(findstring ada,$(LANGUAGES)),) +RTS_DIR:=$(strip $(subst \,/,$(shell $(GNATLS_FOR_HOST) -v | grep adalib ))) + endif + ifeq ($(host), $(target)) # This is a cross native. All the sources are taken from the currently # built runtime. @@ -193,9 +197,6 @@ else else # This is a canadian cross. We should use a toolchain running on the # build platform and targeting the host platform. 
-ifneq ($(findstring ada,$(LANGUAGES)),) - RTS_DIR:=$(strip $(subst \,/,$(shell $(GNATLS_FOR_HOST) -v | grep adalib ))) -endif ADA_TOOLS_FLAGS_TO_PASS=\ CC="$(CC)" \ CXX="$(CXX)" \ @@ -658,7 +659,7 @@ ada.tags: force ada/doctools/xgnatugn$(build_exeext): ada/xgnatugn.adb -$(MKDIR) ada/doctools $(CP) $^ ada/doctools - cd ada/doctools && $(GNATMAKE) -q xgnatugn + cd ada/doctools && gnatmake -q xgnatugn # Note that doc/gnat_ugn.texi and doc/projects.texi do not depend on # xgnatugn being built so we can distribute a pregenerated doc/gnat_ugn.info diff --git a/gcc/ada/gcc-interface/Makefile.in b/gcc/ada/gcc-interface/Makefile.in index 885a5ed..6b675f2 100644 --- a/gcc/ada/gcc-interface/Makefile.in +++ b/gcc/ada/gcc-interface/Makefile.in @@ -2415,7 +2415,7 @@ TOOLS_FLAGS_TO_PASS= \ "GNATLINK=$(GNATLINK)" \ "GNATBIND=$(GNATBIND)" -GCC_LINK=$(CXX) $(GCC_LINK_FLAGS) $(ADA_INCLUDES) +GCC_LINK=$(CXX) $(GCC_LINK_FLAGS) $(ADA_INCLUDES) $(LDFLAGS) # Build directory for the tools. Let's copy the target-dependent # sources using the same mechanism as for gnatlib. The other sources are diff --git a/gnattools/Makefile.in b/gnattools/Makefile.in index fdd6491..118847c 100644 --- a/gnattools/Makefile.in +++ b/gnattools/Makefile.in @@ -24,6 +24,7 @@ srcdir = @srcdir@ libdir = @libdir@ build = @build@ target = @target@ +host = @host@ prefix = @prefix@ INSTALL = @INSTALL@ INSTALL_DATA = @INSTALL_DATA@ @@ -92,6 +93,7 @@ TOOLS_FLAGS_TO_PASS_RE= \ "CC=../../xgcc -B../../" \ "CXX=../../xg++ -B../../ $(CXX_LFLAGS)" \ "CFLAGS=$(CFLAGS)" \ + "LDFLAGS=$(LDFLAGS)" \ "ADAFLAGS=$(ADAFLAGS)" \ "ADA_CFLAGS=$(ADA_CFLAGS)" \ "INCLUDES=$(INCLUDES_FOR_SUBDIR)" \ @@ -192,7 +194,11 @@ regnattools: $(GCC_DIR
Re: [RFA][PATCH][PR tree-optimization/45685]
On 12/11/13 02:51, Richard Biener wrote: First of all phiopt runs unconditionally for -On with n > 0 but the conversion is clearly not suitable for non-speed optimizations. Thus I'd guard it with at least !optimize_size && optimize >= 2. As you are targeting a worse transformation done by if-conversion you may want to add || (((flag_tree_loop_vectorize || cfun->has_force_vect_loops) && flag_tree_loop_if_convert != 0) || flag_tree_loop_if_convert == 1 || flag_tree_loop_if_convert_stores == 1) (ugh). That's a hell of a condition to guard a transformation. But, yea, I agree. + + /* If inversion is needed, first try to invert the test since + that's cheapest. */ + if (invert) +{ + enum tree_code new_code + = invert_tree_comparison (cond_code, + HONOR_NANS (TYPE_MODE (TREE_TYPE (rhs; That looks wrong - you want to look at HONOR_NANS on the mode of one of the comparison operands, not of the actual value you want to negate (it's integer and thus never has NaNs). Bah. That was supposed to be HONOR_SIGNED_ZEROS. Which as far as I can tell is a property of the value being tested. So it seems to me it should be HONOR_SIGNED_ZEROS (TYPE_MODE (TREE_TYPE (one of the conditional's arguments))) much like is done in abs_replacement. With the difference we want to look at the comparison (which may have different arguments than the PHI we're converting) and that we can still apply the optimization, just in a slightly different way. jeff
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
On 12/11/2013 08:55 AM, Jan Hubicka wrote: + /* Don't clone decls local to a comdat group. */ + if (comdat_local_p (node)) +return false; Why? It seems this should work if the clone becomes another comdat local? Right, I think the problem was that the clone wasn't becoming comdat local. And for the specific case of decloning, there's no point in cloning the decloned constructor. If it does not make sense, how we ended up cloning it? I guess the heuristics are making a mistake. Can you show me some code sample of decloning? I assume that we basically turn original cloned constructors into wrappers around common "worker" that is now comdat local. Right. I think ipa-cp may end up deciding on clonning the wrapper that will break because it will end up static calling the local comdat function. On the other hand it may decide to clone both the wrapper and the actual function and in that case bringing both static is fine. In the case I'm looking at (g++.dg/torture/pr41257.C, with decloning forced on), we're cloning the local function in order to propagate in a '0' passed in from one of the wrappers. This is pointless because the wrapper contains just the one call, so in any situation where cloning makes sense, inlining is better. So, it's probably possible to make it work to clone the comdat local function into another comdat local function, but it's not useful, and it's easier to just prevent it. Jason
Re: [RFA][PATCH][PR tree-optimization/45685]
On Wed, Dec 11, 2013 at 3:51 PM, Jeff Law wrote: > On 12/11/13 02:51, Richard Biener wrote: >> >> >> First of all phiopt runs unconditionally for -On with n > 0 but the >> conversion >> is clearly not suitable for non-speed optimizations. Thus I'd guard it >> with at least !optimize_size && optimize >= 2. As you are targeting >> a worse transformation done by if-conversion you may want to >> add || (((flag_tree_loop_vectorize || cfun->has_force_vect_loops) >> && flag_tree_loop_if_convert != 0) >>|| flag_tree_loop_if_convert == 1 >>|| flag_tree_loop_if_convert_stores == 1) >> (ugh). > > That's a hell of a condition to guard a transformation. But, yea, I agree. > > >>> + >>> + /* If inversion is needed, first try to invert the test since >>> + that's cheapest. */ >>> + if (invert) >>> +{ >>> + enum tree_code new_code >>> + = invert_tree_comparison (cond_code, >>> + HONOR_NANS (TYPE_MODE (TREE_TYPE >>> (rhs; >> >> >> That looks wrong - you want to look at HONOR_NANS on the mode >> of one of the comparison operands, not of the actual value you want >> to negate (it's integer and thus never has NaNs). > > Bah. That was supposed to be HONOR_SIGNED_ZEROS. Which as far as I can > tell is a property of the value being tested. No, it's invert_tree_comparison (enum tree_code code, bool honor_nans) so indeed HONOR_NANS. And yes, on a conditional argument (it can be a FP comparison but a integer negate). Richard. > So it seems to me it should be > > HONOR_SIGNED_ZEROS (TYPE_MODE (TREE_TYPE (one of the conditional's > arguments))) > > much like is done in abs_replacement. With the difference we want to look > at the comparison (which may have different arguments than the PHI we're > converting) and that we can still apply the optimization, just in a slightly > different way. > > jeff >
Re: [Patch, RTL] Eliminate redundant vec_select moves.
"H.J. Lu" writes: > On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford > wrote: >> Richard Henderson writes: >>> On 12/10/2013 10:44 AM, Richard Sandiford wrote: Sorry, I don't understand. I never said it was invalid. I said (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents a single register. On a little-endian target, the offset cannot be anything other than 0 in that case. So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for something that is always invalid, regardless of the target. That kind of situation should be rejected by target-independent code instead. >>> >>> But, we want to disable the subreg before we know whether or not (reg:V4SF >>> X) >>> will be allocated to a single hard register. That is something that we >>> can't >>> know in target-independent code before register allocation. >> >> I was thinking that if we've got a class, we've also got things like >> CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. >> But even in the padding cases an offset-based check in C_C_M_C could >> be derived from other information. >> >> subreg_get_info handles padding with: >> >> nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); >> if (GET_MODE_INNER (xmode) == VOIDmode) >> xmode_unit = xmode; >> else >> xmode_unit = GET_MODE_INNER (xmode); >> gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); >> gcc_assert (nregs_xmode >> == (GET_MODE_NUNITS (xmode) >> * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit))); >> gcc_assert (hard_regno_nregs[xregno][xmode] >> == (hard_regno_nregs[xregno][xmode_unit] >> * GET_MODE_NUNITS (xmode))); >> >> /* You can only ask for a SUBREG of a value with holes in the middle >> if you don't cross the holes. (Such a SUBREG should be done by >> picking a different register class, or doing it in memory if >> necessary.) An example of a value with holes is XCmode on 32-bit >> x86 with -m128bit-long-double; it's represented in 6 32-bit >> registers, >> 3 for each part, but in memory it's two 128-bit parts. >> Padding is assumed to be at the end (not necessarily the 'high >> part') >> of each unit. */ >> if ((offset / GET_MODE_SIZE (xmode_unit) + 1 >>< GET_MODE_NUNITS (xmode)) >> && (offset / GET_MODE_SIZE (xmode_unit) >> != ((offset + GET_MODE_SIZE (ymode) - 1) >> / GET_MODE_SIZE (xmode_unit >> { >> info->representable_p = false; >> rknown = true; >> } >> >> and I wouldn't really want to force targets to individually reproduce >> that kind of logic at the class level. If the worst comes to the worst >> we could cache the difficult cases. >> > > My case is x86 CANNOT_CHANGE_MODE_CLASS only needs > to know if the subreg byte is zero or not. It doesn't care about mode > padding. You are concerned about information passed to > CANNOT_CHANGE_MODE_CLASS is too expensive for target > to process. It isn't the case for x86. No, I'm concerned that by going this route, we're forcing every target (or at least every target with wider-than-word registers, which is most of the common ones) to implement the same target-independent restriction. This is not an x86-specific issue. Thanks, Richard
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford wrote: > "H.J. Lu" writes: >> On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford >> wrote: >>> Richard Henderson writes: On 12/10/2013 10:44 AM, Richard Sandiford wrote: > Sorry, I don't understand. I never said it was invalid. I said > (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents > a single register. On a little-endian target, the offset cannot be > anything other than 0 in that case. > > So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for > something that is always invalid, regardless of the target. That kind > of situation should be rejected by target-independent code instead. But, we want to disable the subreg before we know whether or not (reg:V4SF X) will be allocated to a single hard register. That is something that we can't know in target-independent code before register allocation. >>> >>> I was thinking that if we've got a class, we've also got things like >>> CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. >>> But even in the padding cases an offset-based check in C_C_M_C could >>> be derived from other information. >>> >>> subreg_get_info handles padding with: >>> >>> nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); >>> if (GET_MODE_INNER (xmode) == VOIDmode) >>> xmode_unit = xmode; >>> else >>> xmode_unit = GET_MODE_INNER (xmode); >>> gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); >>> gcc_assert (nregs_xmode >>> == (GET_MODE_NUNITS (xmode) >>> * HARD_REGNO_NREGS_WITH_PADDING (xregno, >>> xmode_unit))); >>> gcc_assert (hard_regno_nregs[xregno][xmode] >>> == (hard_regno_nregs[xregno][xmode_unit] >>> * GET_MODE_NUNITS (xmode))); >>> >>> /* You can only ask for a SUBREG of a value with holes in the middle >>> if you don't cross the holes. (Such a SUBREG should be done by >>> picking a different register class, or doing it in memory if >>> necessary.) An example of a value with holes is XCmode on 32-bit >>> x86 with -m128bit-long-double; it's represented in 6 32-bit >>> registers, >>> 3 for each part, but in memory it's two 128-bit parts. >>> Padding is assumed to be at the end (not necessarily the 'high >>> part') >>> of each unit. */ >>> if ((offset / GET_MODE_SIZE (xmode_unit) + 1 >>>< GET_MODE_NUNITS (xmode)) >>> && (offset / GET_MODE_SIZE (xmode_unit) >>> != ((offset + GET_MODE_SIZE (ymode) - 1) >>> / GET_MODE_SIZE (xmode_unit >>> { >>> info->representable_p = false; >>> rknown = true; >>> } >>> >>> and I wouldn't really want to force targets to individually reproduce >>> that kind of logic at the class level. If the worst comes to the worst >>> we could cache the difficult cases. >>> >> >> My case is x86 CANNOT_CHANGE_MODE_CLASS only needs >> to know if the subreg byte is zero or not. It doesn't care about mode >> padding. You are concerned about information passed to >> CANNOT_CHANGE_MODE_CLASS is too expensive for target >> to process. It isn't the case for x86. > > No, I'm concerned that by going this route, we're forcing every target > (or at least every target with wider-than-word registers, which is most > of the common ones) to implement the same target-independent restriction. > This is not an x86-specific issue. > So you prefer a generic solution which makes CANNOT_CHANGE_MODE_CLASS return true for vector mode subreg if subreg byte != 0. Is this correct? Thanks. -- H.J.
[wwwdocs] Buildstat update for 4.8
Latest results for gcc 4.8.x. This patch supersedes the previous one sent on 2013-11-03 as it was never applied. -tgc Testresults for 4.8.2 arm-unknown-linux-gnueabi i386-pc-solaris2.9 i686-pc-linux-gnu (2) i686-unknown-linux-gnu mips-unknown-linux-gnu mipsel-unknown-linux-gnu powerpc-apple-darwin8.11.0 powerpc-unknown-linux-gnu sparc-sun-solaris2.9 sparc64-sun-solaris2.10 (2) sparc-unknown-linux-gnu x86_64-apple-darwin13 x86_64-unknown-linux-gnu (3) Index: buildstat.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/buildstat.html,v retrieving revision 1.5 diff -u -r1.5 buildstat.html --- buildstat.html 2 Oct 2013 08:58:33 - 1.5 +++ buildstat.html 11 Dec 2013 16:15:47 - @@ -26,6 +26,7 @@ arm-unknown-linux-gnueabi  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01757.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg01612.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02658.html";>4.8.0 @@ -85,6 +86,7 @@ i386-pc-solaris2.9  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg02068.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-08/msg00230.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg00416.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg00415.html";>4.8.0, @@ -112,15 +114,26 @@ i686-pc-linux-gnu  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01348.html";>4.8.2, +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01313.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-07/msg02349.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-04/msg03091.html";>4.8.0 +i686-unknown-linux-gnu + +Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01351.html";>4.8.2 + + + + mips-unknown-linux-gnu  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01758.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg02891.html";>4.8.1 @@ -129,6 +142,7 @@ mipsel-unknown-linux-gnu  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01759.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg01568.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02562.html";>4.8.0 @@ -138,6 +152,7 @@ powerpc-apple-darwin8.11.0  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01405.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg00237.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-04/msg01012.html";>4.8.0 @@ -163,6 +178,7 @@ powerpc-unknown-linux-gnu  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00014.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg01486.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02452.html";>4.8.0 @@ -180,6 +196,7 @@ sparc-sun-solaris2.9  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg02067.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-08/msg00243.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02774.html";>4.8.0, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02717.html";>4.8.0 @@ -206,6 +223,8 @@ sparc64-sun-solaris2.10  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01666.html";>4.8.2, +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01379.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02413.html";>4.8.0 @@ -214,6 +233,7 @@ sparc-unknown-linux-gnu  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01785.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg02503.html";>4.8.1, 
http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02660.html";>4.8.0 @@ -244,6 +264,14 @@ +x86_64-apple-darwin13 + +Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01562.html";>4.8.2 + + + + x86_64-apple-darwin13.0.0  Test results: @@ -255,6 +283,9 @@ x86_64-unknown-linux-gnu  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00373.html";>4.8.2, +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01854.html";>4.8.2, +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01777.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02881.html";>4.8.0, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02623.html";>4.8.0
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
On 12/11/2013 10:11 AM, Jason Merrill wrote: So, it's probably possible to make it work to clone the comdat local function into another comdat local function, but it's not useful, and it's easier to just prevent it. Well, not that much easier actually. I'm attaching both a delta from my last patch and a complete patch against trunk. Two questions: Do you have an opinion about which way to implement symtab_in_same_comdat_p? Is ipa_passes the right place to initialize calls_comdat_local? Jason diff --git a/gcc/cgraph.c b/gcc/cgraph.c index bb6b382..8b5b786 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -2623,11 +2623,6 @@ verify_cgraph_node (struct cgraph_node *node) error ("execution count is negative"); error_found = true; } - if (node->global.inlined_to && node->same_comdat_group) -{ - error ("inline clone in same comdat group list"); - error_found = true; -} if (!node->definition && !node->in_other_partition && node->local.local) { error ("local symbols must be defined"); @@ -2666,15 +2661,13 @@ verify_cgraph_node (struct cgraph_node *node) error_found = true; } } - tree required_group = NULL_TREE; - if (comdat_local_p (node)) -required_group = DECL_COMDAT_GROUP (node->decl); + bool check_group = symtab_comdat_local_p (node); for (e = node->callers; e; e = e->next_caller) { if (verify_edge_count_and_frequency (e)) error_found = true; - if (required_group - && DECL_COMDAT_GROUP (e->caller->decl) != required_group) + if (check_group + && !symtab_in_same_comdat_p (e->caller, node)) { error ("comdat-local function called by %s outside its comdat", identifier_to_locale (e->caller->name ())); diff --git a/gcc/cgraph.h b/gcc/cgraph.h index a4de294..62b7a7c 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -428,6 +428,8 @@ public: unsigned tm_clone : 1; /* True if this decl is a dispatcher for function versions. */ unsigned dispatcher_function : 1; + /* True if this decl calls a COMDAT-local function. */ + unsigned calls_comdat_local : 1; }; @@ -1496,8 +1498,25 @@ symtab_can_be_discarded (symtab_node *node) constructors. */ static inline bool -comdat_local_p (symtab_node *node) +symtab_comdat_local_p (symtab_node *node) { return (node->same_comdat_group && !TREE_PUBLIC (node->decl)); } + +/* Return true if ONE and TWO are part of the same COMDAT group. */ + +static inline bool +symtab_in_same_comdat_p (symtab_node *one, symtab_node *two) +{ +#if 1 + return DECL_COMDAT_GROUP (one->decl) == DECL_COMDAT_GROUP (two->decl); +#else + if (one->same_comdat_group && two->same_comdat_group) +for (symtab_node *n = one->same_comdat_group; n != one; + n = n->same_comdat_group) + if (n == two) + return true; + return false; +#endif +} #endif /* GCC_CGRAPH_H */ diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c index 90ef901..cff9a45 100644 --- a/gcc/cgraphclones.c +++ b/gcc/cgraphclones.c @@ -216,6 +216,8 @@ cgraph_clone_node (struct cgraph_node *n, tree decl, gcov_type count, int freq, new_node->clone = n->clone; new_node->clone.tree_map = NULL; new_node->tp_first_run = n->tp_first_run; + if (n->same_comdat_group) +symtab_add_to_same_comdat_group (new_node, n); if (n->count) { if (new_node->count > n->count) @@ -446,13 +448,10 @@ cgraph_create_virtual_clone (struct cgraph_node *old_node, redirect_callers, false, NULL); /* Update the properties. Make clone visible only within this translation unit. Make sure - that is not weak also. - ??? We cannot use COMDAT linkage because there is no - ABI support for this. */ + that is not weak also. 
*/ DECL_EXTERNAL (new_node->decl) = 0; if (DECL_ONE_ONLY (old_decl)) DECL_SECTION_NAME (new_node->decl) = NULL; - DECL_COMDAT_GROUP (new_node->decl) = 0; TREE_PUBLIC (new_node->decl) = 0; DECL_COMDAT (new_node->decl) = 0; DECL_WEAK (new_node->decl) = 0; diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 44f3afd..a1294ae 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -1244,7 +1244,8 @@ mark_functions_to_output (void) for (next = cgraph (node->same_comdat_group); next != node; next = cgraph (next->same_comdat_group)) - if (!next->thunk.thunk_p && !next->alias) + if (!next->thunk.thunk_p && !next->alias + && !symtab_comdat_local_p (next)) next->process = 1; } } @@ -1990,6 +1991,12 @@ ipa_passes (void) invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_START, NULL); + cgraph_node *node; + FOR_EACH_FUNCTION (node) +if (symtab_comdat_local_p (node)) + for (struct cgraph_edge *e = node->callers; e; e = e->next_caller) + e->caller->calls_comdat_local = true; + if (!in_lto_p) { execute_ipa_pass_list (passes->all_small_ipa_passes); diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index e326e75..850abb8 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -97,7 +97
Re: [Patch, RTL] Eliminate redundant vec_select moves.
H.J. Lu wrote: On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford wrote: "H.J. Lu" writes: On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford wrote: Richard Henderson writes: On 12/10/2013 10:44 AM, Richard Sandiford wrote: Sorry, I don't understand. I never said it was invalid. I said (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents a single register. On a little-endian target, the offset cannot be anything other than 0 in that case. So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for something that is always invalid, regardless of the target. That kind of situation should be rejected by target-independent code instead. But, we want to disable the subreg before we know whether or not (reg:V4SF X) will be allocated to a single hard register. That is something that we can't know in target-independent code before register allocation. I was thinking that if we've got a class, we've also got things like CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. But even in the padding cases an offset-based check in C_C_M_C could be derived from other information. subreg_get_info handles padding with: nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); if (GET_MODE_INNER (xmode) == VOIDmode) xmode_unit = xmode; else xmode_unit = GET_MODE_INNER (xmode); gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); gcc_assert (nregs_xmode == (GET_MODE_NUNITS (xmode) * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit))); gcc_assert (hard_regno_nregs[xregno][xmode] == (hard_regno_nregs[xregno][xmode_unit] * GET_MODE_NUNITS (xmode))); /* You can only ask for a SUBREG of a value with holes in the middle if you don't cross the holes. (Such a SUBREG should be done by picking a different register class, or doing it in memory if necessary.) An example of a value with holes is XCmode on 32-bit x86 with -m128bit-long-double; it's represented in 6 32-bit registers, 3 for each part, but in memory it's two 128-bit parts. Padding is assumed to be at the end (not necessarily the 'high part') of each unit. */ if ((offset / GET_MODE_SIZE (xmode_unit) + 1 < GET_MODE_NUNITS (xmode)) && (offset / GET_MODE_SIZE (xmode_unit) != ((offset + GET_MODE_SIZE (ymode) - 1) / GET_MODE_SIZE (xmode_unit { info->representable_p = false; rknown = true; } and I wouldn't really want to force targets to individually reproduce that kind of logic at the class level. If the worst comes to the worst we could cache the difficult cases. My case is x86 CANNOT_CHANGE_MODE_CLASS only needs to know if the subreg byte is zero or not. It doesn't care about mode padding. You are concerned about information passed to CANNOT_CHANGE_MODE_CLASS is too expensive for target to process. It isn't the case for x86. No, I'm concerned that by going this route, we're forcing every target (or at least every target with wider-than-word registers, which is most of the common ones) to implement the same target-independent restriction. This is not an x86-specific issue. So you prefer a generic solution which makes CANNOT_CHANGE_MODE_CLASS return true for vector mode subreg if subreg byte != 0. Is this correct? Do you mean a generic solution for C_C_M_C to return true for non-zero byte_offset vector subregs in the context of x86? I want to clarify because in the context of 32-bit ARM little-endian, a non-zero byte-offset vector subreg is still a valid full hardreg. eg. for (subreg:DI (reg:V4SF) 8) C_C_M_C can return 'false' as this can be resolved to a full D-reg. Thanks, Tejas.
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
On 12/11/2013 11:24 AM, Jason Merrill wrote: Well, not that much easier actually. I'm attaching both a delta from my last patch and a complete patch against trunk. Oops, forgot to remove the gimple-fold change, doing that now. Jason
Re: [RFA][PATCH][PR tree-optimization/45685]
On 12/11/13 08:11, Richard Biener wrote: Bah. That was supposed to be HONOR_SIGNED_ZEROS. Which as far as I can tell is a property of the value being tested. No, it's invert_tree_comparison (enum tree_code code, bool honor_nans) so indeed HONOR_NANS. And yes, on a conditional argument (it can be a FP comparison but a integer negate). I realized I was wrong after I went downstairs for breakfast :-) Ignore my last message WRT HONOR_NANs and HONOR_SIGNED_ZEROs. Jeff
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On Wed, Dec 11, 2013 at 8:26 AM, Tejas Belagod wrote: > H.J. Lu wrote: >> >> On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford >> wrote: >>> >>> "H.J. Lu" writes: On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford wrote: > > Richard Henderson writes: >> >> On 12/10/2013 10:44 AM, Richard Sandiford wrote: >>> >>> Sorry, I don't understand. I never said it was invalid. I said >>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents >>> a single register. On a little-endian target, the offset cannot be >>> anything other than 0 in that case. >>> >>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for >>> something that is always invalid, regardless of the target. That >>> kind >>> of situation should be rejected by target-independent code instead. >> >> But, we want to disable the subreg before we know whether or not >> (reg:V4SF X) >> will be allocated to a single hard register. That is something that >> we can't >> know in target-independent code before register allocation. > > I was thinking that if we've got a class, we've also got things like > CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. > But even in the padding cases an offset-based check in C_C_M_C could > be derived from other information. > > subreg_get_info handles padding with: > > nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); > if (GET_MODE_INNER (xmode) == VOIDmode) > xmode_unit = xmode; > else > xmode_unit = GET_MODE_INNER (xmode); > gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); > gcc_assert (nregs_xmode > == (GET_MODE_NUNITS (xmode) > * HARD_REGNO_NREGS_WITH_PADDING (xregno, > xmode_unit))); > gcc_assert (hard_regno_nregs[xregno][xmode] > == (hard_regno_nregs[xregno][xmode_unit] > * GET_MODE_NUNITS (xmode))); > > /* You can only ask for a SUBREG of a value with holes in the > middle > if you don't cross the holes. (Such a SUBREG should be done > by > picking a different register class, or doing it in memory if > necessary.) An example of a value with holes is XCmode on > 32-bit > x86 with -m128bit-long-double; it's represented in 6 32-bit > registers, > 3 for each part, but in memory it's two 128-bit parts. > Padding is assumed to be at the end (not necessarily the 'high > part') > of each unit. */ > if ((offset / GET_MODE_SIZE (xmode_unit) + 1 >< GET_MODE_NUNITS (xmode)) > && (offset / GET_MODE_SIZE (xmode_unit) > != ((offset + GET_MODE_SIZE (ymode) - 1) > / GET_MODE_SIZE (xmode_unit > { > info->representable_p = false; > rknown = true; > } > > and I wouldn't really want to force targets to individually reproduce > that kind of logic at the class level. If the worst comes to the worst > we could cache the difficult cases. > My case is x86 CANNOT_CHANGE_MODE_CLASS only needs to know if the subreg byte is zero or not. It doesn't care about mode padding. You are concerned about information passed to CANNOT_CHANGE_MODE_CLASS is too expensive for target to process. It isn't the case for x86. >>> >>> No, I'm concerned that by going this route, we're forcing every target >>> (or at least every target with wider-than-word registers, which is most >>> of the common ones) to implement the same target-independent restriction. >>> This is not an x86-specific issue. >>> >> >> So you prefer a generic solution which makes >> CANNOT_CHANGE_MODE_CLASS return true >> for vector mode subreg if subreg byte != 0. Is this >> correct? 
> > > Do you mean a generic solution for C_C_M_C to return true for non-zero > byte_offset vector subregs in the context of x86? > > I want to clarify because in the context of 32-bit ARM little-endian, a > non-zero byte-offset vector subreg is still a valid full hardreg. eg. for > > (subreg:DI (reg:V4SF) 8) > > C_C_M_C can return 'false' as this can be resolved to a full D-reg. > Does that mean subreg byte interpretation is endian-dependent? Both little endian (subreg:DI (reg:V4SF) 0) and big endian (subreg:DI (reg:V4SF) MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT) refer to the same lower 64 bits of reg:V4SF. Is this correct? -- H.J.
Re: _Cilk_spawn and _Cilk_sync for C++
On 12/10/2013 06:03 PM, Iyer, Balaji V wrote: Fixed patch and ChangeLog entries are attached. Is it Ok to install? OK. Jason
Re: [Patch, RTL] Eliminate redundant vec_select moves.
H.J. Lu wrote: On Wed, Dec 11, 2013 at 8:26 AM, Tejas Belagod wrote: H.J. Lu wrote: On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford wrote: "H.J. Lu" writes: On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford wrote: Richard Henderson writes: On 12/10/2013 10:44 AM, Richard Sandiford wrote: Sorry, I don't understand. I never said it was invalid. I said (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents a single register. On a little-endian target, the offset cannot be anything other than 0 in that case. So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for something that is always invalid, regardless of the target. That kind of situation should be rejected by target-independent code instead. But, we want to disable the subreg before we know whether or not (reg:V4SF X) will be allocated to a single hard register. That is something that we can't know in target-independent code before register allocation. I was thinking that if we've got a class, we've also got things like CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. But even in the padding cases an offset-based check in C_C_M_C could be derived from other information. subreg_get_info handles padding with: nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); if (GET_MODE_INNER (xmode) == VOIDmode) xmode_unit = xmode; else xmode_unit = GET_MODE_INNER (xmode); gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); gcc_assert (nregs_xmode == (GET_MODE_NUNITS (xmode) * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit))); gcc_assert (hard_regno_nregs[xregno][xmode] == (hard_regno_nregs[xregno][xmode_unit] * GET_MODE_NUNITS (xmode))); /* You can only ask for a SUBREG of a value with holes in the middle if you don't cross the holes. (Such a SUBREG should be done by picking a different register class, or doing it in memory if necessary.) An example of a value with holes is XCmode on 32-bit x86 with -m128bit-long-double; it's represented in 6 32-bit registers, 3 for each part, but in memory it's two 128-bit parts. Padding is assumed to be at the end (not necessarily the 'high part') of each unit. */ if ((offset / GET_MODE_SIZE (xmode_unit) + 1 < GET_MODE_NUNITS (xmode)) && (offset / GET_MODE_SIZE (xmode_unit) != ((offset + GET_MODE_SIZE (ymode) - 1) / GET_MODE_SIZE (xmode_unit { info->representable_p = false; rknown = true; } and I wouldn't really want to force targets to individually reproduce that kind of logic at the class level. If the worst comes to the worst we could cache the difficult cases. My case is x86 CANNOT_CHANGE_MODE_CLASS only needs to know if the subreg byte is zero or not. It doesn't care about mode padding. You are concerned about information passed to CANNOT_CHANGE_MODE_CLASS is too expensive for target to process. It isn't the case for x86. No, I'm concerned that by going this route, we're forcing every target (or at least every target with wider-than-word registers, which is most of the common ones) to implement the same target-independent restriction. This is not an x86-specific issue. So you prefer a generic solution which makes CANNOT_CHANGE_MODE_CLASS return true for vector mode subreg if subreg byte != 0. Is this correct? Do you mean a generic solution for C_C_M_C to return true for non-zero byte_offset vector subregs in the context of x86? I want to clarify because in the context of 32-bit ARM little-endian, a non-zero byte-offset vector subreg is still a valid full hardreg. eg. 
for (subreg:DI (reg:V4SF) 8) C_C_M_C can return 'false' as this can be resolved to a full D-reg. Does that mean subreg byte interpretation is endian-dependent? Both little endian (subreg:DI (reg:V4SF) 0) and big endian (subreg:DI (reg:V4SF) MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT) refer to the same lower 64 bits of reg:V4SF. Is this correct? If my understanding of endianness representation in RTL registers is correct, yes. I said little-endian because C_C_M_C is currently gated on TARGET_BIG_ENDIAN in arm.h. Thanks, Tejas.
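To make H.J.'s point concrete: the x86 check he describes only needs to know whether the subreg starts at byte 0 of the vector register, information the CANNOT_CHANGE_MODE_CLASS macro of this era is never given. The sketch below is therefore a hypothetical, offset-aware variant; the function name and signature are invented for illustration and are not an existing interface.

/* Hypothetical sketch only: an offset-aware variant of the x86
   CANNOT_CHANGE_MODE_CLASS check under discussion.  The real macro
   takes (FROM, TO, CLASS) and never sees SUBREG_BYTE; handing it that
   information is exactly what this thread is debating.  A real
   implementation would additionally restrict this to the SSE
   register classes.  */
static bool
ix86_cannot_change_mode_class_sketch (enum machine_mode from,
                                      enum machine_mode to ATTRIBUTE_UNUSED,
                                      unsigned int subreg_byte)
{
  /* For vector registers, reject only subregs that do not start at
     byte 0 of the inner register.  */
  return VECTOR_MODE_P (from) && subreg_byte != 0;
}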
RFA: patch to fix PR59466 (inefficient address generation on ppc )
The following patch fixes PR59466 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59466 LRA on PPC sometimes generates inefficient code addis 9,2,.LC77@toc@ha addi 9,9,.LC77@toc@l ld 9,0(9) instead of addis 9,2,.LC77@toc@ha ld 9,.LC77@toc@l(9) I can not create a small test for this. The smallest file when I found is bzip2.c from SPEC2000. LRA generates move insn with invalid memory [unspec[`*.LC29',%2:DI] 45] but it can handle it (make it valid) very efficiently after that trying different numerous transformations. PPC target machinary through validize_mem just put all address in a pseudo. I could prevent use validize_mem in rs6000.c but I prefer to do it in change_addres_1 as other targets might have the same problem and it is better to have one for all solution. Still it does not fully solve the problem as insn r257:DI=[unspec[`*.LC29',%2:DI] 45] cant be recognized as *movdi... pattern has operand predicates rejecting memory because of invalid address. To fix this a change in general_operand is done. As LRA can not work properly with regular insn recognition, I added an assert for this in lra_set_insn_recog_data to figure out this situation earlier. Again, LRA has a very good code for legitimize address by itself and it is better to use it. After applying patch I see code size reduction on SPEC2000. Before the patch (this is relative reload generated code): CFP2000- -0.471% 27171 27043 168.wupwise -0.457% 14006 13942 171.swim -0.392% 24515 24419 172.mgrid 0.226% 85079 85271 173.applu 0.751% 728891 734363 177.mesa 0.194% 214357 214773 178.galgel -0.295% 21683 21619 179.art 0.412% 31089 31217 183.equake -0.520% 79976 79560 187.facerec 0.000% 152504 152504 188.ammp 0.000% 43758 43758 189.lucas -0.181%10622651060337 191.fma3d 1.035%10416841052468 200.sixtrack -0.105% 151944 151784 301.apsi Average = 0.0139775% CINT2000- 0.252% 76242 76434 164.gzip 0.172% 186152 186472 175.vpr -0.215%20846122080132 176.gcc 0.000% 16716 16716 181.mcf 0.085% 225316 225508 186.crafty -0.015% 210100 210068 197.parser 0.622% 433635 436332 252.eon -0.298% 762014 759742 253.perlbmk =0.000% 904784 904784 254.gap -0.285% 706432 704416 255.vortex 0.220% 58297 58425 256.bzip2 0.314% 265334 266166 300.twolf Average = 0.070863% After the patch: CFP2000- -0.589% 27171 27011 168.wupwise -0.457% 14006 13942 171.swim -0.392% 24515 24419 172.mgrid -0.113% 85079 84983 173.applu 0.654% 728891 733659 177.mesa 0.060% 214357 214485 178.galgel -0.295% 21683 21619 179.art -0.412% 31089 30961 183.equake -0.520% 79976 79560 187.facerec 0.000% 152504 152504 188.ammp 0.000% 43758 43758 189.lucas -0.317%10622651058897 191.fma3d 0.356%10416841045396 200.sixtrack -0.105% 151944 151784 301.apsi Average = -0.152103% CINT2000- 0.084% 76242 76306 164.gzip -0.052% 186152 186056 175.vpr -0.284%20846122078692 176.gcc 0.000% 16716 16716 181.mcf -0.312% 225316 224612 186.crafty -0.091% 210100 209908 197.parser 0.622% 433635 436332 252.eon -0.340% 762014 759422 253.perlbmk 0.000% 904784 904784 254.gap -0.335% 706432 704064 255.vortex 0.110% 58297 58361 256.bzip2 -0.241% 265334 264694 300.twolf Average = -0.070023% Code size reduction for PPC in this case means also faster code generation. I see it but cannot provide reliable SPEC2000 rate change. The patch was successfully bootstrapped and tested on i686, x86_64, and PPC64. Ok to commit? 2013-12-11 Vladimir Makarov PR rtl-optimization/59466 * emit-rtl.c (change_address_1): Don't validate address for LRA. * recog.c (general_operand): Accept any memory for LRA. 
* lra.c (lra_set_insn_recog_data): Add an assert. Index: emit-rtl.c === --- emit-rtl.c (revision 205870) +++ emit-rtl.c (working copy) @@ -1951,7 +1951,9 @@ change_address_1 (rtx memref, enum machi && (!validate || memory_address_addr_space_p (mode, addr, as))) return memref; - if (validate) + /*
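The ChangeLog above only names the functions touched; the recog.c part of the idea (accept any MEM while LRA is running and let LRA legitimize the address itself) boils down to something like the sketch below. It assumes the existing lra_in_progress flag is the gate and is a paraphrase, not the committed hunk; the helper name is invented.

/* Sketch of the general_operand change described above: while LRA is
   rewriting insns it can repair invalid addresses itself, so a MEM
   operand OP should not be rejected merely because its address is not
   (yet) legitimate.  */
static bool
mem_ok_for_general_operand_p (rtx op, enum machine_mode mode,
                              addr_space_t as)
{
  if (lra_in_progress)
    return true;
  return memory_address_addr_space_p (mode, XEXP (op, 0), as);
}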
Re: [C++ PATCH] Fix GC related issues in C++ FE (PR c++/58627)
It's only safe to free the targs if they weren't used to instantiate any templates, so I lean toward option #1. Did you test this with strict gc? Jason
Re: [RFA][PATCH][PR tree-optimization/45685]
On 12/11/13 02:51, Richard Biener wrote:
+  /* If inversion is needed, first try to invert the test since
+     that's cheapest.  */
+  if (invert)
+    {
+      enum tree_code new_code
+        = invert_tree_comparison (cond_code,
+                                  HONOR_NANS (TYPE_MODE (TREE_TYPE (rhs))));
That looks wrong - you want to look at HONOR_NANS on the mode of one of the comparison operands, not of the actual value you want to negate (it's integer and thus never has NaNs).
HONOR_NANS (TYPE_MODE (TREE_TYPE (gimple_cond_lhs (cond))))
Right? Jeff
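Putting Richard's correction and Jeff's proposed fix together, the call converges on the form below; a sketch built from the functions already quoted in this thread, not the final committed hunk.

/* Sketch: invert the comparison code using HONOR_NANS on the mode of
   one of the comparison's own operands (here the LHS of the
   GIMPLE_COND), not on the type of the value being negated.  */
enum tree_code new_code
  = invert_tree_comparison (cond_code,
                            HONOR_NANS (TYPE_MODE (TREE_TYPE
                                        (gimple_cond_lhs (cond)))));
if (new_code != ERROR_MARK)
  {
    /* The test is cheaply invertible; rewrite it to NEW_CODE.  */
  }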
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
> On 12/11/2013 08:55 AM, Jan Hubicka wrote: > + /* Don't clone decls local to a comdat group. */ > + if (comdat_local_p (node)) > +return false; > >>> > >>>Why? It seems this should work if the clone becomes another comdat local? > >> > >>Right, I think the problem was that the clone wasn't becoming comdat > >>local. And for the specific case of decloning, there's no point in > >>cloning the decloned constructor. > > > >If it does not make sense, how we ended up cloning it? > > I guess the heuristics are making a mistake. > > >Can you show me some code sample of decloning? > > >I assume that we basically turn original cloned constructors into wrappers > >around common "worker" that is now comdat local. > > Right. > > >I think ipa-cp may end up > >deciding on clonning the wrapper that will break because it will end up > >static > >calling the local comdat function. > >On the other hand it may decide to clone both the wrapper and the actual > >function > >and in that case bringing both static is fine. > > In the case I'm looking at (g++.dg/torture/pr41257.C, with decloning > forced on), we're cloning the local function in order to propagate > in a '0' passed in from one of the wrappers. This is pointless > because the wrapper contains just the one call, so in any situation > where cloning makes sense, inlining is better. ipa-cp does not really contain heuristic to figure out where cloning makes more sense than inlining basically because if we produce such clone, inliner will inline it anyway. It is harder than it may seem to isolate testcases where inlining is always a win. > > So, it's probably possible to make it work to clone the comdat local > function into another comdat local function, but it's not useful, > and it's easier to just prevent it. Yep, with current limited use of comdat locals I guess it makes sense. Honza > > Jason
[PATCH][ARM] Add C++ name mangling entry for poly64x2_t type
Hi all, Following up on the crypto intrinsics for AArch32 (http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00365.html), here is a patch adding C++ name mangling for the poly64x2_t container type. This of course depends on the Crypto intrinsics patches going in. Tested arm-none-eabi on a model. Ok for trunk after the prerequisites go in? Thanks, Kyrill 2013-12-11 Kyrylo Tkachov * config/arm/arm.c (arm_mangle_map): Add entry for __builtin_neon_poly64.diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 5e306d0..72c94e5 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -29340,6 +29340,7 @@ static arm_mangle_map_entry arm_mangle_map[] = { { V4SFmode, "__builtin_neon_sf", "19__simd128_float32_t" }, { V16QImode, "__builtin_neon_poly8", "17__simd128_poly8_t" }, { V8HImode, "__builtin_neon_poly16", "18__simd128_poly16_t" }, + { V2DImode, "__builtin_neon_poly64", "18__simd128_poly64_t" }, { VOIDmode, NULL, NULL } };
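For context on what the new arm_mangle_map entry buys: it gives the poly64x2_t container type a stable AAPCS name so that C++ overloads on it mangle and link consistently, just like the existing NEON entries. A small hypothetical illustration follows (the function name is made up; the expected symbols are derived from the length-prefixed strings in the table, and compiling it assumes the crypto extensions are enabled).

// Hypothetical illustration of the mangling effect of the entry above.
#include <arm_neon.h>

void f (poly64x2_t v);   // expected symbol: _Z1f18__simd128_poly64_t
void f (poly8x16_t v);   // expected symbol: _Z1f17__simd128_poly8_t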
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
> On 12/11/2013 10:11 AM, Jason Merrill wrote: > >So, it's probably possible to make it work to clone the comdat local > >function into another comdat local function, but it's not useful, and > >it's easier to just prevent it. > > Well, not that much easier actually. I'm attaching both a delta > from my last patch and a complete patch against trunk. Two > questions: > > Do you have an opinion about which way to implement symtab_in_same_comdat_p? I would go with the #if 1 case. There doesn't seem to be a justification for a loop that is trivially replaceable by the comdat group comparison. In the longer term the plan is to move all those visibility fields from trees into symbols, and in that case I am sure we will end up having the comdat group name in there too. > > Is ipa_passes the right place to initialize calls_comdat_local? The flag is probably needed in both the early inliner and the IPA inliner. A conservative place to initialize it would be in inline_analyze_function. (The early inliner analyzes the function twice, first before inlining and again after early optimization, so the update should also clear the flag if the call disappeared.) I think we also want to clear it if a call to a comdat local function is marked inline. This is the only way we can end up inlining those constructors now, right? Inlining ctors/dtors is of course rather important for further optimization, so we should be careful not to regress optimization here too much. We will still lose some propagation since the inliner won't work out that it can propagate something useful through the wrapper call (unlike ipa-cp, which would if it knew how to clone local comdats). I am on my way home; I will look at the patch in detail next. Honza
RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Aldy Hernandez > Sent: Tuesday, December 10, 2013 1:03 PM > To: Iyer, Balaji V > Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > Elemental functions) for C > > > >> But aren't both OpenMP and Cilk Plus simd clones marked as "omp > >> declare simd"? In which case you shouldn't have to do anything? > >> Are the Cilk Plus clones not being marked as "omp declare simd" in > >> the front-ends? > >> > > > > Didn't you mention in this thread > > (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03506.html) that Cilk > > Plus SIMD-enabled functions must be marked as "cilk plus elementals" > > (as it wsa called then)? Did I misunderstand you? > > Both OpenMP and Cilk Plus clones should be marked with "omp declare > simd". But Cilk Plus should _also_ be marked with "cilk plus elementals" (or > "cilk simd function" now). In which case the aforementioned function > doesn't require any changes because Cilk Plus clones will be seen as they are > marked with "omp declare simd". > > So... > > OpenMP declare simd gets tagged as: > "omp declare simd" > Cilk Plus vector functions gets tagged as: > "omp declare simd" > "cilk simd function" > Ok, so you want the same clauses included in both omp declare simd and cilk simd function tree lists? For example in the c-parser.c, I have something like this: if (is_cilkplus_cilk_simd_fn) c = build_tree_list (get_identifier ("cilk simd function"), c); else c = build_tree_list (get_identifier ("omp declare simd"), c); TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = c; You want to change it something like this? If (is_cilkplus_cilk_simd_fn) { tree c_cilk = build_tree_list (get_identifier ("cilk simd function"), c); TREE_CHAIN (c_cilk) = DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = c_cilk; } c = build_tree_list (get_identififer ("omp declare simd"), c); TREE_CHAIN (c) =DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = c;
Re: [PATCH] Fix PR 59390
On Wed, Dec 11, 2013 at 2:42 AM, Uros Bizjak wrote: > On Wed, Dec 11, 2013 at 10:15 AM, Bernhard Reutner-Fischer > wrote: > Patch updated with two more tests to check if the vfmadd insn is being produced when possible. Thanks Sri On Fri, Dec 6, 2013 at 11:12 AM, Sriraman Tallam wrote: > Hi, > > I have attached a patch to fix > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59390. Please review. > > Here is the problem. GCC adds target-specific builtins on demand. The > FMA target-specific builtin __builtin_ia32_vfmaddpd gets added via > this declaration: > > void fun() __attribute__((target("fma"))); > > Specifically, the builtin __builtin_ia32_vfmaddpd gets added when > ix86_add_new_builtins is called from ix86_valid_target_attribute_tree > when processing this target attribute. > > Now, when the vectorizer is processing the builtin "__builtin_fma" in > function other_fn(), it checks to see if this function is vectorizable > and calls ix86_builtin_vectorized_function in i386.c. That returns the > builtin stored here: > > > case BUILT_IN_FMA: > if (out_mode == DFmode && in_mode == DFmode) > { > if (out_n == 2 && in_n == 2) >return ix86_builtins[IX86_BUILTIN_VFMADDPD]; > > > ix86_builtins[IX86_BUILTIN_VFMADDPD] would have contained NULL_TREE > had the builtin not been added by the previous target attribute. That > is why the code works if we remove the previous declaration. > > The fix is to not just return the builtin but to also check if the > current function's isa allows the use of the builtin. For instance, > this patch would solve the problem: > > @@ -33977,7 +33977,13 @@ ix86_builtin_vectorized_function (tree fndecl, > tre >if (out_mode == DFmode && in_mode == DFmode) > { >if (out_n == 2 && in_n == 2) > -return ix86_builtins[IX86_BUILTIN_VFMADDPD]; > +{ > + if (ix86_builtins_isa[IX86_BUILTIN_VFMADDPD].isa > + & global_options.x_ix86_isa_flags) > +return ix86_builtins[IX86_BUILTIN_VFMADDPD]; > + else > + return NULL_TREE; > +} > > > but there are many instances of this usage in > ix86_builtin_vectorized_function. This patch covers all the cases. >>> PR target/59390 * gcc.target/i386/pr59390.c: New test. * gcc.target/i386/pr59390_1.c: New test. * gcc.target/i386/pr59390_2.c: New test. * config/i386/i386.c (get_builtin): New function. (ix86_builtin_vectorized_function): Replace all instances of ix86_builtins[...] with get_builtin(...). (ix86_builtin_reciprocal): Ditto. >>> >>> OK, with a couple of nits: >>> >>> +static tree get_builtin (enum ix86_builtins code) >>> >>> Please name this function ix86_get_builtin. >>> >>> + if (current_function_decl) >>> +target_tree = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl); >>> + if (target_tree) >>> +opts = TREE_TARGET_OPTION (target_tree); >>> + else >>> +opts = TREE_TARGET_OPTION (target_option_default_node); >>> + >>> >>> opts = TREE_TARGET_OPTION (target_tree ? target_tree : >>> target_option_default_node); >> >> Just curious: >>> +static tree get_builtin (enum ix86_builtins code) >>> +{ >> [] >>> +> >> [] >>> @@ -33677,9 +33701,9 @@ ix86_builtin_vectorized_function (tree fndecl, tre >>>if (out_mode == DFmode && in_mode == DFmode) >>> { >>> if (out_n == 2 && in_n == 2) >>> -return ix86_builtins[IX86_BUILTIN_SQRTPD]; >>> +get_builtin (IX86_BUILTIN_SQRTPD); >>> else if (out_n == 4 && in_n == 4) >>> -return ix86_builtins[IX86_BUILTIN_SQRTPD256]; >>> +get_builtin (IX86_BUILTIN_SQRTPD256); >>> } >>>break; >> >> I must be missing something? >> Don't you have to return ix86_get_builtin(...) ? 
Yes, I noticed it last evening when the vect tests broke. I fixed it with adding the return (which was a typo when I did a bulk edit) and running tests. I will include Uros's changes and commit the patch. Thanks Sri > > Huh, I didn't even notice this mistake. > > Uros.
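For reference, the helper discussed here (ix86_get_builtin, per Uros's review) amounts to the sketch below: return the builtin decl only when the ISA it needs is enabled for the current function, falling back to the default target options when the function has no target attribute. The names are taken from the snippets quoted in this thread; treat the exact body as an approximation rather than the committed code.

/* Approximate sketch of ix86_get_builtin as described in this thread.  */
static tree
ix86_get_builtin (enum ix86_builtins code)
{
  tree target_tree = NULL_TREE;
  struct cl_target_option *opts;

  if (current_function_decl)
    target_tree = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl);

  opts = TREE_TARGET_OPTION (target_tree
                             ? target_tree : target_option_default_node);

  if (ix86_builtins_isa[code].isa & opts->x_ix86_isa_flags)
    return ix86_builtins[code];

  return NULL_TREE;
}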
Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C
On 12/11/13 09:31, Iyer, Balaji V wrote: -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Aldy Hernandez Sent: Tuesday, December 10, 2013 1:03 PM To: Iyer, Balaji V Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C But aren't both OpenMP and Cilk Plus simd clones marked as "omp declare simd"? In which case you shouldn't have to do anything? Are the Cilk Plus clones not being marked as "omp declare simd" in the front-ends? Didn't you mention in this thread (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03506.html) that Cilk Plus SIMD-enabled functions must be marked as "cilk plus elementals" (as it wsa called then)? Did I misunderstand you? Both OpenMP and Cilk Plus clones should be marked with "omp declare simd". But Cilk Plus should _also_ be marked with "cilk plus elementals" (or "cilk simd function" now). In which case the aforementioned function doesn't require any changes because Cilk Plus clones will be seen as they are marked with "omp declare simd". So... OpenMP declare simd gets tagged as: "omp declare simd" Cilk Plus vector functions gets tagged as: "omp declare simd" "cilk simd function" Ok, so you want the same clauses included in both omp declare simd and cilk simd function tree lists? For example in the c-parser.c, I have something like this: if (is_cilkplus_cilk_simd_fn) c = build_tree_list (get_identifier ("cilk simd function"), c); else c = build_tree_list (get_identifier ("omp declare simd"), c); TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = c; You want to change it something like this? If (is_cilkplus_cilk_simd_fn) { tree c_cilk = build_tree_list (get_identifier ("cilk simd function"), c); TREE_CHAIN (c_cilk) = DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = c_cilk; } c = build_tree_list (get_identififer ("omp declare simd"), c); TREE_CHAIN (c) =DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = c; Yes.
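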
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
Hi, while thinking through the details of how to handle comdat locals through ipa-cp/inliner/folding, I wonder if we won't end up with better code going the opposite way. We can declare those functions static first and then have a post-inliner IPA pass that walks the callgraph and marks all static functions/variables that are accessed only from one comdat as comdat local. We already have issues here - for example when ipa-split decides to split a comdat function, its split part will end up output outside the comdat group. Similarly, when we decide to produce a clone only to optimize a comdat function, the clone may easily become dead. I also think that with the C++ include jungle, people will end up having static things referenced only from comdats, but I may be wrong. This approach would have issues of its own though:
1) we will need to do it at -O0 too, and in this case we may want to have a flag for those decloned functions since we need to do it only for those
2) since I dropped the ball on merging the function labels patch, we may end up privatizing a function with a label whose address is taken from a static variable that is not privatized. We may just disable localization for such functions, I suppose.
3) we may surprise users who forget to annotate with the used attribute.
4) the inliner does not handle statics and comdats the same way, since it assumes a comdat will be optimized out with some probability. This won't match on statics that will later become part of a comdat group. Not a big deal, probably.
On the other hand, my feeling is that this has a potential for code size savings and may end up more robust/easier to maintain than supporting this somewhat counter-intuitive context throughout the whole IPA optimization queue. What do you think? Honza
C++ edge_iterator (was: Re: [SH] PR 53976 - Add RTL pass to eliminate clrt, sett insns)
On Thu, 2013-11-21 at 00:04 +0100, Steven Bosscher wrote: > Declaring the edge_iterator inside the for() is not a good argument > against FOR_EACH_EDGE. Of course, brownie points are up for grabs for > the brave soul daring enough to make edge iterators be proper C++ > iterators... ;-) So, I gave it a try -- see the attached patch. It allows edge iteration to look more like STL container iteration: for (basic_block::edge_iterator ei = bb->pred_edges ().begin (); ei != bb->pred_edges ().end (); ++ei) { basic_block pred_bb = (*ei)->src; ... } The idea is to first make class vec look more like an STL container. This would also allow iterating it with std::for_each and use C++11 range based for loop (in a few years maybe). It would also make it easier to replace vec uses with STL containers if needed and vice versa. Then the typedef struct basic_block_def* basic_block; is replaced with a wrapper class 'basic_block', which is just a simple POD wrapper around a basic_block_def*. There should be no penalties compared to passing/storing raw pointers. Because of the union with constructor restriction of C++98 an additional wrapper class 'basic_block_in_union' is required, which doesn't have any constructors defined. Having 'basic_block' as a class allows putting typedefs for the edge iterator types in there (initially I tried putting the typedefs into struct basic_block_def, but gengtype would bail out). It would also be possible to have a free standing definition / typedef of edge_iterator, but it would conflict with the existing one and require too many changes at once. Moreover, the iterator type actually depends on the container type, which is vec, and the container type is defined/selected by the basic_block class. The following basic_block pred_bb = (*ei)->src; can also be written as basic_block pred_bb = ei->src; after converting the edge typedef to a wrapper of edge_def*. The idea of the approach is to allow co-existence of the new edge_iterator and the old and thus be able to gradually convert code. The wrappers around raw pointers also helo encapsulating the underlying memory management issues. For example, it would be much easier to replace garbage collected objects with intrusive reference counting. Comments and feedback appreciated. Cheers, Oleg Index: gcc/coretypes.h === --- gcc/coretypes.h (revision 205801) +++ gcc/coretypes.h (working copy) @@ -153,8 +153,8 @@ typedef struct edge_def *edge; typedef const struct edge_def *const_edge; struct basic_block_def; -typedef struct basic_block_def *basic_block; -typedef const struct basic_block_def *const_basic_block; +class basic_block; +class const_basic_block; #define obstack_chunk_alloc ((void *(*) (long)) xmalloc) #define obstack_chunk_free ((void (*) (void *)) free) Index: gcc/tracer.c === --- gcc/tracer.c (revision 205801) +++ gcc/tracer.c (working copy) @@ -102,7 +102,7 @@ /* A transaction is a single entry multiple exit region. It must be duplicated in its entirety or not at all. */ - g = last_stmt (CONST_CAST_BB (bb)); + g = last_stmt (basic_block (bb)); if (g && gimple_code (g) == GIMPLE_TRANSACTION) return true; Index: gcc/emit-rtl.c === --- gcc/emit-rtl.c (revision 205801) +++ gcc/emit-rtl.c (working copy) @@ -4446,7 +4446,7 @@ emit_note_after (enum insn_note subtype, rtx after) { rtx note = make_note_raw (subtype); - basic_block bb = BARRIER_P (after) ? NULL : BLOCK_FOR_INSN (after); + basic_block bb = BARRIER_P (after) ? 
(basic_block_def*)NULL : (basic_block_def*)BLOCK_FOR_INSN (after); bool on_bb_boundary_p = (bb != NULL && BB_END (bb) == after); if (note_outside_basic_block_p (subtype, on_bb_boundary_p)) @@ -4462,7 +4462,7 @@ emit_note_before (enum insn_note subtype, rtx before) { rtx note = make_note_raw (subtype); - basic_block bb = BARRIER_P (before) ? NULL : BLOCK_FOR_INSN (before); + basic_block bb = BARRIER_P (before) ? (basic_block_def*)NULL : (basic_block_def*)BLOCK_FOR_INSN (before); bool on_bb_boundary_p = (bb != NULL && BB_HEAD (bb) == before); if (note_outside_basic_block_p (subtype, on_bb_boundary_p)) Index: gcc/vec.h === --- gcc/vec.h (revision 205801) +++ gcc/vec.h (working copy) @@ -482,6 +482,15 @@ void quick_grow (unsigned len); void quick_grow_cleared (unsigned len); + /* STL like iterator interface. */ + typedef T* iterator; + typedef const T* const_iterator; + + iterator begin (void) { return &m_vecdata[0]; } + iterator end (void) { return &m_vecdata[m_vecpfx.m_num]; } + const_iterator begin (void) const { return &m_vecdata[0]; } + const_iterator end (void) const { &m_vecdata[m_vecpfx.m_num]; } + /* vec class can access our internal data and functions. */ te
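As a self-contained illustration of what the begin()/end() additions to vec buy (independent of GCC's garbage-collected types): raw pointers over contiguous storage are already valid random-access iterators, so exposing them is enough for STL algorithms and iterator-style loops. The mini container below is an invented stand-in, not GCC code.

// Standalone sketch: pointer iterators over contiguous storage.
#include <algorithm>
#include <cstdio>

template <typename T, int N>
struct mini_vec
{
  T data[N];
  int num;

  typedef T *iterator;
  typedef const T *const_iterator;

  iterator begin () { return &data[0]; }
  iterator end () { return &data[num]; }
  const_iterator begin () const { return &data[0]; }
  const_iterator end () const { return &data[num]; }
};

int
main ()
{
  mini_vec<int, 4> v = { { 3, 1, 2, 0 }, 4 };
  std::sort (v.begin (), v.end ());
  for (mini_vec<int, 4>::const_iterator it = v.begin (); it != v.end (); ++it)
    std::printf ("%d\n", *it);
  return 0;
}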
Re: [wide-int] small cleanup in wide-int.*
On 12/09/2013 10:01 AM, Richard Biener wrote: On Mon, Dec 9, 2013 at 3:49 PM, Kenneth Zadeck wrote: On 12/08/2013 05:35 AM, Richard Biener wrote: Richard Sandiford wrote: Kenneth Zadeck writes: #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT) + 1) I think this should be: (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1) We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact multiple of HOST_BITS_PER_WIDE_INT. we will do this later when some other issues that Eric B raised are settled. I think you're talking about defining MAX_BITSIZE_MODE_ANY_INT as MAX_SIZE_MODE_ANY_INT * BITS_PER_UNIT, but that's orthogonal to the change above. IMO it doesn't make sense to both round up the division and also add 1 to the result. So I think we should make this change regardless of whatever follows. Looks good to me otherwise, thanks. Richard so this one works the way you want.While it is true the the problems are disjoint, the solution will likely evolving change the same lines of source in two different ways. Well, if we're changing the definition of WIDE_INT_MAX_ELTS now (and we are) I think we should change it to something that makes sense. All we want is 1 extra bit. We don't need to round up the size and then add a whole HWI on top. Doing that will potentially pessimise targets that don't need anything beyond SImode. I agree. Richard. Done, ok to commit? + early examination of the target's mode file. The WIDE_INT_MAX_ELTS + can accomodate at least 1 more bit so that unsigned numbers of that + mode can be represented. Note that it is still possible to create + fixed_wide_ints that have precisions greater than + MAX_BITSIZE_MODE_ANY_INT. This can be useful when representing a + double-width multiplication result, for example. */ #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT)) + that doesn't reserve 1 more bit, it just rounds up. For one more bit you need to add 1, so #define WIDE_INT_MAX_ELTS \ + (((MAX_BITSIZE_MODE_ANY_INT + 1 + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT)) no? Otherwise ok. Thanks, Richard. kenny It's not like I can approve the patch anyway though, so I'll leave it there. Thanks, Richard committed as revision 205900 after rebootstrap. 
kenny Index: gcc/tree.h === --- gcc/tree.h (revision 205897) +++ gcc/tree.h (working copy) @@ -4545,7 +4545,7 @@ namespace wi static const unsigned int precision = N; }; - generic_wide_int > + generic_wide_int > to_widest (const_tree); generic_wide_int > to_offset (const_tree); @@ -4566,7 +4566,7 @@ wi::int_traits ::decompose ( precision); } -inline generic_wide_int > +inline generic_wide_int > wi::to_widest (const_tree t) { return t; @@ -4605,7 +4605,7 @@ wi::extended_tree ::get_len () const { if (N == ADDR_MAX_PRECISION) return TREE_INT_CST_OFFSET_NUNITS (m_t); - else if (N == MAX_BITSIZE_MODE_ANY_INT) + else if (N >= WIDE_INT_MAX_PRECISION) return TREE_INT_CST_EXT_NUNITS (m_t); else /* This class is designed to be used for specific output precisions Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 205897) +++ gcc/tree-vrp.c (working copy) @@ -2620,23 +2620,24 @@ extract_range_from_binary_expr_1 (value_ signop sign = TYPE_SIGN (expr_type); unsigned int prec = TYPE_PRECISION (expr_type); - unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0); if (range_int_cst_p (&vr0) && range_int_cst_p (&vr1) && TYPE_OVERFLOW_WRAPS (expr_type)) { - wide_int sizem1 = wi::mask (prec, false, prec2); - wide_int size = sizem1 + 1; + typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int; + typedef generic_wide_int + > vrp_int_cst; + vrp_int sizem1 = wi::mask (prec, false); + vrp_int size = sizem1 + 1; /* Extend the values using the sign of the result to PREC2. From here on out, everthing is just signed math no matter what the input types were. */ - wide_int min0 = wide_int::from (vr0.min, prec2, sign); - wide_int max0 = wide_int::from (vr0.max, prec2, sign); - wide_int min1 = wide_int::from (vr1.min, prec2, sign); - wide_int max1 = wide_int::from (vr1.max, prec2, sign); - + vrp_int min0 = vrp_int_cst (vr0.min); + vrp_int max0 = vrp_int_cst (vr0.max); + vrp_int min1 = vrp_int_cst (vr1.min); + vrp_int max1 = vrp_int_cst (vr1.max); /* Canonicalize the intervals. */ if (sign == UNSIGNED)
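The disagreement above is plain integer arithmetic, so a standalone check makes it concrete: rounding the division up does not by itself reserve room for an extra bit when the bit-size is already an exact multiple of HOST_BITS_PER_WIDE_INT, which is why the '+ 1' has to go inside the division (or be added to an un-rounded quotient). The sizes below are made-up stand-ins, not values from any real target.

// Standalone demo of the three candidate WIDE_INT_MAX_ELTS formulas,
// with HWI = 64 and hypothetical MAX_BITSIZE_MODE_ANY_INT values.
#include <cstdio>

int
main ()
{
  const int hwi = 64;
  const int sizes[] = { 128, 129, 192 };

  for (int i = 0; i < 3; i++)
    {
      int bits = sizes[i];
      int round_up = (bits + hwi - 1) / hwi;               // rounds up only
      int quot_plus_one = bits / hwi + 1;                  // Richard Sandiford's form
      int round_up_plus_bit = (bits + 1 + hwi - 1) / hwi;  // Richard Biener's form
      std::printf ("bits=%3d  rounded-up=%d  bits/hwi+1=%d  (bits+1) rounded-up=%d\n",
                   bits, round_up, quot_plus_one, round_up_plus_bit);
    }
  return 0;
}

// Only the last two leave room for one extra bit when bits is an exact
// multiple of hwi (128 and 192 here); all three agree at bits = 129.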
RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C
> -Original Message- > From: Aldy Hernandez [mailto:al...@redhat.com] > Sent: Wednesday, December 11, 2013 12:38 PM > To: Iyer, Balaji V > Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > Elemental functions) for C > > On 12/11/13 09:31, Iyer, Balaji V wrote: > > > > > >> -Original Message- > >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > >> ow...@gcc.gnu.org] On Behalf Of Aldy Hernandez > >> Sent: Tuesday, December 10, 2013 1:03 PM > >> To: Iyer, Balaji V > >> Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > >> Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > >> Elemental functions) for C > >> > >> > But aren't both OpenMP and Cilk Plus simd clones marked as "omp > declare simd"? In which case you shouldn't have to do anything? > Are the Cilk Plus clones not being marked as "omp declare simd" in > the front-ends? > > >>> > >>> Didn't you mention in this thread > >>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03506.html) that Cilk > >>> Plus SIMD-enabled functions must be marked as "cilk plus elementals" > >>> (as it wsa called then)? Did I misunderstand you? > >> > >> Both OpenMP and Cilk Plus clones should be marked with "omp declare > >> simd". But Cilk Plus should _also_ be marked with "cilk plus > >> elementals" (or "cilk simd function" now). In which case the > >> aforementioned function doesn't require any changes because Cilk Plus > >> clones will be seen as they are marked with "omp declare simd". > >> > >> So... > >> > >>OpenMP declare simd gets tagged as: > >>"omp declare simd" > >>Cilk Plus vector functions gets tagged as: > >>"omp declare simd" > >>"cilk simd function" > >> > > > > Ok, so you want the same clauses included in both omp declare simd and > cilk simd function tree lists? > > > > For example in the c-parser.c, I have something like this: > > > >if (is_cilkplus_cilk_simd_fn) > > c = build_tree_list (get_identifier ("cilk simd function"), c); > >else > > c = build_tree_list (get_identifier ("omp declare simd"), c); > >TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl); > >DECL_ATTRIBUTES (fndecl) = c; > > > > > > You want to change it something like this? > > > > > > If (is_cilkplus_cilk_simd_fn) > >{ > > tree c_cilk = build_tree_list (get_identifier ("cilk simd function"), > > c); > > TREE_CHAIN (c_cilk) = DECL_ATTRIBUTES (fndecl); > > DECL_ATTRIBUTES (fndecl) = c_cilk; > > } > > c = build_tree_list (get_identififer ("omp declare simd"), c); > > TREE_CHAIN (c) =DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = > > c; > > > > > > Yes. The issue with doing this is that it is creating duplicate clones. If I just kept the patch as is, it does not. For example: __attribute__((vector (vectorlength(sizeof(int) int func3 (int x) { return x; } Should create two clones: mask and unmask (_ZGVbN4v_func3 and _ZGVbM4v_func3). It is doing that if I kept the patch AS-IS. Now, if I make the modification that I mentioned above and remove the handling of cilk simd function in omp-low.c it is creating 6 clones: _ZGVdM4v_func3, _ZGVbN4v_func3, _ZGVcN4v_func3, _ZGVcM4v_func3, _ZGVdN4v_func3 Thanks, Balaji V. Iyer.
Re: patch for elimination to SP when it is changed in RTL (PR57293)
On 12/11/2013, 5:35 AM, Yvan Roux wrote: Hi Vladimir, I've some regressions on ARM after this SP elimination patch, and they are execution failures. Here is the list: g++.dg/cilk-plus/AN/array_test_ND_tplt.cc -O3 -fcilkplus gcc.c-torture/execute/va-arg-22.c -O2 gcc.dg/atomic/c11-atomic-exec-5.c -O0 gfortran.dg/direct_io_12.f90 -O[23] gfortran.dg/elemental_dependency_1.f90 -O2 gfortran.dg/matmul_2.f90 -O2 gfortran.dg/matmul_6.f90 -O2 gfortran.dg/mvbits_7.f90 -O3 gfortran.dg/unlimited_polymorphic_1.f03 -O3 I reduced and looked at var-arg-22.c and the issue is that in lra_eliminate_regs_1 (called by get_equiv_with_elimination) we transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset. What we try to do here is to change the pseudo 195 of the insn 118 below : (insn 118 114 112 8 (set (reg:DI 195) (unspec:DI [ (mem:DI (plus:SI (reg/f:SI 215) (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12 + 64B]+8 S8 A8]) ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi} (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192) (const_int 8 [0x8])) [7 a35+8 S8 A32]) (nil))) with its equivalent (x arg of lra_eliminate_regs_1): (mem/c:DI (plus:SI (reg/f:SI 102 sfp) (const_int 76 [0x4c])) [7 a35+8 S8 A32]) lra_eliminate_regs_1 is called with full_p = true (it is not really clear for what it means), It means we use full offset between the regs, otherwise we use change in the full offset from the previous iteration (it can be changed as we reserve stack memory for spilled pseudos and the reservation can be done several times). As equiv value is stored as it was before any elimination, we need always to use full offset to make elimination. but in the PLUS switch case, we have offset = 0xb (given by ep->offset) and as lra_get_insn_recog_data (insn)->sp_offset value is 0, we will indeed add 0xb to the original 0x4c offset. 0 value is suspicious because it is default. We might have not set up it from neighbor insns. So, here I don't get if it is the sp_offset value of the lra_insn_recog_data element which is not well updated or if lra_ eliminate_regs_1 has to be called with update_p and not full_p (which fixed the value in that particular case). Is it more obvious for you ? Yvan, could you send me the reduced preprocessed case and the options for cc1 to reproduce it.
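For reference, a simplified standalone model of the full_p distinction described above (hypothetical types and made-up offsets, not the lra-eliminations.c code): an elimination records both the full offset between the register pair and the offset applied in the previous round, and full_p selects whether the whole offset or only the change since the last round is added.

/* Toy model (hypothetical types and offsets, not lra-eliminations.c): the
   full offset between the eliminated register pair vs. only the change
   since the previous round.  REG_EQUIV values are recorded before any
   elimination, so they always need the full offset.  */
#include <stdio.h>

struct elimination
{
  long offset;           /* full offset between from-reg and to-reg */
  long previous_offset;  /* offset already applied in the previous round */
};

static long
apply_elimination (long displacement, const struct elimination *ep, int full_p)
{
  long adjust = full_p ? ep->offset : ep->offset - ep->previous_offset;
  return displacement + adjust;
}

int
main (void)
{
  struct elimination sfp_to_sp = { 0xb0, 0xa8 };  /* made-up numbers */

  /* Address already rewritten in a previous round: apply only the delta.  */
  printf ("update: 0x%lx\n", apply_elimination (0x4c, &sfp_to_sp, 0));

  /* Stored equivalence, still pre-elimination: apply the full offset.  */
  printf ("full:   0x%lx\n", apply_elimination (0x4c, &sfp_to_sp, 1));
  return 0;
}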
RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C
Can you investigate why it is creating multiple clones? On Dec 11, 2013 10:06 AM, "Iyer, Balaji V" wrote: > > > > > -Original Message- > > From: Aldy Hernandez [mailto:ald > -Original Message- > From: Aldy Hernandez [mailto:al...@redhat.com] > Sent: Wednesday, December 11, 2013 12:38 PM > To: Iyer, Balaji V > Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > Elemental functions) for C > > On 12/11/13 09:31, Iyer, Balaji V wrote: > > > > > >> -Original Message- > >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > >> ow...@gcc.gnu.org] On Behalf Of Aldy Hernandez > >> Sent: Tuesday, December 10, 2013 1:03 PM > >> To: Iyer, Balaji V > >> Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > >> Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > >> Elemental functions) for C > >> > >> > But aren't both OpenMP and Cilk Plus simd clones marked as "omp > declare simd"? In which case you shouldn't have to do anything? > Are the Cilk Plus clones not being marked as "omp declare simd" in > the front-ends? > > >>> > >>> Didn't you mention in this thread > >>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03506.html) that Cilk > >>> Plus SIMD-enabled functions must be marked as "cilk plus elementals" > >>> (as it wsa called then)? Did I misunderstand you? > >> > >> Both OpenMP and Cilk Plus clones should be marked with "omp declare > >> simd". But Cilk Plus should _also_ be marked with "cilk plus > >> elementals" (or "cilk simd function" now). In which case the > >> aforementioned function doesn't require any changes because Cilk Plus > >> clones will be seen as they are marked with "omp declare simd". > >> > >> So... > >> > >>OpenMP declare simd gets tagged as: > >>"omp declare simd" > >>Cilk Plus vector functions gets tagged as: > >>"omp declare simd" > >>"cilk simd function" > >> > > > > Ok, so you want the same clauses included in both omp declare simd and > cilk simd function tree lists? > > > > For example in the c-parser.c, I have something like this: > > > >if (is_cilkplus_cilk_simd_fn) > > c = build_tree_list (get_identifier ("cilk simd function"), c); > >else > > c = build_tree_list (get_identifier ("omp declare simd"), c); > >TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl); > >DECL_ATTRIBUTES (fndecl) = c; > > > > > > You want to change it something like this? > > > > > > If (is_cilkplus_cilk_simd_fn) > >{ > > tree c_cilk = build_tree_list (get_identifier ("cilk simd function"), > > c); > > TREE_CHAIN (c_cilk) = DECL_ATTRIBUTES (fndecl); > > DECL_ATTRIBUTES (fndecl) = c_cilk; > > } > > c = build_tree_list (get_identififer ("omp declare simd"), c); > > TREE_CHAIN (c) =DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = > > c; > > > > > > Yes. The issue with doing this is that it is creating duplicate clones. If I just kept the patch as is, it does not. For example: __attribute__((vector (vectorlength(sizeof(int) int func3 (int x) { return x; } Should create two clones: mask and unmask (_ZGVbN4v_func3 and _ZGVbM4v_func3). It is doing that if I kept the patch AS-IS. Now, if I make the modification that I mentioned above and remove the handling of cilk simd function in omp-low.c it is creating 6 clones: _ZGVdM4v_func3, _ZGVbN4v_func3, _ZGVcN4v_func3, _ZGVcM4v_func3, _ZGVdN4v_func3 Thanks, Balaji V. Iyer.
RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C
Hi Aldy, > -Original Message- > From: Aldy Hernandez [mailto:al...@redhat.com] > Sent: Wednesday, December 11, 2013 1:27 PM > To: Iyer, Balaji V > Cc: Jakub Jelinek; 'gcc-patches@gcc.gnu.org' > Subject: RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > Elemental functions) for C > > Can you investigate why it is creating multiple clones? > I can and will let you know as soon as I find it. I am working on GOMP4 branch, could that cause anything? Just out of curiosity, why can't I keep it as-is? It is giving the correct output/behavior and doesn't seem to interfere with anything else. The only extra thing I am doing is to add an extra if-statement while recursing through all the functions to check for cilk simd function and then calling the function to handle it, which the OMP will have to do anyway. The only extra thing I added was an extra if-statement. Please don't read my above paragraph as an argument. I am just asking for clarification and to understand more about the underlying routines. Thanks, Balaji V. Iyer. > On Dec 11, 2013 10:06 AM, "Iyer, Balaji V" wrote: > > > > > > > > > -Original Message- > > > From: Aldy Hernandez [mailto:ald > > > -Original Message- > > From: Aldy Hernandez [mailto:al...@redhat.com] > > Sent: Wednesday, December 11, 2013 12:38 PM > > To: Iyer, Balaji V > > Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > > Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > > Elemental functions) for C > > > > On 12/11/13 09:31, Iyer, Balaji V wrote: > > > > > > > > >> -Original Message- > > >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > >> ow...@gcc.gnu.org] On Behalf Of Aldy Hernandez > > >> Sent: Tuesday, December 10, 2013 1:03 PM > > >> To: Iyer, Balaji V > > >> Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > > >> Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions > > >> (formerly Elemental functions) for C > > >> > > >> > > But aren't both OpenMP and Cilk Plus simd clones marked as "omp > > declare simd"? In which case you shouldn't have to do anything? > > Are the Cilk Plus clones not being marked as "omp declare simd" > > in the front-ends? > > > > >>> > > >>> Didn't you mention in this thread > > >>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03506.html) that > > >>> Cilk Plus SIMD-enabled functions must be marked as "cilk plus > elementals" > > >>> (as it wsa called then)? Did I misunderstand you? > > >> > > >> Both OpenMP and Cilk Plus clones should be marked with "omp declare > > >> simd". But Cilk Plus should _also_ be marked with "cilk plus > > >> elementals" (or "cilk simd function" now). In which case the > > >> aforementioned function doesn't require any changes because Cilk > > >> Plus clones will be seen as they are marked with "omp declare simd". > > >> > > >> So... > > >> > > >> OpenMP declare simd gets tagged as: > > >> "omp declare simd" > > >> Cilk Plus vector functions gets tagged as: > > >> "omp declare simd" > > >> "cilk simd function" > > >> > > > > > > Ok, so you want the same clauses included in both omp declare simd > > > and > > cilk simd function tree lists? > > > > > > For example in the c-parser.c, I have something like this: > > > > > >if (is_cilkplus_cilk_simd_fn) > > > c = build_tree_list (get_identifier ("cilk simd function"), c); > > >else > > > c = build_tree_list (get_identifier ("omp declare simd"), c); > > >TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl); > > >DECL_ATTRIBUTES (fndecl) = c; > > > > > > > > > You want to change it something like this? 
> > > > > > > > > If (is_cilkplus_cilk_simd_fn) > > >{ > > > tree c_cilk = build_tree_list (get_identifier ("cilk simd function"), > > > c); > > > TREE_CHAIN (c_cilk) = DECL_ATTRIBUTES (fndecl); > > > DECL_ATTRIBUTES (fndecl) = c_cilk; > > > } > > > c = build_tree_list (get_identififer ("omp declare simd"), c); > > > TREE_CHAIN (c) =DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) > = > > > c; > > > > > > > > > > Yes. > > The issue with doing this is that it is creating duplicate clones. If I just > kept the > patch as is, it does not. > > For example: > > __attribute__((vector (vectorlength(sizeof(int) int func3 (int x) { > return x; > } > > Should create two clones: mask and unmask (_ZGVbN4v_func3 and > _ZGVbM4v_func3). It is doing that if I kept the patch AS-IS. > > Now, if I make the modification that I mentioned above and remove the > handling of cilk simd function in omp-low.c it is creating 6 clones: > _ZGVdM4v_func3, _ZGVbN4v_func3, _ZGVcN4v_func3, _ZGVcM4v_func3, > _ZGVdN4v_func3 > > Thanks, > > Balaji V. Iyer.
Re: patch for elimination to SP when it is changed in RTL (PR57293)
On 11 December 2013 19:25, Vladimir Makarov wrote: > On 12/11/2013, 5:35 AM, Yvan Roux wrote: >> >> Hi Vladimir, >> >> I've some regressions on ARM after this SP elimination patch, and they >> are execution failures. Here is the list: >> >> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc -O3 -fcilkplus >> gcc.c-torture/execute/va-arg-22.c -O2 >> gcc.dg/atomic/c11-atomic-exec-5.c -O0 >> gfortran.dg/direct_io_12.f90 -O[23] >> gfortran.dg/elemental_dependency_1.f90 -O2 >> gfortran.dg/matmul_2.f90 -O2 >> gfortran.dg/matmul_6.f90 -O2 >> gfortran.dg/mvbits_7.f90 -O3 >> gfortran.dg/unlimited_polymorphic_1.f03 -O3 >> >> I reduced and looked at var-arg-22.c and the issue is that in >> lra_eliminate_regs_1 (called by get_equiv_with_elimination) we >> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset. What >> we try to do here is to change the pseudo 195 of the insn 118 below : >> >> (insn 118 114 112 8 (set (reg:DI 195) >> (unspec:DI [ >> (mem:DI (plus:SI (reg/f:SI 215) >> (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12 >> + 64B]+8 S8 A8]) >> ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi} >> (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192) >> (const_int 8 [0x8])) [7 a35+8 S8 A32]) >> (nil))) >> >> with its equivalent (x arg of lra_eliminate_regs_1): >> >> (mem/c:DI (plus:SI (reg/f:SI 102 sfp) >> (const_int 76 [0x4c])) [7 a35+8 S8 A32]) >> >> lra_eliminate_regs_1 is called with full_p = true (it is not really >> clear for what it means), > > > It means we use full offset between the regs, otherwise we use change in the > full offset from the previous iteration (it can be changed as we reserve > stack memory for spilled pseudos and the reservation can be done several > times). As equiv value is stored as it was before any elimination, we need > always to use full offset to make elimination. Ok thanks it's clearer. > but in the PLUS switch case, we have offset >> >> = 0xb (given by ep->offset) and as lra_get_insn_recog_data >> (insn)->sp_offset value is 0, we will indeed add 0xb to the original >> 0x4c offset. >> > > 0 value is suspicious because it is default. We might have not set up it > from neighbor insns. > > > >> So, here I don't get if it is the sp_offset value of the >> lra_insn_recog_data element which is not well updated or if lra_ >> eliminate_regs_1 has to be called with update_p and not full_p (which >> fixed the value in that particular case). Is it more obvious for you >> ? >> > > Yvan, could you send me the reduced preprocessed case and the options for > cc1 to reproduce it. Here is cc1 command line : cc1 -quiet -march=armv7-a -mtune=cortex-a15 -mfloat-abi=hard -mfpu=neon -mthumb v2.c -O2 I use a native build on a chromebook, but it's reproducible with a cross compiler. With the attached test case the issue is when processing insn 118. Thanks, Yvan typedef __builtin_va_list __gnuc_va_list; typedef __gnuc_va_list va_list; extern void abort (void); extern void exit (int); void bar (int n, int c) { static int lastn = -1, lastc = -1; if (lastn != n) { if (lastc != lastn) abort (); lastc = 0; lastn = n; } if (c != (char) (lastc ^ (n << 3))) abort (); lastc++; } typedef struct { char x[31]; } A31; typedef struct { char x[32]; } A32; typedef struct { char x[35]; } A35; typedef struct { char x[72]; } A72; void foo (int size, ...) 
{
  A31 a31;
  A32 a32;
  A35 a35;
  A72 a72;
  va_list ap;
  int i;

  if (size != 21)
    abort ();
  __builtin_va_start(ap,size);
  a31 = __builtin_va_arg(ap,typeof (a31));
  for (i = 0; i < 31; i++)
    bar (31, a31.x[i]);
  a32 = __builtin_va_arg(ap,typeof (a32));
  for (i = 0; i < 32; i++)
    bar (32, a32.x[i]);
  a35 = __builtin_va_arg(ap,typeof (a35));
  for (i = 0; i < 35; i++)
    bar (35, a35.x[i]);
  a72 = __builtin_va_arg(ap,typeof (a72));
  for (i = 0; i < 72; i++)
    bar (72, a72.x[i]);
  __builtin_va_end(ap);
}

int main (void)
{
  A31 a31;
  A32 a32;
  A35 a35;
  A72 a72;
  int i;

  for (i = 0; i < 31; i++)
    a31.x[i] = i ^ (31 << 3);
  for (i = 0; i < 32; i++)
    a32.x[i] = i ^ (32 << 3);
  for (i = 0; i < 35; i++)
    a35.x[i] = i ^ (35 << 3);
  for (i = 0; i < 72; i++)
    a72.x[i] = i ^ (72 << 3);
  foo (21, a31, a32, a35, a72);
  exit (0);
}
[PATCH] Enable Cilk keywords in Cilk Runtime
Hello Everyone, Since we have _Cilk_spawn and _Cilk_sync support in C++ compiler, we can enable the keyword usage in runtime. This patch should do so. Is it Ok to install? Here are the ChangeLog entries: 2013-12-11 Balaji V. Iyer * Makefile.am (GENERAL_FLAGS): Removed un-defining of Cilk keywords. * Makefile.in: Regenerate. * runtime/symbol_test.c: Added a #define to clear out _Cilk_for. Thanks, Balaji V. Iyer. Here is the patch. diff --git a/libcilkrts/Makefile.am b/libcilkrts/Makefile.am index 56bc9eb..d252087 100644 --- a/libcilkrts/Makefile.am +++ b/libcilkrts/Makefile.am @@ -38,7 +38,7 @@ ACLOCAL_AMFLAGS = -I .. -I ../config # Compiler and linker flags. GENERAL_FLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/runtime -I$(top_srcdir)/runtime/config/$(config_dir) -DIN_CILK_RUNTIME=1 -GENERAL_FLAGS += -D_Cilk_spawn="" -D_Cilk_sync="" -D_Cilk_for=for +# GENERAL_FLAGS += -D_Cilk_spawn="" -D_Cilk_sync="" -D_Cilk_for=for # Enable Intel Cilk Plus extension GENERAL_FLAGS += -fcilkplus diff --git a/libcilkrts/Makefile.in b/libcilkrts/Makefile.in index 5066bef..94902e1 100644 --- a/libcilkrts/Makefile.in +++ b/libcilkrts/Makefile.in @@ -342,12 +342,12 @@ AUTOMAKE_OPTIONS = foreign ACLOCAL_AMFLAGS = -I .. -I ../config # Compiler and linker flags. +# GENERAL_FLAGS += -D_Cilk_spawn="" -D_Cilk_sync="" -D_Cilk_for=for # Enable Intel Cilk Plus extension GENERAL_FLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/runtime \ -I$(top_srcdir)/runtime/config/$(config_dir) \ - -DIN_CILK_RUNTIME=1 -D_Cilk_spawn="" -D_Cilk_sync="" \ - -D_Cilk_for=for -fcilkplus + -DIN_CILK_RUNTIME=1 -fcilkplus AM_CFLAGS = $(GENERAL_FLAGS) -std=c99 AM_CPPFLAGS = $(GENERAL_FLAGS) AM_LDFLAGS = -lpthread diff --git a/libcilkrts/runtime/symbol_test.c b/libcilkrts/runtime/symbol_test.c index 1113ecd..8291d36 100644 --- a/libcilkrts/runtime/symbol_test.c +++ b/libcilkrts/runtime/symbol_test.c @@ -41,6 +41,7 @@ * will cause a linker error. */ +#define _Cilk_for for extern void* __cilkrts_global_state; void *volatile p;
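For reference, a minimal sketch of the keyword usage the runtime sources can now rely on once the -D_Cilk_spawn="" style defines are gone (not taken from the runtime sources; it only builds with a compiler that accepts -fcilkplus, which is exactly what the Makefile change assumes).

/* Minimal _Cilk_spawn/_Cilk_sync usage sketch (not taken from the runtime
   sources); it requires a compiler built with -fcilkplus support.  */
#include <stdio.h>

static int
fib (int n)
{
  if (n < 2)
    return n;
  int x = _Cilk_spawn fib (n - 1);  /* may run in parallel */
  int y = fib (n - 2);
  _Cilk_sync;                       /* wait for the spawned call */
  return x + y;
}

int
main (void)
{
  printf ("fib (10) = %d\n", fib (10));
  return 0;
}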
Re: [REPOST] Invalid Code when reading from unaligned zero-sized array
> Yes we do, even for struct { struct { int a; char a[1] } }; (note the not > really "trailing" as there is padding after the trailing array). We do > take size limitations from a DECL (if we see one) into account to limit the > effect of this trailing-array-supporting, so it effectively only applies to > indirect accesses (and the padding example above, you can use the whole > padding if DECL_SIZE allows that). OK, so we want the attached patch? FWIW it passed make -k check-c check-c++ RUNTESTFLAGS="compat.exp struct-layout-1.exp" on x86/Linux, x86-64/Linux, PowerPC/Linux [*], IA-64/Linux, SPARC/Solaris and SPARC64/Solaris with ALT_CC_UNDER_TEST set to the unpatched compiler. [*] the failures (DFP related) are the same as with the unpatched compiler. -- Eric BotcazouIndex: stor-layout.c === --- stor-layout.c (revision 205881) +++ stor-layout.c (working copy) @@ -1605,6 +1605,15 @@ compute_record_mode (tree type) || ! tree_fits_uhwi_p (DECL_SIZE (field))) return; + /* As a GNU extension, we support out-of-bounds accesses for a trailing + array in a record type. In this case, if the element type has a non + zero size, then the record type effectively has variable size so it + needs to have BLKmode. */ + if (!DECL_CHAIN (field) + && TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE + && !integer_zerop (TYPE_SIZE (TREE_TYPE (TREE_TYPE (field) + return; + /* If this field is the whole struct, remember its mode so that, say, we can put a double in a class into a DF register instead of forcing it to live in the stack. */
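For reference, the GNU extension in question is the classic trailing-array idiom; a minimal standalone example (not from the testsuite) of the kind of access the BLKmode change is meant to keep working:

/* The trailing-array idiom the compute_record_mode change is about
   (standalone example, not from the testsuite): the object is declared with
   a one-element trailing array but allocated with extra storage, and
   elements past data[0] are accessed through a pointer.  */
#include <stdlib.h>
#include <stdio.h>

struct S
{
  int count;
  char data[1];   /* trailing array; really 'count' bytes at run time */
};

int
main (void)
{
  int n = 16;
  struct S *s = (struct S *) malloc (sizeof (struct S) + n - 1);
  int i;

  s->count = n;
  for (i = 0; i < s->count; i++)
    s->data[i] = (char) i;          /* runs past the declared bound */
  printf ("%d\n", s->data[10]);
  free (s);
  return 0;
}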
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
On 12/11/2013 12:14 PM, Jan Hubicka wrote:
>> Is ipa_passes the right place to initialize calls_comdat_local?
>
> The flag is probably needed in both the early inliner and the IPA inliner. A conservative place to initialize it would be in inline_analyze_function. (The early inliner analyzes a function twice, first before inlining and next after early optimization, so the update should also clear the flag if the call disappeared.)

Unfortunately early inlining doesn't call inline_analyze_function on all functions, so we need to initialize it somewhere else. Is there a reason not to set up an initial value in ipa_passes?

> I think we also want to clear it if a call to a comdat-local function is marked inline.

Makes sense.

Jason
Re: [RFA][PATCH][PR tree-optimization/45685]
On 12/11/13 02:51, Richard Biener wrote: That looks wrong - you want to look at HONOR_NANS on the mode of one of the comparison operands, not of the actual value you want to negate (it's integer and thus never has NaNs). The rest of the patch looks ok to me. Here's the updated version. It addresses the two issues you raised. Specifically it adds the hairy condition to avoid this code in cases where it's not likely to be useful and it fixes the call to invert_tree_comparison. Bootstrapped and regression tested on x86_64-unknown-linux-gnu OK for the trunk? jeff PR tree-optimization/45685 * tree-ssa-phiopt.c (neg_replacement): New function. (tree_ssa_phiopt_worker): Call it. PR tree-optimization/45685 * gcc.dg/tree-ssa/pr45685.c: New test. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45685.c b/gcc/testsuite/gcc.dg/tree-ssa/pr45685.c new file mode 100644 index 000..0628943 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45685.c @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-phiopt1-details" } */ + +typedef unsigned long int uint64_t; +typedef long int int64_t; +int summation_helper_1(int64_t* products, uint64_t count) +{ + int s = 0; + uint64_t i; + for(i=0; i0) ? 1 : -1; + products[i] *= val; + if(products[i] != i) + val = -val; + products[i] = val; + s += val; + } + return s; +} + + +int summation_helper_2(int64_t* products, uint64_t count) +{ + int s = 0; + uint64_t i; + for(i=0; i0) ? 1 : -1; + products[i] *= val; + if(products[i] != i) + val = -val; + products[i] = val; + s += val; + } + return s; +} + +/* { dg-final { scan-tree-dump-times "converted to straightline code" 2 "phiopt1" } } */ +/* { dg-final { cleanup-tree-dump "phiopt1" } } */ + diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c index 11e565f..96154fb 100644 --- a/gcc/tree-ssa-phiopt.c +++ b/gcc/tree-ssa-phiopt.c @@ -69,6 +69,8 @@ static bool minmax_replacement (basic_block, basic_block, edge, edge, gimple, tree, tree); static bool abs_replacement (basic_block, basic_block, edge, edge, gimple, tree, tree); +static bool neg_replacement (basic_block, basic_block, +edge, edge, gimple, tree, tree); static bool cond_store_replacement (basic_block, basic_block, edge, edge, struct pointer_set_t *); static bool cond_if_else_store_replacement (basic_block, basic_block, basic_block); @@ -336,6 +338,23 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads) /* Calculate the set of non-trapping memory accesses. */ nontrap = get_non_trapping (); + /* The replacement of conditional negation with a non-branching + sequence is really only a win when optimizing for speed and we + can avoid transformations by gimple if-conversion that result + in poor RTL generation. + + Ideally either gimple if-conversion or the RTL expanders will + be improved and the code to emit branchless conditional negation + can be removed. */ + bool replace_conditional_negation = false; + if (!do_store_elim) +replace_conditional_negation + = ((!optimize_size && optimize >= 2) +|| (((flag_tree_loop_vectorize || cfun->has_force_vect_loops) + && flag_tree_loop_if_convert != 0) +|| flag_tree_loop_if_convert == 1 +|| flag_tree_loop_if_convert_stores == 1)); + /* Search every basic block for COND_EXPR we may be able to optimize. 
We walk the blocks in order that guarantees that a block with @@ -489,6 +508,9 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads) cfgchanged = true; else if (abs_replacement (bb, bb1, e1, e2, phi, arg0, arg1)) cfgchanged = true; + else if (replace_conditional_negation + && neg_replacement (bb, bb1, e1, e2, phi, arg0, arg1)) + cfgchanged = true; else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1)) cfgchanged = true; } @@ -1285,6 +1307,143 @@ abs_replacement (basic_block cond_bb, basic_block middle_bb, return true; } +/* The function neg_replacement replaces conditional negation with +equivalent straight line code. Returns TRUE if replacement is done, +otherwise returns FALSE. + +COND_BB branches around negation occuring in MIDDLE_BB. + +E0 and E1 are edges out of COND_BB. E0 reaches MIDDLE_BB and +E1 reaches the other successor which should contain PHI with +arguments ARG0 and ARG1. + +Assuming negation is to occur when the condition is true, +then the non-branching sequence is: + + result = (rhs ^ -cond) + cond + +Inverting t
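For reference, the identity used by neg_replacement can be checked in isolation; a standalone test (hypothetical example, not part of the patch) of (rhs ^ -cond) + cond against cond ? -rhs : rhs for cond in {0, 1}:

/* Standalone check (not part of the patch) of the branchless identity:
   (rhs ^ -cond) + cond == (cond ? -rhs : rhs) for cond in {0, 1}, relying
   on two's-complement negation.  */
#include <stdint.h>
#include <stdio.h>

static int64_t
cond_neg_branchless (int64_t rhs, int64_t cond)
{
  return (rhs ^ -cond) + cond;
}

int
main (void)
{
  int64_t vals[] = { 0, 1, -1, 42, -42, 123456789, -987654321 };
  unsigned int i;
  int64_t cond;

  for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i++)
    for (cond = 0; cond <= 1; cond++)
      {
        int64_t expect = cond ? -vals[i] : vals[i];
        if (cond_neg_branchless (vals[i], cond) != expect)
          {
            printf ("mismatch for rhs=%lld cond=%lld\n",
                    (long long) vals[i], (long long) cond);
            return 1;
          }
      }
  printf ("identity holds for all tested values\n");
  return 0;
}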
Re: [PATCH] Enable Cilk keywords in Cilk Runtime
On 12/11/13 12:04, Iyer, Balaji V wrote: Hello Everyone, Since we have _Cilk_spawn and _Cilk_sync support in C++ compiler, we can enable the keyword usage in runtime. This patch should do so. Is it Ok to install? Here are the ChangeLog entries: 2013-12-11 Balaji V. Iyer * Makefile.am (GENERAL_FLAGS): Removed un-defining of Cilk keywords. * Makefile.in: Regenerate. * runtime/symbol_test.c: Added a #define to clear out _Cilk_for. This is fine. jeff
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
On 12/11/2013 12:45 PM, Jan Hubicka wrote:
> I wonder if we won't end up with better code going the opposite way. We can declare those functions static first and then have a post-inliner IPA pass that will travel the callgraph and mark all static functions/variables that are accessed only from one comdat to be comdat-local.

With that scheme I would be concerned that we would tend to inline the wrappers so that we can't move the worker function into the comdat, and we end up with copies of it in lots of object files.

Jason
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
> On 12/11/2013 12:45 PM, Jan Hubicka wrote:
> > I wonder if we won't end up with better code going the opposite way. We can declare those functions static first and then have a post-inliner IPA pass that will travel the callgraph and mark all static functions/variables that are accessed only from one comdat to be comdat-local.
>
> With that scheme I would be concerned that we would tend to inline the wrappers so that we can't move the worker function into the comdat, and we end up with copies of it in lots of object files.

Hmm, you are right. The inliner and ipa-cp will need a cost model for dragging stuff local in any case. Perhaps we even want to "privatize" as much as possible prior to inlining. I will play with that incrementally. Let's go with the minimal version of the patch that makes things work, and I will take care of solving the inliner limitations.

Honza

> Jason
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
> On 12/11/2013 12:14 PM, Jan Hubicka wrote:
> >> Is ipa_passes the right place to initialize calls_comdat_local?
> >
> > The flag is probably needed in both the early inliner and the IPA inliner. A conservative place to initialize it would be in inline_analyze_function. (The early inliner analyzes a function twice, first before inlining and next after early optimization, so the update should also clear the flag if the call disappeared.)
>
> Unfortunately early inlining doesn't call inline_analyze_function on all functions, so we need to initialize it somewhere else. Is there a reason not to set up an initial value in ipa_passes?

Well, the fewer things we do behind the pass manager's back, the better. ipa_passes should really just set up the compiler to execute the IPA passes; it should not do any analysis by itself. Every function we inline or clone needs to go through the inliner's analysis. But looking into it now, the place is probably compute_inline_parameters, since inline_analyze_function is just a wrapper for the IPA pass, but pass_data_inline_parameters bypasses it (this is a bit confusing - I will clean it up).

Honza

> > I think we also want to clear it if a call to a comdat-local function is marked inline.
>
> Makes sense.
>
> Jason
Re: [PATCH] Enable Cilk keywords in Cilk Runtime
On Wed, Dec 11, 2013 at 11:04 AM, Iyer, Balaji V wrote: > Hello Everyone, > Since we have _Cilk_spawn and _Cilk_sync support in C++ compiler, we > can enable the keyword usage in runtime. This patch should do so. > > Is it Ok to install? > > Here are the ChangeLog entries: > > 2013-12-11 Balaji V. Iyer > > * Makefile.am (GENERAL_FLAGS): Removed un-defining of Cilk keywords. > * Makefile.in: Regenerate. > * runtime/symbol_test.c: Added a #define to clear out _Cilk_for. Those tests failed on Linux/i686. H.J. --- FAIL: g++.dg/cilk-plus/CK/catch_exc.cc -g -O2 -fcilkplus -L/export/gnu/import/git/gcc-test-intel64corei7/bld/x86_64-unknown-linux-gnu/32/libcilkrts/.libs execution test FAIL: g++.dg/cilk-plus/CK/catch_exc.cc -g -O3 -fcilkplus -L/export/gnu/import/git/gcc-test-intel64corei7/bld/x86_64-unknown-linux-gnu/32/libcilkrts/.libs execution test FAIL: g++.dg/cilk-plus/CK/catch_exc.cc -O1 -fcilkplus -L/export/gnu/import/git/gcc-test-intel64corei7/bld/x86_64-unknown-linux-gnu/32/libcilkrts/.libs execution test FAIL: g++.dg/cilk-plus/CK/catch_exc.cc -O2 -fcilkplus -L/export/gnu/import/git/gcc-test-intel64corei7/bld/x86_64-unknown-linux-gnu/32/libcilkrts/.libs execution test FAIL: g++.dg/cilk-plus/CK/catch_exc.cc -O3 -fcilkplus -L/export/gnu/import/git/gcc-test-intel64corei7/bld/x86_64-unknown-linux-gnu/32/libcilkrts/.libs execution test
C++ edge_iterator (was: Re: [SH] PR 53976 - Add RTL pass to eliminate clrt, sett insns)
Same message, but now with the correct patch attached. Sorry. On Thu, 2013-11-21 at 00:04 +0100, Steven Bosscher wrote: > Declaring the edge_iterator inside the for() is not a good argument > against FOR_EACH_EDGE. Of course, brownie points are up for grabs for > the brave soul daring enough to make edge iterators be proper C++ > iterators... ;-) So, I gave it a try -- see the attached patch. It allows edge iteration to look more like STL container iteration: for (basic_block::edge_iterator ei = bb->pred_edges ().begin (); ei != bb->pred_edges ().end (); ++ei) { basic_block pred_bb = (*ei)->src; ... } The idea is to first make class vec look more like an STL container. This would also allow iterating it with std::for_each and use C++11 range based for loop (in a few years maybe). It would also make it easier to replace vec uses with STL containers if needed and vice versa. Then the typedef struct basic_block_def* basic_block; is replaced with a wrapper class 'basic_block', which is just a simple POD wrapper around a basic_block_def*. There should be no penalties compared to passing/storing raw pointers. Because of the union with constructor restriction of C++98 an additional wrapper class 'basic_block_in_union' is required, which doesn't have any constructors defined. Having 'basic_block' as a class allows putting typedefs for the edge iterator types in there (initially I tried putting the typedefs into struct basic_block_def, but gengtype would bail out). It would also be possible to have a free standing definition / typedef of edge_iterator, but it would conflict with the existing one and require too many changes at once. Moreover, the iterator type actually depends on the container type, which is vec, and the container type is defined/selected by the basic_block class. The following basic_block pred_bb = (*ei)->src; can also be written as basic_block pred_bb = ei->src; after converting the edge typedef to a wrapper of edge_def*. The idea of the approach is to allow co-existence of the new edge_iterator and the old and thus be able to gradually convert code. The wrappers around raw pointers also helo encapsulating the underlying memory management issues. For example, it would be much easier to replace garbage collected objects with intrusive reference counting. Comments and feedback appreciated. Cheers, Oleg Index: gcc/config/sh/sh_optimize_sett_clrt.cc === --- gcc/config/sh/sh_optimize_sett_clrt.cc (revision 205866) +++ gcc/config/sh/sh_optimize_sett_clrt.cc (working copy) @@ -390,10 +390,10 @@ { prev_visited_bb.push_back (bb); - for (edge_iterator ei = ei_start (bb->preds); !ei_end_p (ei); - ei_next (&ei)) + for (basic_block::edge_iterator ei = bb->pred_edges ().begin (); + ei != bb->pred_edges ().end (); ++ei) { - basic_block pred_bb = ei_edge (ei)->src; + basic_block pred_bb = (*ei)->src; pred_bb_count += 1; find_last_ccreg_values (BB_END (pred_bb), pred_bb, values_out, prev_visited_bb); Index: gcc/dumpfile.c === --- gcc/dumpfile.c (revision 205866) +++ gcc/dumpfile.c (working copy) @@ -25,6 +25,7 @@ #include "tree.h" #include "gimple-pretty-print.h" #include "context.h" +#include "basic-block2.h" /* If non-NULL, return one past-the-end of the matching SUBPART of the WHOLE string. */ Index: gcc/tracer.c === --- gcc/tracer.c (revision 205866) +++ gcc/tracer.c (working copy) @@ -102,7 +102,7 @@ /* A transaction is a single entry multiple exit region. It must be duplicated in its entirety or not at all. 
*/ - g = last_stmt (CONST_CAST_BB (bb)); + g = last_stmt (basic_block (bb)); if (g && gimple_code (g) == GIMPLE_TRANSACTION) return true; Index: gcc/dominance.c === --- gcc/dominance.c (revision 205866) +++ gcc/dominance.c (working copy) @@ -503,6 +503,8 @@ TBB v, w, k, par; basic_block en_block; edge_iterator ei, einext; + TBB k1; + basic_block b; if (reverse) en_block = EXIT_BLOCK_PTR_FOR_FN (cfun); @@ -538,9 +540,6 @@ semidominator. */ while (!ei_end_p (ei)) { - TBB k1; - basic_block b; - e = ei_edge (ei); b = (reverse) ? e->dest : e->src; einext = ei; Index: gcc/vec.h === --- gcc/vec.h (revision 205866) +++ gcc/vec.h (working copy) @@ -482,6 +482,15 @@ void quick_grow (unsigned len); void quick_grow_cleared (unsigned len); + /* STL like iterator interface. */ + typedef T* iterator; + typedef const T* const_iterator; + + iterator begin (void) { return &m_vecdata[0]; } + iterator end (void) { return &m_vecdata[m_vecpfx.m_num]; } + const_iterator begin (void) const { return &m_vecdata[0]; } + const_i
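For reference, a self-contained sketch of the pattern the patch introduces (simplified stand-ins, not the GCC vec or basic_block classes): the container gains begin()/end(), and a thin wrapper around the raw pointer exposes the edge container plus its iterator type so call sites can iterate STL-style.

/* Self-contained sketch (simplified stand-ins, not the GCC vec or
   basic_block classes) of the pattern in the patch.  */
#include <stdio.h>

struct edge_def
{
  int src_index;                /* stands in for the source block */
};
typedef edge_def *edge;

struct edge_vec                 /* stands in for vec<edge, va_gc> */
{
  edge m_data[4];
  unsigned m_num;

  typedef edge *iterator;
  iterator begin (void) { return &m_data[0]; }
  iterator end (void) { return &m_data[m_num]; }
};

struct basic_block_def
{
  edge_vec preds;
};

struct basic_block              /* thin wrapper around basic_block_def* */
{
  basic_block_def *m_bb;
  basic_block (basic_block_def *bb) : m_bb (bb) {}

  typedef edge_vec::iterator edge_iterator;
  edge_vec &pred_edges (void) const { return m_bb->preds; }
};

int
main (void)
{
  edge_def e0 = { 3 };
  edge_def e1 = { 7 };
  basic_block_def bb_storage = { { { &e0, &e1 }, 2 } };
  basic_block bb (&bb_storage);

  for (basic_block::edge_iterator ei = bb.pred_edges ().begin ();
       ei != bb.pred_edges ().end (); ++ei)
    printf ("pred edge from block %d\n", (*ei)->src_index);
  return 0;
}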
[Patch, Fortran, OOP] PR 59450: ICE for type-bound-procedure expression in module procedure interface
Hi all, the PR in the subject line involves a module procedure whose result has array bounds which contain a type-bound procedure call. Since the procedure interface is written to the .mod file, also the TBP expression needs to be written. This leads to an ICE, since it simply has not been implemented yet. When writing a function reference, we now discriminate between three cases: (1) 'ordinary' procedures, (2) type-bound procedures and (3) intrinsic procedures. We first read/write an integer value to indicate which case we're dealing with, and then do the specific I/O for this case (see patch). Up to now we basically did the same already, but only including two cases (ordinary and intrinsic functions). The patch has been regtested on x86_64-unknown-linux-gnu. Ok for trunk? Should I also bump the MOD_VERSION? Cheers, Janus 2013-12-11 Janus Weil PR fortran/59450 * module.c (mio_expr): Handle type-bound function expressions. 2013-12-11 Janus Weil PR fortran/59450 * gfortran.dg/typebound_proc_31.f90: New. Index: gcc/fortran/module.c === --- gcc/fortran/module.c(revision 205857) +++ gcc/fortran/module.c(working copy) @@ -3358,12 +3358,24 @@ mio_expr (gfc_expr **ep) { e->value.function.name = mio_allocated_string (e->value.function.name); - flag = e->value.function.esym != NULL; + if (e->value.function.esym) + flag = 1; + else if (e->ref) + flag = 2; + else + flag = 0; mio_integer (&flag); - if (flag) - mio_symbol_ref (&e->value.function.esym); - else - write_atom (ATOM_STRING, e->value.function.isym->name); + switch (flag) + { + case 1: + mio_symbol_ref (&e->value.function.esym); + break; + case 2: + mio_ref_list (&e->ref); + break; + default: + write_atom (ATOM_STRING, e->value.function.isym->name); + } } else { @@ -3372,10 +3384,15 @@ mio_expr (gfc_expr **ep) free (atom_string); mio_integer (&flag); - if (flag) - mio_symbol_ref (&e->value.function.esym); - else + switch (flag) { + case 1: + mio_symbol_ref (&e->value.function.esym); + break; + case 2: + mio_ref_list (&e->ref); + break; + default: require_atom (ATOM_STRING); e->value.function.isym = gfc_find_function (atom_string); free (atom_string); ! { dg-do compile } ! ! PR 59450: [OOP] ICE for type-bound-procedure expression in module procedure interface ! ! Contributed by module classes implicit none type :: base_class contains procedure, nopass :: get_num end type contains pure integer function get_num() end function function get_array( this ) result(array) class(base_class), intent(in) :: this integer, dimension( this%get_num() ) :: array end function end module ! { dg-final { cleanup-modules "classes" } }
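For reference, a standalone sketch of the tag-then-payload scheme the module I/O change relies on (not the real gfortran module format): write a small integer flag first, then the payload matching that flag, and dispatch on the same flag when reading the module back.

/* Standalone sketch of the tag-then-payload scheme (not the real gfortran
   module format).  */
#include <stdio.h>

enum fn_kind { FN_INTRINSIC = 0, FN_EXTERNAL = 1, FN_TYPEBOUND = 2 };

static void
write_function_ref (FILE *f, enum fn_kind kind, const char *payload)
{
  fprintf (f, "%d %s\n", (int) kind, payload);
}

static void
read_function_ref (FILE *f)
{
  int flag;
  char payload[64];

  if (fscanf (f, "%d %63s", &flag, payload) != 2)
    return;
  switch (flag)
    {
    case FN_EXTERNAL:
      printf ("resolve the symbol reference: %s\n", payload);
      break;
    case FN_TYPEBOUND:
      printf ("rebuild the reference chain: %s\n", payload);
      break;
    default:
      printf ("look up the intrinsic: %s\n", payload);
      break;
    }
}

int
main (void)
{
  FILE *f = tmpfile ();

  if (!f)
    return 1;
  write_function_ref (f, FN_TYPEBOUND, "this%get_num");
  rewind (f);
  read_function_ref (f);
  fclose (f);
  return 0;
}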
[PATCH][PR rtl-optimization/59446] Don't pessimize threading through loop exits
59446 is a missed-optimization error where we want to thread through multiple loop exits and that request is being cancelled. The problem is I was looking at changes in the loop_father as a proxy for crossing loop headers. That was utterly dumb as we can test directly for crossing a loop header. Anyway, we saw the changes in the loop structure, figured we were threading through too mean headers and truncated the jump thread. Dumb. Fixing that happens to clean up the code ever-so-slightly, which is good. Bootstrapped and regression tested on x86_64-unknown-linux-gnu and installed on the trunk. PR rtl-optimization/59446 * tree-ssa-threadupdate.c (mark_threaded_blocks): Properly test for crossing a loop header. diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index 6f978e2..af8fd85 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -1449,44 +1449,32 @@ mark_threaded_blocks (bitmap threaded_blocks) { vec *path = THREAD_PATH (e); - /* Basically we're looking for a situation where we can see -3 or more loop structures on a jump threading path. */ - - struct loop *first_father = (*path)[0]->e->src->loop_father; - struct loop *second_father = NULL; - for (unsigned int i = 0; i < path->length (); i++) + for (unsigned int i = 0, crossed_headers = 0; + i < path->length (); + i++) { - /* See if this is a loop father we have not seen before. */ - if ((*path)[i]->e->dest->loop_father != first_father - && (*path)[i]->e->dest->loop_father != second_father) + basic_block dest = (*path)[i]->e->dest; + crossed_headers += (dest == dest->loop_father->header); + if (crossed_headers > 1) { - /* We've already seen two loop fathers, so we -need to trim this jump threading path. */ - if (second_father != NULL) - { - /* Trim from entry I onwards. */ - for (unsigned int j = i; j < path->length (); j++) - delete (*path)[j]; - path->truncate (i); - - /* Now that we've truncated the path, make sure -what's left is still valid. We need at least -two edges on the path and the last edge can not -be a joiner. This should never happen, but let's -be safe. */ - if (path->length () < 2 - || (path->last ()->type - == EDGE_COPY_SRC_JOINER_BLOCK)) - { - delete_jump_thread_path (path); - e->aux = NULL; - } - break; - } - else + /* Trim from entry I onwards. */ + for (unsigned int j = i; j < path->length (); j++) + delete (*path)[j]; + path->truncate (i); + + /* Now that we've truncated the path, make sure +what's left is still valid. We need at least +two edges on the path and the last edge can not +be a joiner. This should never happen, but let's +be safe. */ + if (path->length () < 2 + || (path->last ()->type + == EDGE_COPY_SRC_JOINER_BLOCK)) { - second_father = (*path)[i]->e->dest->loop_father; + delete_jump_thread_path (path); + e->aux = NULL; } + break; } } }
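For reference, a standalone sketch of the new test (hypothetical types, not GCC's jump-threading data structures): walk the candidate path, count edges whose destination is its loop's header, and trim the path from the point where a second header would be crossed.

/* Standalone sketch (hypothetical types, not GCC's jump-threading data
   structures) of counting header crossings and truncating the path.  */
#include <stdio.h>
#include <vector>

struct block
{
  int index;
  bool is_loop_header;   /* dest == dest->loop_father->header in GCC terms */
};

static void
truncate_at_second_header (std::vector<block> &path)
{
  unsigned int crossed_headers = 0;
  for (unsigned int i = 0; i < path.size (); i++)
    {
      crossed_headers += path[i].is_loop_header;
      if (crossed_headers > 1)
        {
          path.resize (i);   /* trim from entry i onwards */
          break;
        }
    }
}

int
main (void)
{
  block blocks[] = { { 1, false }, { 2, true }, { 3, false },
                     { 4, true }, { 5, false } };
  std::vector<block> path (blocks, blocks + 5);

  truncate_at_second_header (path);
  for (unsigned int i = 0; i < path.size (); i++)
    printf ("%d ", path[i].index);
  printf ("\n");   /* prints: 1 2 3 */
  return 0;
}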
Go patch committed: Minor fixes for recover thunks
This patch to the Go frontend fixes a couple of problems with recover thunks. First, it avoids name collisions when methods call recover; previously if two different methods with the same name (for different types) called recover, the assembler symbols would be the same. Second, it avoids crashing if a method with an unnamed receiver calls recover. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch. Ian diff -r ba0adcfb4e6a go/gogo.cc --- a/go/gogo.cc Fri Dec 06 10:24:09 2013 -0800 +++ b/go/gogo.cc Wed Dec 11 14:21:36 2013 -0800 @@ -2822,7 +2822,10 @@ if (orig_fntype->is_varargs()) new_fntype->set_is_varargs(); - std::string name = orig_no->name() + "$recover"; + std::string name = orig_no->name(); + if (orig_fntype->is_method()) +name += "$" + orig_fntype->receiver()->type()->mangled_name(gogo); + name += "$recover"; Named_object *new_no = gogo->start_function(name, new_fntype, false, location); Function *new_func = new_no->func_value(); @@ -2916,7 +2919,25 @@ && !orig_rec_no->var_value()->is_receiver()); orig_rec_no->var_value()->set_is_receiver(); - const std::string& new_receiver_name(orig_fntype->receiver()->name()); + std::string new_receiver_name(orig_fntype->receiver()->name()); + if (new_receiver_name.empty()) + { + // Find the receiver. It was named "r.NNN" in + // Gogo::start_function. + for (Bindings::const_definitions_iterator p = + new_bindings->begin_definitions(); + p != new_bindings->end_definitions(); + ++p) + { + const std::string& pname((*p)->name()); + if (pname[0] == 'r' && pname[1] == '.') + { + new_receiver_name = pname; + break; + } + } + go_assert(!new_receiver_name.empty()); + } Named_object* new_rec_no = new_bindings->lookup_local(new_receiver_name); if (new_rec_no == NULL) go_assert(saw_errors());
libgo patch committed: Let MakeFunc functions call recover
The only Go functions that may call recover are those that are immediately deferred. This permits the correct handling of a panic in a function called by a deferred function. In gccgo this is implemented by recording the return address of a function that may call recover, and checking that return address in the recover call. That works for normal functions, but fails for functions created by reflect.MakeFunc. For a MakeFunc function the caller will be some function from libffi. This patch lets that work correctly by hooking into the stub used to start functions created by reflect.MakeFunc. That stub now does the checking for whether the function may call recover, and the function itself checks whether it was called directly from libffi. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch. Ian diff -r 4863a5e8f8a3 libgo/go/reflect/makefunc_386.S --- a/libgo/go/reflect/makefunc_386.S Wed Dec 11 14:34:28 2013 -0800 +++ b/libgo/go/reflect/makefunc_386.S Wed Dec 11 15:35:56 2013 -0800 @@ -48,6 +48,15 @@ leal 8(%ebp), %eax /* Set esp field in struct. */ movl %eax, -24(%ebp) + /* For MakeFunc functions that call recover. */ + movl 4(%ebp), %eax + movl %eax, (%esp) +#ifdef __PIC__ + call __go_makefunc_can_recover@PLT +#else + call __go_makefunc_can_recover +#endif + #ifdef __PIC__ call __go_get_closure@PLT #else @@ -65,6 +74,13 @@ call reflect.MakeFuncStubGo #endif + /* MakeFunc functions can no longer call recover. */ +#ifdef __PIC__ + call __go_makefunc_returning@PLT +#else + call __go_makefunc_returning +#endif + /* Set return registers. */ movl -20(%ebp), %eax diff -r 4863a5e8f8a3 libgo/go/reflect/makefunc_amd64.S --- a/libgo/go/reflect/makefunc_amd64.S Wed Dec 11 14:34:28 2013 -0800 +++ b/libgo/go/reflect/makefunc_amd64.S Wed Dec 11 15:35:56 2013 -0800 @@ -61,6 +61,14 @@ movdqa %xmm6, 0xa0(%rsp) movdqa %xmm7, 0xb0(%rsp) + /* For MakeFunc functions that call recover. */ + movq 8(%rbp), %rdi +#ifdef __PIC__ + call __go_makefunc_can_recover@PLT +#else + call __go_makefunc_can_recover +#endif + # Get function type. #ifdef __PIC__ call __go_get_closure@PLT @@ -77,6 +85,13 @@ call reflect.MakeFuncStubGo #endif + /* MakeFunc functions can no longer call recover. */ +#ifdef __PIC__ + call __go_makefunc_returning@PLT +#else + call __go_makefunc_returning +#endif + # The structure will be updated with any return values. Load # all possible return registers before returning to the caller. diff -r 4863a5e8f8a3 libgo/runtime/go-defer.c --- a/libgo/runtime/go-defer.c Wed Dec 11 14:34:28 2013 -0800 +++ b/libgo/runtime/go-defer.c Wed Dec 11 15:35:56 2013 -0800 @@ -27,6 +27,7 @@ n->__pfn = pfn; n->__arg = arg; n->__retaddr = NULL; + n->__makefunc_can_recover = 0; g->defer = n; } diff -r 4863a5e8f8a3 libgo/runtime/go-defer.h --- a/libgo/runtime/go-defer.h Wed Dec 11 14:34:28 2013 -0800 +++ b/libgo/runtime/go-defer.h Wed Dec 11 15:35:56 2013 -0800 @@ -34,4 +34,10 @@ set by __go_set_defer_retaddr which is called by the thunks created by defer statements. */ const void *__retaddr; + + /* Set to true if a function created by reflect.MakeFunc is + permitted to recover. The return address of such a function + function will be somewhere in libffi, so __retaddr is not + useful. */ + _Bool __makefunc_can_recover; }; diff -r 4863a5e8f8a3 libgo/runtime/go-recover.c --- a/libgo/runtime/go-recover.c Wed Dec 11 14:34:28 2013 -0800 +++ b/libgo/runtime/go-recover.c Wed Dec 11 15:35:56 2013 -0800 @@ -16,12 +16,14 @@ __go_can_recover--this is, the thunk. 
*/ _Bool -__go_can_recover (const void* retaddr) +__go_can_recover (const void *retaddr) { G *g; struct __go_defer_stack *d; const char* ret; const char* dret; + Location loc; + const byte *name; g = runtime_g (); @@ -52,7 +54,73 @@ #endif dret = (const char *) d->__retaddr; - return ret <= dret && ret + 16 >= dret; + if (ret <= dret && ret + 16 >= dret) +return 1; + + /* If the function calling recover was created by reflect.MakeFunc, + then RETADDR will be somewhere in libffi. Our caller is + permitted to recover if it was called from libffi. */ + if (!d->__makefunc_can_recover) +return 0; + + if (runtime_callers (2, &loc, 1) < 1) +return 0; + + /* If we have no function name, then we weren't called by Go code. + Guess that we were called by libffi. */ + if (loc.function.len == 0) +return 1; + + if (loc.function.len < 4) +return 0; + name = loc.function.str; + if (*name == '_') +{ + if (loc.function.len < 5) + return 0; + ++name; +} + + if (name[0] == 'f' && name[1] == 'f' && name[2] == 'i' && name[3] == '_') +return 1; + + return 0; +} + +/* This function is called when code is about to enter a function + created by reflect.MakeFunc. It is called by the function stub + used by MakeFunc. If the stub is permitted to call recover,
Re: Two build != host fixes
On Wed, Dec 11, 2013 at 02:11:49PM +0100, Bernd Edlinger wrote: > We need the auto-build only to build something that translates .md files to > .c, > so I would'nt care about GMP, but some other things, like the right prototype > for > printf make a difference. Right, but when you get some of the HAVE_* wrong, libiberty for the build compiler provides some of the "missing" functions and declarations. The declarations can clash with system header declarations giving you bootstrap failures for no good reason.. > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > > index aad927c..7995e64 100644 > > --- a/gcc/Makefile.in > > +++ b/gcc/Makefile.in > > @@ -747,7 +747,8 @@ BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS) > > > > # Native linker and preprocessor flags. For x-fragment overrides. > > BUILD_LDFLAGS=@BUILD_LDFLAGS@ > > -BUILD_CPPFLAGS=$(ALL_CPPFLAGS) > > +BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \ > > + -I$(srcdir)/../include $(CPPINC) > > > > I did not have this one. > What is it good for? It fixes another case of host header directories being searched for the build compiler. Important when GMPINC and other *INC point at installed locations for the host compiler. Trouble is, you don't just get the headers you want (eg. gmp.h) but all the other host headers too. -- Alan Modra Australia Development Lab, IBM
Go patch committed: Implement method values in reflect package
This patch to the Go frontend and libgo implements method values in the reflect package. Working with method values and reflect now works correctly, at least on x86. This changes the type signature for type methods in the reflect package to match the gc compiler. That in turn required changing the reflect package to mark method values with a new flag, as previously they were detected by the type signature. The MakeFunc support needed to create a function that takes a value and passes a pointer to the method, since all methods take pointers. It also needed to create a function that holds a receiver value. And the recover code needed to handle these new cases. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch. Ian diff -r e165d3ccf1e4 go/types.cc --- a/go/types.cc Wed Dec 11 15:38:00 2013 -0800 +++ b/go/types.cc Wed Dec 11 16:57:53 2013 -0800 @@ -2261,26 +2261,9 @@ ++p; go_assert(p->is_field_name("typ")); - if (!only_value_methods && m->is_value_method()) -{ - // This is a value method on a pointer type. Change the type of - // the method to use a pointer receiver. The implementation - // always uses a pointer receiver anyhow. - Type* rtype = mtype->receiver()->type(); - Type* prtype = Type::make_pointer_type(rtype); - Typed_identifier* receiver = - new Typed_identifier(mtype->receiver()->name(), prtype, - mtype->receiver()->location()); - mtype = Type::make_function_type(receiver, - (mtype->parameters() == NULL - ? NULL - : mtype->parameters()->copy()), - (mtype->results() == NULL - ? NULL - : mtype->results()->copy()), - mtype->location()); -} - vals->push_back(Expression::make_type_descriptor(mtype, bloc)); + bool want_pointer_receiver = !only_value_methods && m->is_value_method(); + nonmethod_type = mtype->copy_with_receiver_as_param(want_pointer_receiver); + vals->push_back(Expression::make_type_descriptor(nonmethod_type, bloc)); ++p; go_assert(p->is_field_name("tfn")); @@ -4008,6 +3991,32 @@ return ret; } +// Make a copy of a function type with the receiver as the first +// parameter. + +Function_type* +Function_type::copy_with_receiver_as_param(bool want_pointer_receiver) const +{ + go_assert(this->is_method()); + Typed_identifier_list* new_params = new Typed_identifier_list(); + Type* rtype = this->receiver_->type(); + if (want_pointer_receiver) +rtype = Type::make_pointer_type(rtype); + Typed_identifier receiver(this->receiver_->name(), rtype, + this->receiver_->location()); + new_params->push_back(receiver); + const Typed_identifier_list* orig_params = this->parameters_; + if (orig_params != NULL && !orig_params->empty()) +{ + for (Typed_identifier_list::const_iterator p = orig_params->begin(); + p != orig_params->end(); + ++p) + new_params->push_back(*p); +} + return Type::make_function_type(NULL, new_params, this->results_, + this->location_); +} + // Make a copy of a function type ignoring any receiver and adding a // closure parameter. diff -r e165d3ccf1e4 go/types.h --- a/go/types.h Wed Dec 11 15:38:00 2013 -0800 +++ b/go/types.h Wed Dec 11 16:57:53 2013 -0800 @@ -1797,6 +1797,12 @@ Function_type* copy_with_receiver(Type*) const; + // Return a copy of this type with the receiver treated as the first + // parameter. If WANT_POINTER_RECEIVER is true, the receiver is + // forced to be a pointer. + Function_type* + copy_with_receiver_as_param(bool want_pointer_receiver) const; + // Return a copy of this type ignoring any receiver and using dummy // names for all parameters. This is used for thunks for method // values. 
diff -r e165d3ccf1e4 libgo/go/reflect/all_test.go --- a/libgo/go/reflect/all_test.go Wed Dec 11 15:38:00 2013 -0800 +++ b/libgo/go/reflect/all_test.go Wed Dec 11 16:57:53 2013 -0800 @@ -1631,9 +1631,13 @@ } } -/* Not yet implemented for gccgo - func TestMethodValue(t *testing.T) { + switch runtime.GOARCH { + case "amd64", "386": + default: + t.Skip("reflect method values not implemented for " + runtime.GOARCH) + } + p := Point{3, 4} var i int64 @@ -1721,8 +1725,6 @@ } } -*/ - // Reflect version of $GOROOT/test/method5.go // Concrete types implementing M method. @@ -1807,14 +1809,18 @@ func (t4 Tm4) M(x int, b byte) (byte, int) { return b, x + 40 } func TestMethod5(t *testing.T) { - /* Not yet used for gccgo + switch runtime.GOARCH { + case "amd64", "386": + default: + t.Skip("reflect method values not implemented for " + runtime.GOARCH) + } + CheckF := func(name string, f func(int, byte) (byte, int), inc int) { b, x := f(1000, 99) if b != 99 || x != 1000+inc { t.Errorf("%s(1000, 99) = %v, %v, want 99, %v", name, b, x, 1000+inc) } } - */ CheckV := func(name string, i Value, inc int) { bx := i.Method(0).Call([]Value{ValueOf(1000), ValueOf(byte(99))}) @@ -1824,9 +1830,7 @@ t.Error
[trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c
This patch is for the trunk, but it solves a problem that comes up for wide-int. For wide-int we need to have the BITS_PER_UNIT available earlier.So this patch sets the default value (8) in genmodes.c so that it is available by anyone who includes insn-modes.h. The generator for tm.h was modified to include insn-modes.h.The value for BITS_PER_UNIT can be overridden by any port by placing a define for it in their target modes file. This patch removes the definition of BITS_PER_UNIT from 7 platform .h files. All of those platforms initialized it to the default value so there was no need for additions to their target modes file. In addition, this target also changes the way that MAX_BITSIZE_MODE_ANY_INT is calculated.The value is heavily used on the wide-int branch to allocate buffers that are used to hold large integer values. The change in the way it is computed was motivated by the i386 port, but there may be other ports that have the same problem. The i386 port defines two very large integer modes that are only used as containers for large vectors. They are never used for large integers. The new way of computing this allows a port to say (very early) that some of these integer modes are never used to hold numbers and so smaller buffers can be used for integer calculations. Other ports that play the same game should follow suit. This patch has been bootstrapped and regression tested on x86-64. Ok to commit? Kenny Index: gcc/config/arc/arc.h === --- gcc/config/arc/arc.h (revision 205895) +++ gcc/config/arc/arc.h (working copy) @@ -303,9 +303,6 @@ along with GCC; see the file COPYING3. numbered. */ #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN) -/* Number of bits in an addressable storage unit. */ -#define BITS_PER_UNIT 8 - /* Width in bits of a "word", which is the contents of a machine register. Note that this is not necessarily the width of data type `int'; if using 16-bit ints on a 68000, this would still be 32. Index: gcc/config/bfin/bfin.h === --- gcc/config/bfin/bfin.h (revision 205895) +++ gcc/config/bfin/bfin.h (working copy) @@ -859,9 +859,6 @@ typedef struct { /* Define this if most significant word of a multiword number is numbered. */ #define WORDS_BIG_ENDIAN 0 -/* number of bits in an addressable storage unit */ -#define BITS_PER_UNIT 8 - /* Width in bits of a "word", which is the contents of a machine register. Note that this is not necessarily the width of data type `int'; if using 16-bit ints on a 68000, this would still be 32. Index: gcc/config/lm32/lm32.h === --- gcc/config/lm32/lm32.h (revision 205895) +++ gcc/config/lm32/lm32.h (working copy) @@ -73,7 +73,6 @@ #define BYTES_BIG_ENDIAN 1 #define WORDS_BIG_ENDIAN 1 -#define BITS_PER_UNIT 8 #define BITS_PER_WORD 32 #define UNITS_PER_WORD 4 Index: gcc/config/m32c/m32c.h === --- gcc/config/m32c/m32c.h (revision 205895) +++ gcc/config/m32c/m32c.h (working copy) @@ -140,7 +140,6 @@ machine_function; matches "int". Pointers are 16 bits for R8C/M16C (when TARGET_A16 is true) and 24 bits for M32CM/M32C (when TARGET_A24 is true), but 24-bit pointers are stored in 32-bit words. */ -#define BITS_PER_UNIT 8 #define UNITS_PER_WORD 2 #define POINTER_SIZE (TARGET_A16 ? 
16 : 32) #define POINTERS_EXTEND_UNSIGNED 1 Index: gcc/config/microblaze/microblaze.h === --- gcc/config/microblaze/microblaze.h (revision 205895) +++ gcc/config/microblaze/microblaze.h (working copy) @@ -193,7 +193,6 @@ extern enum pipeline_type microblaze_pip #define BITS_BIG_ENDIAN 0 #define BYTES_BIG_ENDIAN (TARGET_LITTLE_ENDIAN == 0) #define WORDS_BIG_ENDIAN (BYTES_BIG_ENDIAN) -#define BITS_PER_UNIT 8 #define BITS_PER_WORD 32 #define UNITS_PER_WORD 4 #define MIN_UNITS_PER_WORD 4 Index: gcc/config/picochip/picochip.h === --- gcc/config/picochip/picochip.h (revision 205895) +++ gcc/config/picochip/picochip.h (working copy) @@ -92,8 +92,6 @@ extern enum picochip_dfa_type picochip_s #define BYTES_BIG_ENDIAN 0 #define WORDS_BIG_ENDIAN 0 -#define BITS_PER_UNIT 8 - #define BITS_PER_WORD 16 #define UNITS_PER_WORD (BITS_PER_WORD / BITS_PER_UNIT) Index: gcc/config/spu/spu.h === --- gcc/config/spu/spu.h (revision 205895) +++ gcc/config/spu/spu.h (working copy) @@ -54,8 +54,6 @@ extern GTY(()) int spu_tune; #define WORDS_BIG_ENDIAN 1 -#define BITS_PER_UNIT 8 - /* GCC uses word_mode in many places, assuming that it is the fastest integer mode. That is not the case for SPU though. We can't us
Re: [trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c
The m32c part is OK.
Re: [trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c
On 12/11/2013 08:42 PM, DJ Delorie wrote: The m32c part is OK. thanks for the fast reply. kenny
Go patch committed: Disable sibling calls by default
Go programs expect to be able to call runtime.Callers to get their call stack. The libbacktrace library currently can not handle tail calls, and it's not clear that it can ever be completely reliable in handling tail calls. This patch to the Go frontend disables tail call optimization, unless it is specifically requested. This will permit Go programs that call runtime.Callers to work as expected. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch. Ian 2013-12-11 Ian Lance Taylor * go-lang.c (go_langhook_post_options): Disable sibling calls by default. Index: go-lang.c === --- go-lang.c (revision 205872) +++ go-lang.c (working copy) @@ -270,6 +270,10 @@ go_langhook_post_options (const char **p if (flag_excess_precision_cmdline == EXCESS_PRECISION_DEFAULT) flag_excess_precision_cmdline = EXCESS_PRECISION_STANDARD; + /* Tail call optimizations can confuse uses of runtime.Callers. */ + if (!global_options_set.x_flag_optimize_sibling_calls) +global_options.x_flag_optimize_sibling_calls = 0; + /* Returning false means that the backend should be used. */ return false; }
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
On 12/11/2013 03:53 PM, Jan Hubicka wrote:
> Lets go with minimal version of patch that makes things working and I
> will take care of solving the inliner limitations.

OK, this patch adjusts calls_comdat_local in compute_inline_parameters, as you suggested. What do you think of the change to inline_call? The case of inlining a comdat-local function looks O(N^2), but that shouldn't be a problem in practice since the wrappers only have one callee anyway.

Jason

commit 841b0c8a4dfd38f7f2dc5e0e05f5e4ded0aad4f1
Author: Jason Merrill
Date:   Thu Jan 12 14:04:42 2012 -0500

    PR c++/41090
    Add -fdeclone-ctor-dtor.

gcc/cp/
	* optimize.c (can_alias_cdtor, populate_clone_array): Split out
	from maybe_clone_body.
	(maybe_thunk_body): New function.
	(maybe_clone_body): Call it.
	* mangle.c (write_mangled_name): Remove code to suppress writing
	of mangled name for cloned constructor or destructor.
	(write_special_name_constructor): Handle decloned constructor.
	(write_special_name_destructor): Handle decloned destructor.
	* method.c (trivial_fn_p): Handle decloning.
	* semantics.c (expand_or_defer_fn_1): Clone after setting linkage.
gcc/c-family/
	* c.opt: Add -fdeclone-ctor-dtor.
	* c-opts.c (c_common_post_options): Default to on iff -Os.
gcc/
	* cgraph.h (struct cgraph_node): Add calls_comdat_local.
	(symtab_comdat_local_p, symtab_in_same_comdat_p): New.
	* symtab.c (verify_symtab_base): Make sure we don't refer to a
	comdat-local symbol from outside its comdat.
	* cgraph.c (verify_cgraph_node): Likewise.
	* cgraphunit.c (mark_functions_to_output): Don't mark comdat-locals.
	* ipa.c (symtab_remove_unreachable_nodes): Likewise.
	(function_and_variable_visibility): Handle comdat-local fns.
	* ipa-cp.c (decide_about_value): Don't clone comdat-locals.
	* ipa-inline-analysis.c (compute_inline_parameters): Update
	calls_comdat_local.
	* ipa-inline-transform.c (inline_call): Likewise.
	* ipa-inline.c (can_inline_edge_p): Check calls_comdat_local.
include/
	* demangle.h (enum gnu_v3_ctor_kinds): Added literal
	gnu_v3_unified_ctor.
	(enum gnu_v3_dtor_kinds): Added literal gnu_v3_unified_dtor.
libiberty/
	* cp-demangle.c (cplus_demangle_fill_ctor, cplus_demangle_fill_dtor):
	Handle unified ctor/dtor.
	(d_ctor_dtor_name): Handle unified ctor/dtor.

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index f368cab..3576f7d 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -899,6 +899,10 @@ c_common_post_options (const char **pfilename)
   if (warn_implicit_function_declaration == -1)
     warn_implicit_function_declaration = flag_isoc99;
 
+  /* Declone C++ 'structors if -Os.  */
+  if (flag_declone_ctor_dtor == -1)
+    flag_declone_ctor_dtor = optimize_size;
+
   if (cxx_dialect >= cxx11)
     {
       /* If we're allowing C++0x constructs, don't warn about C++98
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index bfca1e0..d270f77 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -890,6 +890,10 @@ fdeduce-init-list
 C++ ObjC++ Var(flag_deduce_init_list) Init(0)
 -fdeduce-init-list	enable deduction of std::initializer_list for a template type parameter from a brace-enclosed initializer-list
 
+fdeclone-ctor-dtor
+C++ ObjC++ Var(flag_declone_ctor_dtor) Init(-1)
+Factor complex constructors and destructors to favor space over speed
+
 fdefault-inline
 C++ ObjC++ Ignore
 Does nothing.  Preserved for backward compatibility.
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 9501afa..ccd150c 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2666,10 +2666,18 @@ verify_cgraph_node (struct cgraph_node *node)
 	  error_found = true;
 	}
     }
+  bool check_comdat = symtab_comdat_local_p (node);
   for (e = node->callers; e; e = e->next_caller)
     {
       if (verify_edge_count_and_frequency (e))
 	error_found = true;
+      if (check_comdat
+	  && !symtab_in_same_comdat_p (e->caller, node))
+	{
+	  error ("comdat-local function called by %s outside its comdat",
+		 identifier_to_locale (e->caller->name ()));
+	  error_found = true;
+	}
       if (!e->inline_failed)
 	{
 	  if (node->global.inlined_to
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0a88da3..69b97a7 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -428,6 +428,9 @@ public:
   unsigned tm_clone : 1;
   /* True if this decl is a dispatcher for function versions.  */
   unsigned dispatcher_function : 1;
+  /* True if this decl calls a COMDAT-local function.  This is set up in
+     compute_inline_parameters and inline_call.  */
+  unsigned calls_comdat_local : 1;
 };
 
@@ -1490,4 +1493,22 @@ symtab_can_be_discarded (symtab_node *node)
 	  && node->resolution != LDPR_PREVAILING_DEF_IRONLY
 	  && node->resolution != LDPR_PREVAILING_DEF_IRONLY_EXP));
 }
+
+/* Return true if NODE is local to a particular COMDAT group, and must not
+   be named from outside the COMDAT.  This is us
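For readers skimming the thread, here is a rough source-level analogy of what the decloning amounts to. It is only an illustration of the idea, not code from the patch: the helper name unified_init is invented, and real constructors with virtual bases involve more than this.

struct Big
{
  int a, b, c;
  Big ();
};

/* Without -fdeclone-ctor-dtor the complete-object and base-object
   constructor clones each carry a full copy of the initialization code.
   Decloning keeps one shared body...  */
static void
unified_init (Big *self)
{
  self->a = 1;
  self->b = 2;
  self->c = 3;
  /* ... imagine many more statements here ...  */
}

/* ... and turns the clones into small wrappers that call it, trading an
   extra call for the removed duplication, which is why the c-opts.c hunk
   above only turns it on by default at -Os.  */
Big::Big ()
{
  unified_init (this);
}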