[PATCH] Fix PR72488

2017-01-19 Thread Richard Biener

The following fixes PR72488: we forgot to restore SSA info when we aborted
VN due to a too-large SCC size.  I've added a verifier that checks that
we do not share SSA_NAME_{PTR,RANGE}_INFO.

The issue is latent on the GCC 6 branch as well, where I will not install
the new verifier.

Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2017-01-19  Richard Biener  

PR tree-optimization/72488
* tree-ssa-sccvn.c (run_scc_vn): When we abort the VN make
sure to restore SSA info.
* tree-ssa.c (verify_ssa): Verify SSA info is not shared.

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	(revision 244611)
+++ gcc/tree-ssa-sccvn.c	(working copy)
@@ -4844,6 +4844,7 @@ run_scc_vn (vn_lookup_kind default_vn_wa
   walker.walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));
   if (walker.fail)
 {
+  scc_vn_restore_ssa_info ();
   free_scc_vn ();
   return false;
 }
Index: gcc/tree-ssa.c
===================================================================
--- gcc/tree-ssa.c  (revision 244611)
+++ gcc/tree-ssa.c  (working copy)
@@ -1027,24 +1027,49 @@ verify_ssa (bool check_modified_stmt, bo
 
   timevar_push (TV_TREE_SSA_VERIFY);
 
-  /* Keep track of SSA names present in the IL.  */
-  size_t i;
-  tree name;
-
-  FOR_EACH_SSA_NAME (i, name, cfun)
 {
-  gimple *stmt;
-  TREE_VISITED (name) = 0;
-
-  verify_ssa_name (name, virtual_operand_p (name));
+  /* Keep track of SSA names present in the IL.  */
+  size_t i;
+  tree name;
+  hash_map <void *, tree> ssa_info;
 
-  stmt = SSA_NAME_DEF_STMT (name);
-  if (!gimple_nop_p (stmt))
+  FOR_EACH_SSA_NAME (i, name, cfun)
{
- basic_block bb = gimple_bb (stmt);
- if (verify_def (bb, definition_block,
- name, stmt, virtual_operand_p (name)))
-   goto err;
+ gimple *stmt;
+ TREE_VISITED (name) = 0;
+
+ verify_ssa_name (name, virtual_operand_p (name));
+
+ stmt = SSA_NAME_DEF_STMT (name);
+ if (!gimple_nop_p (stmt))
+   {
+ basic_block bb = gimple_bb (stmt);
+ if (verify_def (bb, definition_block,
+ name, stmt, virtual_operand_p (name)))
+   goto err;
+   }
+
+ void *info = NULL;
+ if (POINTER_TYPE_P (TREE_TYPE (name)))
+   info = SSA_NAME_PTR_INFO (name);
+ else if (INTEGRAL_TYPE_P (TREE_TYPE (name)))
+   info = SSA_NAME_RANGE_INFO (name);
+ if (info)
+   {
+ bool existed;
+ tree &val = ssa_info.get_or_insert (info, &existed);
+ if (existed)
+   {
+ error ("shared SSA name info");
+ print_generic_expr (stderr, val, 0);
+ fprintf (stderr, " and ");
+ print_generic_expr (stderr, name, 0);
+ fprintf (stderr, "\n");
+ goto err;
+   }
+ else
+   val = name;
+   }
}
 }
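For readers following along, the shape of the new check is easy to model outside GCC: record each name's info payload pointer and flag any pointer seen under two different names.  Below is a hedged, self-contained C analogue — the `ssa_name` struct and `find_shared_info` are illustrative stand-ins, and a linear table replaces GCC's hash_map; none of these names are GCC API:

```c
/* Toy model of the new verifier check: walk all names, remember each
   non-null info payload pointer, and fail if the same pointer shows up
   under two different names.  */
#include <assert.h>
#include <stddef.h>

struct ssa_name
{
  const char *id;   /* e.g. "x_1" */
  void *info;       /* models SSA_NAME_PTR_INFO / SSA_NAME_RANGE_INFO */
};

/* Return the index of the first name whose info payload was already
   claimed by an earlier name, or -1 if nothing is shared.  */
static int
find_shared_info (const struct ssa_name *names, int n)
{
  for (int i = 0; i < n; i++)
    {
      if (names[i].info == NULL)
	continue;                       /* no payload to verify */
      for (int j = 0; j < i; j++)
	if (names[j].info == names[i].info)
	  return i;                     /* "shared SSA name info" */
    }
  return -1;
}
```

The real verifier additionally picks which payload to check by type (POINTER_TYPE_P vs. INTEGRAL_TYPE_P), as the hunk above shows.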
 


Re: [PATCH 9e] Update "startwith" logic for pass-skipping to handle __RTL functions

2017-01-19 Thread Richard Biener
On Wed, Jan 18, 2017 at 5:36 PM, Jeff Law  wrote:
> On 01/17/2017 02:28 AM, Richard Biener wrote:
>>>
>>>
>>> This feels somewhat different, but still a hack.
>>>
>>> I don't have strong suggestions on how to approach this, but what we've
>>> got
>>> here feels like a hack and one prone to bitrot.
>>
>>
>> All the above needs a bit of cleanup in the way we use (or not use)
>> PROP_xxx.
>> For example right now you can't startwith a __GIMPLE with a pass inside
>> the
>> loop pipeline because those passes expect loops to be initialized and be
>> in
>> loop-closed SSA.  And with the hack above for the property providers
>> you'll
>> always run pass_crited (that's a bad user of a PROP_).
>>
>> Ideally we'd figure out required properties from the startwith pass
>> (but there's not
>> an easy way to compute it w/o actually "executing" the passes) and then
>> enable
>> enough passes on the way to it providing those properties.
>>
>> Or finally restructure things in a way that the pass manager automatically
>> runs
>> property provider passes before passes requiring properties that are
>> not yet available...
>>
>> Instead of those pass->name comparisons we could invent a new flag in the
>> pass structure whether a pass should always be run for __GIMPLE or __RTL
>> but that's a bit noisy right now.
>>
>> So I'm fine with the (localized) "hacks" for the moment.
>
> David suggested that we could have a method in the pass manager that would
> be run if the pass is skipped.  "run_if_skipped" or some such.
>
> What I like about that idea is the hack and the real code end up in the same
> place.  So someone working on (for example) reload has a much better chance
> of catching that they need to update the run_if_skipped method as they make
> changes to reload.  It doesn't fix all the problems in this space, but I
> think it's cleaner than bundling the hacks into the pass manager itself.
>
> Would that work for you?  It does for me.

I think that walks in the wrong direction and just distributes the hack
over multiple files.

I'd rather have it in one place.

Richard.

> jeff
>


Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Richard Biener
On Wed, Jan 18, 2017 at 5:22 PM, Jakub Jelinek  wrote:
> On Wed, Jan 18, 2017 at 07:15:34PM +0300, Alexander Monakov wrote:
>> On Wed, 18 Jan 2017, Jakub Jelinek wrote:
>> > We are talking here about addressable vars, right (so if we turn it into
>> > non-addressable, in the SIMT region we just use the normal PTX pseudos),
>> > right?  We could emit inner ={v} {CLOBBER}; before SIMT_EXIT() to make it
>> > clear it shouldn't be moved afterwards.  For the private vars used directly
>> > in SIMD region, for the vars from inlined functions I assume if they are
>> > addressable we emit clobbers for them too.  Or perhaps the alias oracle can
>> > say that SIMT_EXIT ifn can clobber any addressable var with that
>> > flag/attribute.  And yes, SRA would need to propagate it.
>>
>> What about motion in the other direction, upwards across SIMT_ENTER()?
>
> I think this is a question for Richard, whether it can be done in the alias
> oracle.  If yes, it supposedly can be done for both SIMT_ENTER and
> SIMT_EXIT.

Code motion is trivially avoided for all memory that escapes somewhere.  For
locals that are just address-taken that's not enough.  So indeed such code
may move into the SIMT region from both sides -- but that can already happen
with your proposed patch so it's nothing new.

>> > But I believe it is worth it, because inlining is essential for good
>> > performance of the simd regions.
>>
>> It is, but I think my approach is compatible with inlining too (and has a
>> more localized impact on the compiler).
>
> But your 2/5 patch disables inlining into the SIMT regions.  Or do you mean
> the approach with some new IFN for the pointers to privatized vars?
> How would that work with the inliner (all copies of addressable vars/params
> from functions inlined into the SIMT region would need to gain something
> similar)?
>
> Jakub


Re: [committed] Implement LANG_HOOKS_TYPE_FOR_SIZE for jit

2017-01-19 Thread Richard Biener
On Wed, Jan 18, 2017 at 10:45 PM, David Malcolm  wrote:
> The jit testcase test-nested-loops.c was crashing.
>
> Root cause is that deep inside loop optimization we're now exposing
> this call within fold-const.c which wasn't being hit before:
>
> 4082  /* Compute the mask to access the bitfield.  */
> 4083  unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1);
>
> and the jit's implementation of LANG_HOOKS_TYPE_FOR_SIZE was a
> placeholder that asserted it wasn't called.
>
> This patch implements a proper LANG_HOOKS_TYPE_FOR_SIZE for jit,
> by taking LTO's implementation.
>
> Fixes test-nested-loops.c, along with the related failures in
> test-combination.c and test-threads.c due to reusing the test.
>
> This fixes all known failures in jit.sum, putting it at 8609 passes.
>
> Committed to trunk as r244600.

I suppose we could instead make the lto hook the default (thus move it
to langhooks.c as lhd_type_for_size).  Note similar issues may arise
from type_for_mode?  Ah, I see you have that one...

Richard.

> gcc/jit/ChangeLog:
> * dummy-frontend.c (jit_langhook_type_for_size): Implement, using
> lto's lto_type_for_size.
> ---
>  gcc/jit/dummy-frontend.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 45 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/jit/dummy-frontend.c b/gcc/jit/dummy-frontend.c
> index 8f28e7f..5955854 100644
> --- a/gcc/jit/dummy-frontend.c
> +++ b/gcc/jit/dummy-frontend.c
> @@ -207,12 +207,53 @@ jit_langhook_type_for_mode (enum machine_mode mode, int unsignedp)
>return NULL;
>  }
>
> +/* Return an integer type with PRECISION bits of precision,
> +   that is unsigned if UNSIGNEDP is nonzero, otherwise signed.  */
> +
>  static tree
> -jit_langhook_type_for_size (unsigned int bits ATTRIBUTE_UNUSED,
> -   int unsignedp ATTRIBUTE_UNUSED)
> +jit_langhook_type_for_size (unsigned precision, int unsignedp)
>  {
> -  gcc_unreachable ();
> -  return NULL;
> +  int i;
> +
> +  if (precision == TYPE_PRECISION (integer_type_node))
> +return unsignedp ? unsigned_type_node : integer_type_node;
> +
> +  if (precision == TYPE_PRECISION (signed_char_type_node))
> +return unsignedp ? unsigned_char_type_node : signed_char_type_node;
> +
> +  if (precision == TYPE_PRECISION (short_integer_type_node))
> +return unsignedp ? short_unsigned_type_node : short_integer_type_node;
> +
> +  if (precision == TYPE_PRECISION (long_integer_type_node))
> +return unsignedp ? long_unsigned_type_node : long_integer_type_node;
> +
> +  if (precision == TYPE_PRECISION (long_long_integer_type_node))
> +return unsignedp
> +  ? long_long_unsigned_type_node
> +  : long_long_integer_type_node;
> +
> +  for (i = 0; i < NUM_INT_N_ENTS; i ++)
> +if (int_n_enabled_p[i]
> +   && precision == int_n_data[i].bitsize)
> +  return (unsignedp ? int_n_trees[i].unsigned_type
> + : int_n_trees[i].signed_type);
> +
> +  if (precision <= TYPE_PRECISION (intQI_type_node))
> +return unsignedp ? unsigned_intQI_type_node : intQI_type_node;
> +
> +  if (precision <= TYPE_PRECISION (intHI_type_node))
> +return unsignedp ? unsigned_intHI_type_node : intHI_type_node;
> +
> +  if (precision <= TYPE_PRECISION (intSI_type_node))
> +return unsignedp ? unsigned_intSI_type_node : intSI_type_node;
> +
> +  if (precision <= TYPE_PRECISION (intDI_type_node))
> +return unsignedp ? unsigned_intDI_type_node : intDI_type_node;
> +
> +  if (precision <= TYPE_PRECISION (intTI_type_node))
> +return unsignedp ? unsigned_intTI_type_node : intTI_type_node;
> +
> +  return NULL_TREE;
>  }
>
>  /* Record a builtin function.  We just ignore builtin functions.  */
> --
> 1.8.5.3
>
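The structure of the hook is worth noting: exact-precision matches against the standard type nodes come first, then a smallest-that-fits ladder over the fixed-width QI/HI/SI/DI nodes.  A hedged, self-contained C model of that lookup strategy follows — type-name strings stand in for GCC tree nodes, and `type_for_size` here is an illustrative function, not the real langhook:

```c
/* Toy model of the type_for_size lookup strategy: exact-precision
   matches against the standard C integer types first, then a
   smallest-that-fits fallback ladder.  */
#include <assert.h>
#include <string.h>

static const char *
type_for_size (unsigned precision, int unsignedp)
{
  if (precision == 8 * sizeof (int))
    return unsignedp ? "unsigned int" : "int";
  if (precision == 8 * sizeof (char))
    return unsignedp ? "unsigned char" : "signed char";
  if (precision == 8 * sizeof (short))
    return unsignedp ? "unsigned short" : "short";
  if (precision == 8 * sizeof (long))
    return unsignedp ? "unsigned long" : "long";
  if (precision == 8 * sizeof (long long))
    return unsignedp ? "unsigned long long" : "long long";

  /* Fallback ladder, like the intQI/intHI/intSI/intDI checks: pick the
     smallest fixed-width type that can hold the precision.  */
  if (precision <= 8)  return unsignedp ? "uint8"  : "int8";
  if (precision <= 16) return unsignedp ? "uint16" : "int16";
  if (precision <= 32) return unsignedp ? "uint32" : "int32";
  if (precision <= 64) return unsignedp ? "uint64" : "int64";
  return NULL;                           /* no suitable type */
}
```

Making this the default hook, as suggested above, would let frontends without their own type system share exactly this ladder.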


Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Jakub Jelinek
On Thu, Jan 19, 2017 at 10:31:38AM +0100, Richard Biener wrote:
> On Wed, Jan 18, 2017 at 5:22 PM, Jakub Jelinek  wrote:
> > On Wed, Jan 18, 2017 at 07:15:34PM +0300, Alexander Monakov wrote:
> >> On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> >> > We are talking here about addressable vars, right (so if we turn it into
> >> > non-addressable, in the SIMT region we just use the normal PTX pseudos),
> >> > right?  We could emit inner ={v} {CLOBBER}; before SIMT_EXIT() to make it
> >> > clear it shouldn't be moved afterwards.  For the private vars used directly
> >> > in SIMD region, for the vars from inlined functions I assume if they are
> >> > addressable we emit clobbers for them too.  Or perhaps the alias oracle can
> >> > say that SIMT_EXIT ifn can clobber any addressable var with that
> >> > flag/attribute.  And yes, SRA would need to propagate it.
> >>
> >> What about motion in the other direction, upwards across SIMT_ENTER()?
> >
> > I think this is a question for Richard, whether it can be done in the alias
> > oracle.  If yes, it supposedly can be done for both SIMT_ENTER and
> > SIMT_EXIT.
> 
> Code motion is trivially avoided for all memory that escapes somewhere.  For
> locals that are just address-taken that's not enough.  So indeed such code
> may move into the SIMT region from both sides -- but that can already happen
> with your proposed patch so it's nothing new.

But in the escape analysis we could consider all the specially marked
"omp simt private" addressable vars to escape and thus confine them into the
SIMT region that way, right?
Then at ompdevlower those would be moved into a struct and from there
onwards they wouldn't be special anymore, just fields in a structure.

Jakub


Re: [PATCH][GCC][Aarch64] Add vectorize pattern for copysign.

2017-01-19 Thread Tamar Christina
Hi All,

This is a slight modification of the earlier patch (Using a different constant 
in the mask creation.)

< +  HOST_WIDE_INT_M1 << bits));
---
> +  HOST_WIDE_INT_M1U << bits));

Kind Regards,
Tamar


From: gcc-patches-ow...@gcc.gnu.org  on behalf 
of Tamar Christina 
Sent: Tuesday, January 17, 2017 2:50:19 PM
To: GCC Patches; James Greenhalgh; Marcus Shawcroft; Richard Earnshaw
Cc: nd
Subject: [PATCH][GCC][Aarch64] Add vectorize pattern for copysign.

Hi All,

This patch vectorizes the copysign builtin for AArch64
similar to how it is done for Arm.

AArch64 now generates:

...
.L4:
ldr q1, [x6, x3]
add w4, w4, 1
ldr q0, [x5, x3]
cmp w4, w7
bif v1.16b, v2.16b, v3.16b
fmul    v0.2d, v0.2d, v1.2d
str q0, [x5, x3]

for the input:

 x * copysign(1.0, y)
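The `bif` works because copysign(a, b) is a pure bit-select: magnitude bits from a, sign bit from b.  The pattern realizes this per lane with BSL/BIF against a mask of just the sign bit — the constant built by `HOST_WIDE_INT_M1U << bits` in the patch.  A hedged scalar C model of that select (`copysign_by_mask` is an illustrative name, not code from the patch):

```c
/* Scalar model of the vector bit-select: keep x's magnitude bits,
   take y's sign bit, using an all-ones-shifted sign-bit mask.  */
#include <assert.h>
#include <stdint.h>
#include <string.h>

static double
copysign_by_mask (double x, double y)
{
  uint64_t ux, uy;
  memcpy (&ux, &x, sizeof ux);
  memcpy (&uy, &y, sizeof uy);

  /* v_bitmask: all-ones shifted left so only the sign bit survives;
     the unsigned constant avoids left-shifting a negative value.  */
  uint64_t mask = (uint64_t) -1 << 63;

  uint64_t r = (ux & ~mask) | (uy & mask);  /* bsl: magnitude of x, sign of y */
  memcpy (&x, &r, sizeof x);
  return x;
}
```

The M1 -> M1U change in the updated patch matters for exactly the reason hedged in the comment: shifting a signed all-ones value left is undefined behavior.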

On 481.wrf in Spec2006 on AArch64 this gives us a speedup of 9.1%.
Regtested on  aarch64-none-linux-gnu and no regressions.

Ok for trunk?

gcc/
2017-01-17  Tamar Christina  

* config/aarch64/aarch64-builtins.c
(aarch64_builtin_vectorized_function): Added CASE_CFN_COPYSIGN.
* config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
Changed int to HOST_WIDE_INT.
* config/aarch64/aarch64-protos.h
(aarch64_simd_gen_const_vector_dup): Likewise.
* config/aarch64/aarch64-simd-builtins.def: Added copysign BINOP.
* config/aarch64/aarch64-simd.md: Added copysign<mode>3.

gcc/testsuite/
2017-01-17  Tamar Christina  

* gcc.target/arm/vect-copysignf.c: Move to...
* gcc.dg/vect/vect-copysignf.c: ... Here.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 69fb756f0fbdc016f35ce1d08f2aaf092a034704..faba7a1a38b6e494e9589637d51c639e3126969d 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1447,6 +1447,16 @@ aarch64_builtin_vectorized_function (unsigned int fn, tree type_out,
 	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv2di];
   else
 	return NULL_TREE;
+CASE_CFN_COPYSIGN:
+  if (AARCH64_CHECK_BUILTIN_MODE (2, S))
+	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_BINOP_copysignv2sf];
+  else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
+	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_BINOP_copysignv4sf];
+  else if (AARCH64_CHECK_BUILTIN_MODE (2, D))
+	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_BINOP_copysignv2df];
+  else
+	return NULL_TREE;
+
 default:
   return NULL_TREE;
 }
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 29a3bd71151aa4fb7c6728f0fb52e2f3f233f41d..e75ba29f93e9e749791803ca3fa8d716ca261064 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -362,7 +362,7 @@ rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
-rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
+rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
 bool aarch64_simd_mem_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, bool);
 rtx aarch64_tls_get_addr (void);
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index d713d5d8b88837ec6f2dc51188fb252f8d5bc8bd..a67b7589e8badfbd0f13168557ef87e052eedcb1 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -151,6 +151,9 @@
   BUILTIN_VQN (TERNOP, raddhn2, 0)
   BUILTIN_VQN (TERNOP, rsubhn2, 0)
 
+  /* Implemented by copysign<mode>3.  */
+  BUILTIN_VHSDF (BINOP, copysign, 3)
+
   BUILTIN_VSQN_HSDI (UNOP, sqmovun, 0)
   /* Implemented by aarch64_qmovn.  */
   BUILTIN_VSQN_HSDI (UNOP, sqmovn, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index a12e2268ef9b023112f8d05db0a86957fee83273..b61f79a09462b8cecca7dd2cc4ac0eb4be2dbc79 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -338,6 +338,24 @@
   }
 )
 
+(define_expand "copysign<mode>3"
+  [(match_operand:VHSDF 0 "register_operand")
+   (match_operand:VHSDF 1 "register_operand")
+   (match_operand:VHSDF 2 "register_operand")]
+  "TARGET_FLOAT && TARGET_SIMD"
+{
+  rtx v_bitmask = gen_reg_rtx (<V_cmp_result>mode);
+  int bits = GET_MODE_UNIT_BITSIZE (<MODE>mode) - 1;
+
+  emit_move_insn (v_bitmask,
+		  aarch64_simd_gen_const_vector_dup (<V_cmp_result>mode,
+		 HOST_WIDE_INT_M1U << bits));
+  emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], v_bitmask,
+	 operands[2], operands[1]));
+  DONE;
+}
+)
+
 (define_insn "*aarch64_mul3_elt"
  [(set (match_operand:VMUL 0 "register_operand" "=w")
 (mult:VMUL
diff --git a/gcc/config/aarch64/aarch64

Re: [PATCH] Be careful about combined chain with length == 0 (PR tree-optimization/70754).

2017-01-19 Thread Richard Biener
On Wed, Jan 18, 2017 at 4:32 PM, Bin.Cheng  wrote:
> On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener
>  wrote:
>> On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška  wrote:
>>> Hello.
>>>
>>>
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>
>>> Ready to be installed?
>>
>> I'm not sure.  If we have such zero distance refs in the IL at the
>> time pcom runs then not handling
>> them will pessimize code-gen for cases where they are part of a larger
>> chain.  Esp. I don't like
> Do you mean different chains distributed because of MAX_DISTANCE by
> "larger chain"?  With the patch, such chain of refs would still be
> pred-commoned, just the arithmetic operation not combined, which could
> be handled by later DOM?
>> another stmt_dominates_stmt_p call and thus rather not handle length
>> == 0 at all...
>> Not handling length == 0 chains at all may be sub-optimal.  As you said,
>> such a chain of refs at this point may simply exist because previous dom/cse
>> failed to analyze the references.
>>
>> We already seem to go great length in associating stuff when combining
>> stuff thus isn't this
>> maybe an artifact of this association?  Maybe we simply need to sort
>> the new chain after
>> combining it so the root stmt comes last?
>>
>> Note that there seems to be only a single length per chain but not all
>> refs in a chain need to
>> have the same distance.  This means your fix is likely incomplete?
>> What prevents the situation
>> to arise for distance != 0?
> Yes, it's possible for two refs to have the same distance in a chain with
> length > 0.  But that should not be a problem, because existing uses
> are replaced by newly generated PHI variables which always dominate
> the uses, right?

I must admit I don't know predcom in such detail but then can we handle
distance == 0 by simply inserting a PHI for those as well (a degenerate
one of course)?  Or can for distance == 0 the ref be not loop invariant?

Note that for length == 0 all refs in the chain will have a dependence distance
of zero.  So my first argument probably doesn't hold and we could simply
remove handling of length == 0 chains and rely on CSE?

Richard.

> Thanks,
> bin
>>
>> Richard.
>>
>>> Martin
>>>


Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Alexander Monakov
On Thu, 19 Jan 2017, Richard Biener wrote:
> >> What about motion in the other direction, upwards across SIMT_ENTER()?
> >
> > I think this is a question for Richard, whether it can be done in the alias
> > oracle.  If yes, it supposedly can be done for both SIMT_ENTER and
> > SIMT_EXIT.
> 
> Code motion is trivially avoided for all memory that escapes somewhere.  For
> locals that are just address-taken that's not enough.  So indeed such code
> may move into the SIMT region from both sides -- but that can already happen
> with your proposed patch so it's nothing new.

Sorry, I don't follow.  There is no problem with moving references to data into
SIMT regions.  The issue is the other way: we need to prevent moving
references to data _out_ of SIMT regions -- this is why I said "upwards" across
SIMT_ENTER.  As far as I can tell, my patch does ensure that by performing
allocation via IFNs.

(and we need to prevent data private to SIMT regions from becoming normal
addressable vars on the stack, hence the mention of SRA earlier)

Thanks.
Alexander


Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Richard Biener
On Thu, Jan 19, 2017 at 10:37 AM, Jakub Jelinek  wrote:
> On Thu, Jan 19, 2017 at 10:31:38AM +0100, Richard Biener wrote:
>> On Wed, Jan 18, 2017 at 5:22 PM, Jakub Jelinek  wrote:
>> > On Wed, Jan 18, 2017 at 07:15:34PM +0300, Alexander Monakov wrote:
>> >> On Wed, 18 Jan 2017, Jakub Jelinek wrote:
>> >> > We are talking here about addressable vars, right (so if we turn it into
>> >> > non-addressable, in the SIMT region we just use the normal PTX pseudos),
> >> >> > right?  We could emit inner ={v} {CLOBBER}; before SIMT_EXIT() to make it
> >> >> > clear it shouldn't be moved afterwards.  For the private vars used directly
> >> >> > in SIMD region, for the vars from inlined functions I assume if they are
> >> >> > addressable we emit clobbers for them too.  Or perhaps the alias oracle can
>> >> > say that SIMT_EXIT ifn can clobber any addressable var with that
>> >> > flag/attribute.  And yes, SRA would need to propagate it.
>> >>
>> >> What about motion in the other direction, upwards across SIMT_ENTER()?
>> >
>> > I think this is a question for Richard, whether it can be done in the alias
>> > oracle.  If yes, it supposedly can be done for both SIMT_ENTER and
>> > SIMT_EXIT.
>>
>> Code motion is trivially avoided for all memory that escapes somewhere.  For
>> locals that are just address-taken that's not enough.  So indeed such code
>> may move into the SIMT region from both sides -- but that can already happen
>> with your proposed patch so it's nothing new.
>
> But in the escape analysis we could consider all the specially marked
> "omp simt private" addressable vars to escape and thus confine them into the
> SIMT region that way, right?

We could.  But that doesn't prevent vars from outside of the region from
bleeding into it, which was what Alex was asking about?  For the OMP vars, just placing
clobbers before EXIT and after ENTER will confine them as well.

Richard.

> Then at ompdevlower those would be moved into a struct and from there
> onwards they wouldn't be special anymore, just fields in a structure.
>
> Jakub


Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Richard Biener
On Thu, Jan 19, 2017 at 10:44 AM, Alexander Monakov  wrote:
> On Thu, 19 Jan 2017, Richard Biener wrote:
>> >> What about motion in the other direction, upwards across SIMT_ENTER()?
>> >
>> > I think this is a question for Richard, whether it can be done in the alias
>> > oracle.  If yes, it supposedly can be done for both SIMT_ENTER and
>> > SIMT_EXIT.
>>
>> Code motion is trivially avoided for all memory that escapes somewhere.  For
>> locals that are just address-taken that's not enough.  So indeed such code
>> may move into the SIMT region from both sides -- but that can already happen
>> with your proposed patch so it's nothing new.
>
> Sorry, I don't follow.  There is no problem with moving references to data into
> SIMT regions.  The issue is the other way: we need to prevent moving
> references to data _out_ of SIMT regions -- this is why I said "upwards" across
> SIMT_ENTER.  As far as I can tell, my patch does ensure that by performing
> allocation via IFNs.

Yes, this way you introduce a data dependence.  Doing

SIMT_ENTER (&var, &var2, &var3, &var4...);

would do that as well with the disadvantage of forcing the vars addressable
(thus no SRA, etc.) unless we special-case SIMT_ENTER.  If we make sure
to not move clobbers we could also do

   SIMT_ENTER ();
   var = CLOBBER;
   var2 = CLOBBER;
...

The easiest way of ensuring all this is to outline the SIMT region...
(but you've been there...)

> (and we need to prevent data private to SIMT regions from becoming normal
> addressable vars on the stack, hence the mention of SRA earlier)

Richard.

> Thanks.
> Alexander


Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Jakub Jelinek
On Thu, Jan 19, 2017 at 10:45:08AM +0100, Richard Biener wrote:
> > But in the escape analysis we could consider all the specially marked
> > "omp simt private" addressable vars to escape and thus confine them into the
> > SIMT region that way, right?
> 
> We could.  But that doesn't prevent vars from outside of the region from
> bleeding into it, which was what Alex was asking about?  For the OMP vars, just placing
> clobbers before EXIT and after ENTER will confine them as well.

Movement of read accesses to non-"omp simt private" variables into the SIMT
region across SIMT_ENTER is not a problem I think, those vars still would be
allocated on the per-warp granularity and all simt threads would just read
the same value.  The problem would be only if writes to such variables
are moved later across SIMT_ENTER or earlier across SIMT_EXIT, that would
turn something initially non-racy into racy.
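A deterministic toy model of that hazard — plain C and hypothetical, since real SIMT lanes run concurrently rather than in a loop — shows why write motion is the dangerous direction: a statement sunk into the region executes once per lane instead of once, so a read-modify-write that was safe outside both multiplies its effect and becomes a data race inside:

```c
/* Each loop iteration models one lane of the warp; the "moved" write
   runs per lane instead of once before SIMT_ENTER.  */
#include <assert.h>

#define LANES 32

static int
run (int write_moved_inside)
{
  int counter = 0;

  if (!write_moved_inside)
    counter += 1;                /* before SIMT_ENTER: executes once */

  /* SIMT_ENTER: each iteration models one lane.  */
  for (int lane = 0; lane < LANES; lane++)
    if (write_moved_inside)
      counter += 1;              /* sunk into the region: once per lane */
  /* SIMT_EXIT */

  return counter;
}
```

Here the duplication is visible deterministically; on real hardware the per-lane stores would additionally race.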
Would it help if we e.g. have an artificial (ABNORMAL) edge in between the basic block with
SIMT_ENTER and basic block with SIMT_EXIT to make it clearer that those
calls aren't just ordinary calls, but very special control flow altering
statements?

Jakub


Fix false positive of symtab_node::equal_address_to

2017-01-19 Thread Jan Hubicka
Hi,
this patch fixes a false positive in symtab_node::equal_address_to when
comparing a non-interposable alias with an interposable definition.

Honza

Index: ChangeLog
===================================================================
--- ChangeLog   (revision 244611)
+++ ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2017-01-17  Jan Hubicka  
+
+   PR lto/78407
+   * symtab.c (symtab_node::equal_address_to): Fix comparing of
+   interposable aliases.
+
 2017-01-18  Peter Bergner  
 
PR target/78516
Index: symtab.c
===================================================================
--- symtab.c	(revision 244611)
+++ symtab.c	(working copy)
@@ -1989,13 +1989,12 @@ symtab_node::equal_address_to (symtab_no
   if (rs1 != rs2 && avail1 >= AVAIL_AVAILABLE && avail2 >= AVAIL_AVAILABLE)
 binds_local1 = binds_local2 = true;
 
-  if ((binds_local1 ? rs1 : this)
-   == (binds_local2 ? rs2 : s2))
+  if (binds_local1 && binds_local2 && rs1 == rs2)
 {
   /* We made use of the fact that alias is not weak.  */
-  if (binds_local1 && rs1 != this)
+  if (rs1 != this)
 refuse_visibility_changes = true;
-  if (binds_local2 && rs2 != s2)
+  if (rs2 != s2)
 s2->refuse_visibility_changes = true;
   return 1;
 }
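The corrected predicate can be modeled compactly: answer "definitely the same address" only when both symbols bind locally and resolve to the same target, because an interposable definition may be replaced at link or load time.  A hedged C sketch follows — `struct sym` and its fields are illustrative, not GCC's symtab API; only the 1 = equal / 2 = unknown return convention mirrors the real function:

```c
/* Toy model of the fixed comparison: "same address" requires both
   symbols to be non-interposable and to share an alias target.  */
#include <assert.h>

struct sym
{
  struct sym *target;    /* ultimate alias target */
  int binds_local;       /* non-interposable reference */
};

static int
equal_address_to (const struct sym *a, const struct sym *b)
{
  if (a->binds_local && b->binds_local && a->target == b->target)
    return 1;                    /* definitely the same address */
  return 2;                      /* unknown: interposition possible */
}
```

The fixed false positive is the asymmetric case: a non-interposable alias compared against its interposable target must stay "unknown", since the target the program actually binds to may differ at run time.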


Re: [PATCH] Be careful about combined chain with length == 0 (PR tree-optimization/70754).

2017-01-19 Thread Bin.Cheng
On Thu, Jan 19, 2017 at 9:42 AM, Richard Biener
 wrote:
> On Wed, Jan 18, 2017 at 4:32 PM, Bin.Cheng  wrote:
>> On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener
>>  wrote:
>>> On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška  wrote:
 Hello.


 Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

 Ready to be installed?
>>>
>>> I'm not sure.  If we have such zero distance refs in the IL at the
>>> time pcom runs then not handling
>>> them will pessimize code-gen for cases where they are part of a larger
>>> chain.  Esp. I don't like
>> Do you mean different chains distributed because of MAX_DISTANCE by
>> "larger chain"?  With the patch, such chain of refs would still be
>> pred-commoned, just the arithmetic operation not combined, which could
>> be handled by later DOM?
>>> another stmt_dominates_stmt_p call and thus rather not handle length
>>> == 0 at all...
>> Not handling length == 0 chains at all may be sub-optimal.  As you said,
>> such a chain of refs at this point may simply exist because previous dom/cse
>> failed to analyze the references.
>>>
>>> We already seem to go great length in associating stuff when combining
>>> stuff thus isn't this
>>> maybe an artifact of this association?  Maybe we simply need to sort
>>> the new chain after
>>> combining it so the root stmt comes last?
>>>
>>> Note that there seems to be only a single length per chain but not all
>>> refs in a chain need to
>>> have the same distance.  This means your fix is likely incomplete?
>>> What prevents the situation
>>> to arise for distance != 0?
>> Yes, it's possible for two refs to have the same distance in a chain with
>> length > 0.  But that should not be a problem, because existing uses
>> are replaced by newly generated PHI variables which always dominate
>> the uses, right?
>
> I must admit I don't know predcom in such detail but then can we handle
> distance == 0 by simply inserting a PHI for those as well (a degenerate
> one of course)?  Or can for distance == 0 the ref be not loop invariant?
Not sure if I understand the question correctly.  Distance is the
difference in iteration number between one ref and the root ref of the
chain, so 0 distance/length doesn't mean a loop invariant; it's very
likely two (exactly the same) references in each loop iteration, and the
address of the reference is still a SCEV.  OTOH, an invariant chain has an
invariant address, and is handled separately.  For the first question, it's
length, rather than distance, that decides how the chain is handled.
For a length > 0 chain, we have to insert PHIs to pass the carried result of
the memory reference, even though some refs may have 0 distance to the root ref.
>
> Note that for length == 0 all refs in the chain will have a dependence distance
> of zero.  So my first argument probably doesn't hold and we could simply
> remove handling of length == 0 chains and rely on CSE?
I am not sure; that a CSE opportunity between references still exists at
this point means a previous cse pass failed for some reason.  Predcom could
be the only pass that can handle such a case, as it understands data
references better.  Note Martin's patch does not skip handling of
length == 0 chains; later refs will still be CSEed with the result of the
root ref, only the combination operation like chain1 + chain2 is skipped.
In this case, the following dom should be able to handle such
(loop-independent) cse opportunities.
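At the source level, a length == 0 combined chain corresponds to same-iteration references whose combination (chain1 + chain2) repeats, i.e. a loop-independent CSE opportunity rather than one needing a carrying PHI.  A hedged toy example of that shape (not the PR's testcase):

```c
/* a[i] and b[i] are two root references in the same iteration
   (distance 0); the combined expression a[i] + b[i] repeats, so after
   commoning only ordinary CSE work remains.  */
#include <assert.h>

#define N 8

static int
sum_combined (const int *a, const int *b)
{
  int s1 = 0, s2 = 0;
  for (int i = 0; i < N; i++)
    {
      s1 += a[i] + b[i];   /* combined chain, distance 0 */
      s2 += a[i] + b[i];   /* same combination: CSE candidate */
    }
  return s1 + s2;
}
```

Contrast with a length > 0 chain such as `a[i] + a[i-1]`, where the reuse crosses iterations and predcom must thread the value through a PHI.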

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin
>>>
>>> Richard.
>>>
 Martin



Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Alexander Monakov
On Thu, 19 Jan 2017, Jakub Jelinek wrote:
> On Thu, Jan 19, 2017 at 10:45:08AM +0100, Richard Biener wrote:
> > > But in the escape analysis we could consider all the specially marked
> > > "omp simt private" addressable vars to escape and thus confine them into 
> > > the
> > > SIMT region that way, right?
> > 
> > We could.  But that doesn't prevent vars from outside of the region from
> > bleeding into it, which was what Alex was asking about?  For the OMP vars, just placing
> > clobbers before EXIT and after ENTER will confine them as well.
> 
> Movement of read accesses to non-"omp simt private" variables into the SIMT
> region across SIMT_ENTER is not a problem I think, those vars still would be
> allocated on the per-warp granularity and all simt threads would just read
> the same value.  The problem would be only if writes to such variables
> are moved later across SIMT_ENTER or earlier across SIMT_EXIT, that would
> turn something initially non-racy into racy.

Hm, I don't share this concern.  You'd still be writing the same value from each
lane; there cannot exist other stores to this variable in the SIMT region
(because otherwise the original OpenMP SIMD code contained a race).  So I cannot
see how this can break program semantics.  Do you mean the formal race of
writing the same value from active lanes?  On PTX that is well-defined, and the
backend already uses that guarantee outside of SIMT regions.

> Would it help if we e.g. have an artificial (ABNORMAL) edge in between the basic block with
> SIMT_ENTER and basic block with SIMT_EXIT to make it clearer that those
> calls aren't just ordinary calls, but very special control flow altering
> statements?
> 
>   Jakub

Alexander



Re: [expand] Fix for PR rtl-optimization/79121 incorrect expansion of extend plus left shift

2017-01-19 Thread Richard Earnshaw (lists)
On 18/01/17 21:07, Jeff Law wrote:
> On 01/18/2017 11:08 AM, Richard Earnshaw (lists) wrote:
>> PR 79121 is a silent wrong code regression where, when generating a
>> shift from an extended value moving from one to two machine registers,
>> the type of the right shift used for the most significant word should be
>> determined by the signedness of the inner type, not the signedness of
>> the result type.
>>
>> gcc:
>> PR rtl-optimization/79121
>> * expr.c (expand_expr_real_2, case LSHIFT_EXPR): Look at the
>> signedness
>> of the inner type when shifting an extended value.
>>
>> testsuite:
>> * gcc.c-torture/execute/pr79121.c: New test.
>>
>> Bootstrapped on x86_64 and cross-tested on ARM.
> I had to refamiliarize myself with this code and nearly got the analysis
> wrong (again).
> 
> Due to the copying of the low word into the high word we have to select
> the type of shift based on the type of the object that was the source of
> the NOP conversion.  The code currently makes that determination based
> on the signedness of the shift, which is wrong.
> 
> 
> OK for the trunk.
> 
> jeff
> 
> 

Thanks, Jeff.  I made some minor tweaks to the comments (adding a bit
more about signed vs unsigned) and committed the following.

What about gcc-6?

R.
diff --git a/gcc/expr.c b/gcc/expr.c
index 4c54faf..2d8868e 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9056,9 +9056,9 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 
/* Left shift optimization when shifting across word_size boundary.
 
-  If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't
-  native instruction to support this wide mode left shift.  Given below
-  scenario:
+  If mode == GET_MODE_WIDER_MODE (word_mode), then normally
+  there isn't native instruction to support this wide mode
+  left shift.  Given below scenario:
 
Type A = (Type) B  << C
 
@@ -9067,10 +9067,11 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 
 | word_size |
 
-  If the shift amount C caused we shift B to across the word size
-  boundary, i.e part of B shifted into high half of destination
-  register, and part of B remains in the low half, then GCC will use
-  the following left shift expand logic:
+  If the shift amount C caused we shift B to across the word
+  size boundary, i.e part of B shifted into high half of
+  destination register, and part of B remains in the low
+  half, then GCC will use the following left shift expand
+  logic:
 
   1. Initialize dest_low to B.
   2. Initialize every bit of dest_high to the sign bit of B.
@@ -9080,20 +9081,30 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
   5. Logic right shift D by (word_size - C).
   6. Or the result of 4 and 5 to finalize dest_high.
 
-  While, by checking gimple statements, if operand B is coming from
-  signed extension, then we can simplify above expand logic into:
+  While, by checking gimple statements, if operand B is
+  coming from signed extension, then we can simplify above
+  expand logic into:
 
  1. dest_high = src_low >> (word_size - C).
  2. dest_low = src_low << C.
 
-  We can use one arithmetic right shift to finish all the purpose of
-  steps 2, 4, 5, 6, thus we reduce the steps needed from 6 into 2.  */
+  We can use one arithmetic right shift to finish all the
+  purpose of steps 2, 4, 5, 6, thus we reduce the steps
+  needed from 6 into 2.
+
+  The case is similar for zero extension, except that we
+  initialize dest_high to zero rather than copies of the sign
+  bit from B.  Furthermore, we need to use a logical right shift
+  in this case.
+
+  The choice of sign-extension versus zero-extension is
+  determined entirely by whether or not B is signed and is
+  independent of the current setting of unsignedp.  */
 
temp = NULL_RTX;
if (code == LSHIFT_EXPR
&& target
&& REG_P (target)
-   && ! unsignedp
&& mode == GET_MODE_WIDER_MODE (word_mode)
&& GET_MODE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode)
&& TREE_CONSTANT (treeop1)
@@ -9114,6 +9125,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
rtx_insn *seq, *seq_old;
unsigned int high_off = subreg_highpart_offset (word_mode,
mode);
+   bool extend_unsigned
+ = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (def)));
rtx low = lowpart_subreg (word_mode, op0, mode);
rtx dest_low = lowpart_subreg (word_mode, target, mode);
  

Re: [PATCH] Add AVX512 k-mask intrinsics

2017-01-19 Thread Kirill Yukhin
Hi Andrew,
On 18 Jan 15:45, Andrew Senkevich wrote:
> 2017-01-17 16:51 GMT+03:00 Jakub Jelinek :
> > On Tue, Jan 17, 2017 at 04:03:08PM +0300, Andrew Senkevich wrote:
> >> > I've played a bit w/ SDE. And looks like operands are not early clobber:
> >> > TID0: INS 0x004003ee AVX512VEX kmovd k0, eax
> >> > TID0:   k0 := _
> >> > ...
> >> > TID0: INS 0x004003f4 AVX512VEX kshiftlw k0, k0, 0x3
> >> > TID0:   k0 := _fff8
> >> >
> >> > You can see that same dest and source works just fine.
> >>
> >> Hmm, I looked only at what ICC generates, and that was not the correct way.
> >
> > I've just tried
> > int
> > main ()
> > {
> >   unsigned int a = 0x;
> >   asm volatile ("kmovw %1, %%k6; kshiftlw $1, %%k6, %%k6; kmovw %%k6, %0" : 
> > "=r" (a) : "r" (a) : "k6");
> >   __builtin_printf ("%x\n", a);
> >   return 0;
> > }
> > on KNL and got 0x.
> > Are you going to report to the SDM authors so that they fix it up?
> > E.g. using TEMP <- SRC1[0:...] before DEST[...] <- 0 and using TEMP
> > instead of SRC1[0:...] would fix it, or filling up TEMP first and only
> > at the end assigning DEST <- TEMP etc. would do.
>
> Yes, we will work on it.
>
> Attached patch refactored in part of builtin declarations and tests, is it 
> Ok?

Could you please add runtime tests for new intrinsics as well?


--
Thanks, K

> --
> WBR,
> Andrew


coretypes.h: change class rtx_def to struct

2017-01-19 Thread Gerald Pfeifer
This innocent looking patch shaves 1749 warnings in stage 1
when building with clang (such as on newer versions of FreeBSD).

Richi, at one point you indicated such changes would be fine
(though usually they go from struct to class)?

Gerald

2016-09-28  Gerald Pfeifer  

* coretypes.h (class rtx_def): Change to struct.

Index: gcc/coretypes.h
===
--- gcc/coretypes.h (revision 240576)
+++ gcc/coretypes.h (working copy)
@@ -59,7 +59,7 @@
 /* Subclasses of rtx_def, using indentation to show the class
hierarchy, along with the relevant invariant.
Where possible, keep this list in the same order as in rtl.def.  */
-class rtx_def;
+struct rtx_def;
   class rtx_expr_list;   /* GET_CODE (X) == EXPR_LIST */
   class rtx_insn_list;   /* GET_CODE (X) == INSN_LIST */
   class rtx_sequence;/* GET_CODE (X) == SEQUENCE */



[PATCH] Fix libgfortran bootstrap error on x86_64-mingw32 (PR target/79127)

2017-01-19 Thread Jakub Jelinek
Hi!

Apparently the Windows SEH has problems with the [xyz]mm16 and later
registers that were added with AVX512F (usable in 64-bit code only).
If any of the zmm16 to zmm31 registers is used in a function compiled
with avx512f or later, GCC emits a .seh_savexmm %xmm16, ... directive or
similar, and even binutils 2.27 doesn't handle that.

I don't know where the bug is: whether the SEH format even allows
registers %xmm16 to %xmm31, what ABI mingw is meant to use for these
registers (whether they are call-saved, like %xmm6 to %xmm15, or
call-used), and whether the bug is on the GCC side or the binutils side,
or whether AVX512* simply can't be used safely on mingw (perhaps a quick
hack could be to make those registers fixed on mingw).

This patch doesn't address any of that; it just attempts to fix the
bootstrap problem by not using avx512f optimized code on mingw until that
issue is resolved.
The #ifdef __x86_64__ in there is because zmm16+ registers can only be used
with -m64 or -mx32, not with -m32 (similarly to xmm8 to xmm15).

Bootstrapped/regtested on x86_64-linux and i686-linux (where HAVE_AVX512F
is still defined as before, assuming not very old binutils) and by Rainer,
as mentioned in the PR, on x86_64-w64-mingw32; ok for trunk?

2017-01-19  Jakub Jelinek  

PR target/79127
* acinclude.m4 (LIBGFOR_CHECK_AVX512F): Ensure the test clobbers
some zmm16+ registers to verify they are handled by unwind info
properly if needed.
* configure: Regenerated.

--- libgfortran/acinclude.m4.jj 2016-12-05 10:28:28.0 +0100
+++ libgfortran/acinclude.m4	2017-01-18 16:36:23.360736182 +0100
@@ -437,7 +437,11 @@ AC_DEFUN([LIBGFOR_CHECK_AVX512F], [
typedef double __m512d __attribute__ ((__vector_size__ (64)));
__m512d _mm512_add (__m512d a)
{
- return __builtin_ia32_addpd512_mask (a, a, a, 1, 4);
+ __m512d b = __builtin_ia32_addpd512_mask (a, a, a, 1, 4);
+#ifdef __x86_64__
+ asm volatile ("" : : : "zmm16", "zmm17", "zmm18", "zmm19");
+#endif
+ return b;
 }]], [[]])],
AC_DEFINE(HAVE_AVX512F, 1,
[Define if AVX512f instructions can be compiled.]),
--- libgfortran/configure.jj2017-01-17 10:28:41.0 +0100
+++ libgfortran/configure   2017-01-18 16:36:28.592668260 +0100
@@ -26300,7 +26300,11 @@ rm -f core conftest.err conftest.$ac_obj
typedef double __m512d __attribute__ ((__vector_size__ (64)));
__m512d _mm512_add (__m512d a)
{
- return __builtin_ia32_addpd512_mask (a, a, a, 1, 4);
+ __m512d b = __builtin_ia32_addpd512_mask (a, a, a, 1, 4);
+#ifdef __x86_64__
+ asm volatile ("" : : : "zmm16", "zmm17", "zmm18", "zmm19");
+#endif
+ return b;
 }
 int
 main ()

Jakub


Re: [PATCH] Be careful about combined chain with length == 0 (PR tree-optimization/70754).

2017-01-19 Thread Richard Biener
On Thu, Jan 19, 2017 at 11:25 AM, Bin.Cheng  wrote:
> On Thu, Jan 19, 2017 at 9:42 AM, Richard Biener
>  wrote:
>> On Wed, Jan 18, 2017 at 4:32 PM, Bin.Cheng  wrote:
>>> On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener
>>>  wrote:
 On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška  wrote:
> Hello.
>
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

 I'm not sure.  If we have such zero distance refs in the IL at the
 time pcom runs then not handling
 them will pessimize code-gen for cases where they are part of a larger
 chain.  Esp. I don't like
>>> Do you mean different chains distributed because of MAX_DISTANCE by
>>> "larger chain"?  With the patch, such chain of refs would still be
>>> pred-commoned, just the arithmetic operation not combined, which could
>>> be handled by later DOM?
 another stmt_dominates_stmt_p call and thus rather not handle length
 == 0 at all...
>>> Not handle length == 0 chains at all may be sub-optimal.  As you said,
>>> such chain of refs at the point may simply because previous dom/cse
>>> fail to analyze the references.

 We already seem to go great length in associating stuff when combining
 stuff thus isn't this
 maybe an artifact of this association?  Maybe we simply need to sort
 the new chain after
 combining it so the root stmt comes last?

 Note that there seems to be only a single length per chain but not all
 refs in a chain need to
 have the same distance.  This means your fix is likely incomplete?
 What prevents the situation
 to arise for distance != 0?
>>> Yes, it's possible for two refs have the same distance in a chain with
>>> length > 0.  But that should not be a problem, because existing uses
>>> are replaced by newly generated PHI variables which always dominate
>>> the uses, right?
>>
>> I must admit I don't know predcom in such detail but then can we handle
>> distance == 0 by simply inserting a PHI for those as well (a degenerate
>> one of course)?  Or can for distance == 0 the ref be not loop invariant?
> Not sure if I understand the question correctly.  Distance is
> difference of niter between one ref and the root ref of the chain, so
> 0 distance/length doesn't mean a loop invariant, it's very likely two
> (exactly the same) references in each loop iteration, the address of
> reference is still a SCEV.  OTOH, invariant chain has invariant
> address, and is handled separately.  For the first question, it's
> length, rather than distance that decides how the chain is handled.
> For length > 0 chain, we have to insert PHIs to pass carried result of
> memory reference, even some refs may have 0 distance to the root ref.
>>
>> Note that for length == 0 all refs in the chain will have a dependence 
>> distance
>> of zero.  So my first argument probably doesn't hold and we could simply
>> remove handling of length == 0 chains and rely on CSE?
> I am not sure, that CSE opportunity of references exists at this point
> means previous cse pass failed for some reason.

Or a later pass introduced it (in this case, the vectorizer).

> Predcom could be the
> only pass that can handle such case as it understands data reference
> better.  Note Martin's patch is not to skip handling of length == 0
> chain, later ref will still be CSEed with result of root ref, only the
> combination operation like chain1 + chain2 is skipped.  In this case,
> following dom should be able to handle such (loop independent) cse
> opportunities.

I must admit I don't completely understand the consequences of this
disabling but of course DOM should also be able to handle the CSE
(ok, DOM is known to be quite weak with memory equivalence but
it's not that data-dependence is really better in all cases).

Can't we simply re-order refs in new_chain appropriately or handle
this case in combinable_refs_p instead?

That is, I understand the patch as a hack as it should be always
possible to find dominating refs?

In fact the point of the assert tells us a simpler fix may be

Index: tree-predcom.c
===
--- tree-predcom.c  (revision 244519)
+++ tree-predcom.c  (working copy)
@@ -2330,6 +2334,12 @@ combine_chains (chain_p ch1, chain_p ch2
  break;
}
 }
+  if (new_chain->length == 0
+  && ! new_chain->has_max_use_after)
+{
+  release_chain (new_chain);
+  return NULL;
+}

   ch1->combined = true;
   ch2->combined = true;

which obviously matches the assert we run into for the testcase?  I'd
be ok with that
(no stmt_dominates_stmt_p, heh) with a suitable comment before it.

Richard.


>
> Thanks,
> bin
>>
>> Richard.
>>
>>> Thanks,
>>> bin

 Richard.

> Martin
>


Re: coretypes.h: change class rtx_def to struct

2017-01-19 Thread Richard Biener
On Thu, 19 Jan 2017, Gerald Pfeifer wrote:

> This innocent looking patch shaves 1749 warnings in stage 1
> when building with clang (such as on newer versions of FreeBSD).
> 
> Richi, at one point you indicated such changes would be fine
> (though usually they go from struct to class)?

In this case I'd rather prefer you make rtx_def a class (for consistency
with the other classes deriving from it).

That change is ok.

Thanks,
Richard.

> Gerald
> 
> 2016-09-28  Gerald Pfeifer  
> 
>   * coretypes.h (class rtx_def): Change to struct.
> 
> Index: gcc/coretypes.h
> ===
> --- gcc/coretypes.h   (revision 240576)
> +++ gcc/coretypes.h   (working copy)
> @@ -59,7 +59,7 @@
>  /* Subclasses of rtx_def, using indentation to show the class
> hierarchy, along with the relevant invariant.
> Where possible, keep this list in the same order as in rtl.def.  */
> -class rtx_def;
> +struct rtx_def;
>class rtx_expr_list;   /* GET_CODE (X) == EXPR_LIST */
>class rtx_insn_list;   /* GET_CODE (X) == INSN_LIST */
>class rtx_sequence;/* GET_CODE (X) == SEQUENCE */
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Re: coretypes.h: change class rtx_def to struct

2017-01-19 Thread Jakub Jelinek
On Thu, Jan 19, 2017 at 12:23:34PM +0100, Richard Biener wrote:
> On Thu, 19 Jan 2017, Gerald Pfeifer wrote:
> 
> > This innocent looking patch shaves 1749 warnings in stage 1
> > when building with clang (such as on newer versions of FreeBSD).
> > 
> > Richi, at one point you indicated such changes would be fine
> > (though usually they go from struct to class)?
> 
> In this case I'd rather prefer you make rtx_def a class (for consistency
> with the other classes deriving from it).
> 
> That change is ok.

Wouldn't it be better to just in configure add -Wno-whatever to disable the
bogus clang warning (if it has a -W* switch)?

Making rtx_def a class would require adding public:

Jakub


Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Richard Biener
On Thu, Jan 19, 2017 at 11:00 AM, Jakub Jelinek  wrote:
> On Thu, Jan 19, 2017 at 10:45:08AM +0100, Richard Biener wrote:
>> > But in the escape analysis we could consider all the specially marked
>> > "omp simt private" addressable vars to escape and thus confine them into 
>> > the
>> > SIMT region that way, right?
>>
>> We could.  But that doesn't prevent vars from outside of the region to
>> bleed into
>> it which was what Alex was asking about?  For the OMP vars just placing
>> clobbers before EXIT and after ENTER will confine them as well.
>
> Movement of read accesses to non-"omp simt private" variables into the SIMT
> region across SIMT_ENTER is not a problem I think, those vars still would be
> allocated on the per-warp granularity and all simt threads would just read
> the same value.  The problem would be only if writes to such variables
> are moved later across SIMT_ENTER or earlier across SIMT_EXIT, that would
> turn something initially non-racy into racy.
> Would it help if we e.g. have an artificial (ABNORMAL) edge in between basic 
> block with
> SIMT_ENTER and basic block with SIMT_EXIT to make it clearer that those
> calls aren't just ordinary calls, but very special control flow altering
> statements?

Yes, making them start/end BBs and having extra incoming/outgoing abnormal
edges is the usual trick of adding data dependencies on "anything".  But it's
also somewhat awkward, as you need to find a suitable source for the edges
(each other?  but then we have a loop with abnormals which might or might not
confuse things -- eventually all BBs will appear to be in an irreducible
region; the SIMT loop itself should be fine though).

>
> Jakub


Re: coretypes.h: change class rtx_def to struct

2017-01-19 Thread Richard Biener
On Thu, 19 Jan 2017, Jakub Jelinek wrote:

> On Thu, Jan 19, 2017 at 12:23:34PM +0100, Richard Biener wrote:
> > On Thu, 19 Jan 2017, Gerald Pfeifer wrote:
> > 
> > > This innocent looking patch shaves 1749 warnings in stage 1
> > > when building with clang (such as on newer versions of FreeBSD).
> > > 
> > > Richi, at one point you indicated such changes would be fine
> > > (though usually they go from struct to class)?
> > 
> > In this case I'd rather prefer you make rtx_def a class (for consistency
> > with the other classes deriving from it).
> > 
> > That change is ok.
> 
> Wouldn't it be better to just in configure add -Wno-whatever to disable the
> bogus clang warning (if it has a -W* switch)?

Also works for me.

> Making rtx_def a class would require adding public:

Yes, but it would be consistent then.

Richard.


[Ada] Spurious dimensionality errors in inlined bodies.

2017-01-19 Thread Arnaud Charlet
This patch removes spurious dimensionality errors in programs that are compiled
with front-end inlining (as in SPARK mode). The original model of dimensionality
checking analyzes only nodes coming from source because the correctness of the
dimensioned program only depends on source user code. However, when inlining is
enabled, expressions can include both source references and references to
internal entities created for the formals of a subprogram in an inlined call.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-01-19  Ed Schonberg  

* sem_dim.adb (Analyze_Dimension): Analyze object declaration and
identifier nodes that do not come from source, to handle properly
dimensionality check within an inlined body which includes both
original operands and rewritten operands. This removes spurious
dimensionality errors in the presence of front-end inlining,
as well as in SPARK mode.

Index: sem_dim.adb
===
--- sem_dim.adb (revision 244612)
+++ sem_dim.adb (working copy)
@@ -1122,16 +1122,22 @@
   --  Aspect is an Ada 2012 feature. Note that there is no need to check
   --  dimensions for nodes that don't come from source, except for subtype
   --  declarations where the dimensions are inherited from the base type,
-  --  and for explicit dereferences generated when expanding iterators.
+  --  for explicit dereferences generated when expanding iterators, and
+  --  for object declarations generated for inlining.
 
   if Ada_Version < Ada_2012 then
  return;
 
-  elsif not Comes_From_Source (N)
-and then Nkind (N) /= N_Subtype_Declaration
-and then Nkind (N) /= N_Explicit_Dereference
-  then
- return;
+  elsif not Comes_From_Source (N) then
+ if Nkind_In (N, N_Explicit_Dereference,
+ N_Identifier,
+ N_Object_Declaration,
+ N_Subtype_Declaration)
+ then
+null;
+ else
+return;
+ end if;
   end if;
 
   case Nkind (N) is
@@ -2138,7 +2144,8 @@
 end if;
  end if;
 
- --  Removal of dimensions in expression
+ --  Remove dimensions in expression after checking consistency
+ --  with given type.
 
  Remove_Dimensions (Expr);
   end if;


[Ada] Memory leak on function returning a limited view result

2017-01-19 Thread Arnaud Charlet
This patch modifies the processing of subprograms to properly flag a function
as returning by reference when the return type is a limited view and the full
view of the type requires the secondary stack.


-- Source --


--  pack_1.ads

with Pack_2;

with Ada.Finalization; use Ada.Finalization;

package Pack_1 is
   type Priv_Typ is tagged private;
   Empty : constant Priv_Typ;

private
   type Priv_Typ is new Controlled with null record;
   Empty : constant Priv_Typ := (Controlled with null record);
end Pack_1;

--  pack_2.ads

limited with Pack_1;

package Pack_2 is
   function Leak return Pack_1.Priv_Typ;
end Pack_2;

--  pack_2.adb

with Pack_1;

package body Pack_2 is
   function Leak return Pack_1.Priv_Typ is
   begin
  return Pack_1.Empty;
   end Leak;
end Pack_2;

--  pack_main.adb

with Pack_1;
with Pack_2;

procedure Pack_Main is
   Obj : Pack_1.Priv_Typ;

begin
   Obj := Pack_2.Leak;
end Pack_Main;

-
-- Compilation --
-

$ gnatmake -q pack_main.adb -largs -lgmem
$ ./pack_main
$ [ -f gmem.out ] && echo ERROR

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-01-19  Hristian Kirtchev  

* exp_ch6.adb (Expand_N_Subprogram_Body): Mark the spec as
returning by reference not just for subprogram body stubs,
but for all subprogram cases.
* sem_util.adb: Code reformatting.
(Requires_Transient_Scope): Update the call to Results_Differ.
(Results_Differ): Update the parameter profile and the associated
comment on usage.

Index: sem_util.adb
===
--- sem_util.adb(revision 244612)
+++ sem_util.adb(working copy)
@@ -129,6 +129,24 @@
--  components in the selected variant to determine whether all of them
--  have a default.
 
+   function Old_Requires_Transient_Scope (Id : Entity_Id) return Boolean;
+   function New_Requires_Transient_Scope (Id : Entity_Id) return Boolean;
+   --  ???We retain the old and new algorithms for Requires_Transient_Scope for
+   --  the time being. New_Requires_Transient_Scope is used by default; the
+   --  debug switch -gnatdQ can be used to do Old_Requires_Transient_Scope
+   --  instead. The intent is to use this temporarily to measure before/after
+   --  efficiency. Note: when this temporary code is removed, the documentation
+   --  of dQ in debug.adb should be removed.
+
+   procedure Results_Differ
+ (Id  : Entity_Id;
+  Old_Val : Boolean;
+  New_Val : Boolean);
+   --  ???Debugging code. Called when the Old_Val and New_Val differ. This
+   --  routine will be removed eventually when New_Requires_Transient_Scope
+   --  becomes Requires_Transient_Scope and Old_Requires_Transient_Scope is
+   --  eliminated.
+
--
--  Abstract_Interface_List --
--
@@ -17013,6 +17031,232 @@
   Actual_Id := Next_Actual (Actual_Id);
end Next_Actual;
 
+   --
+   -- New_Requires_Transient_Scope --
+   --
+
+   function New_Requires_Transient_Scope (Id : Entity_Id) return Boolean is
+  function Caller_Known_Size_Record (Typ : Entity_Id) return Boolean;
+  --  This is called for untagged records and protected types, with
+  --  nondefaulted discriminants. Returns True if the size of function
+  --  results is known at the call site, False otherwise. Returns False
+  --  if there is a variant part that depends on the discriminants of
+  --  this type, or if there is an array constrained by the discriminants
+  --  of this type. ???Currently, this is overly conservative (the array
+  --  could be nested inside some other record that is constrained by
+  --  nondiscriminants). That is, the recursive calls are too conservative.
+
+  function Large_Max_Size_Mutable (Typ : Entity_Id) return Boolean;
+  --  Returns True if Typ is a nonlimited record with defaulted
+  --  discriminants whose max size makes it unsuitable for allocating on
+  --  the primary stack.
+
+  --
+  -- Caller_Known_Size_Record --
+  --
+
+  function Caller_Known_Size_Record (Typ : Entity_Id) return Boolean is
+ pragma Assert (Typ = Underlying_Type (Typ));
+
+  begin
+ if Has_Variant_Part (Typ) and then not Is_Definite_Subtype (Typ) then
+return False;
+ end if;
+
+ declare
+Comp : Entity_Id;
+
+ begin
+Comp := First_Entity (Typ);
+while Present (Comp) loop
+
+   --  Only look at E_Component entities. No need to look at
+   --  E_Discriminant entities, and we must ignore internal
+   --  subtypes generated for constrained components.
+
+   if Ekind (Comp) = E_Component then
+  declare
+   

[Ada] Better error message on illegal selected component with overloaded prefix.

2017-01-19 Thread Arnaud Charlet
This patch improves the error report produced on a selected component Nam.Comp
which appears in an object declaration, when Nam has several interpretations 
as a function and there is a non-visible package with the same name.

Compiling pqi.adb must yield:

   pqi.adb:15:07: no legal interpretations as function call,
   pqi.adb:15:07: package "Map" is not visible

---
with Map;
with Gtk_Widget; use Gtk_Widget;
procedure PQI is
  type Main_Window_Record_T is new Gtk_Widget.Gobject_Record with
record
  Ii : Integer;
end record;

  procedure Map (Truc : Boolean) is
  begin
null;
  end;

  X : Map.T;

begin
  null;
end;
---
package body Gtk_Widget is
  procedure Map (Jjj : Integer) is
  begin
null;
  end;
end Gtk_Widget;
---
package Gtk_Widget is
  procedure Map (Jjj : Integer);

  type GObject_Record is tagged private;

  type GObject is access all GObject_Record'Class;

  procedure Map (Widget : not null access Gobject_Record);

private
  type GObject_Record is tagged record
I : Integer;
  end record;
end Gtk_Widget;
---
package Map is
  type T is new Integer;
end Map;

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-01-19  Ed Schonberg  

* sem_ch4.adb (Diagnose_Call): Improve error message when a
selected component has a prefix that might be interpreted
as a parameterless function call, but none of the candidate
interpretations is parameterless, and there is a hidden homonym
of the prefix that is a package.
* sem_ch8.adb (Find_Selected_Component): If the prefix might be
interpreted as a parameterless function call and its analysis
fails, do not call Analyze_Selected_Component.

Index: sem_ch4.adb
===
--- sem_ch4.adb (revision 244612)
+++ sem_ch4.adb (working copy)
@@ -5881,6 +5881,38 @@
  end loop;
   end if;
 
+  --  Before listing the possible candidates, check whether this is
+  --  a prefix of a selected component that has been rewritten as
+  --  a parameterless function call because there is a callable
+  --  candidate interpretation. If there is a hidden package in
+  --  the list of homonyms of the function name (bad programming
+  --  style in any case) suggest that this is the intended entity.
+
+  if No (Parameter_Associations (N))
+and then Nkind (Parent (N)) = N_Selected_Component
+and then Nkind (Parent (Parent (N))) in N_Declaration
+and then Is_Overloaded (Nam)
+  then
+ declare
+Ent : Entity_Id;
+
+ begin
+Ent := Current_Entity (Nam);
+while Present (Ent) loop
+   if Ekind (Ent) = E_Package then
+  Error_Msg_N
+("no legal interpretations as function call,!", Nam);
+  Error_Msg_NE ("\package& is not visible", N, Ent);
+  Rewrite (Parent (N),
+New_Occurrence_Of (Any_Type, Sloc (N)));
+  return;
+   end if;
+
+   Ent := Homonym (Ent);
+end loop;
+ end;
+  end if;
+
   --   Analyze each candidate call again, with full error reporting
   --   for each.
 
Index: sem_ch8.adb
===
--- sem_ch8.adb (revision 244612)
+++ sem_ch8.adb (working copy)
@@ -7048,7 +7048,18 @@
   --  Now analyze the reformatted node
 
   Analyze_Call (P);
-  Analyze_Selected_Component (N);
+
+  --  If the prefix is illegal after this transformation,
+  --  there may be visibility errors on the prefix. The
+  --  safest is to treat the selected component as an error.
+
+  if Error_Posted (P) then
+ Set_Etype (N, Any_Type);
+ return;
+
+  else
+ Analyze_Selected_Component (N);
+  end if;
end if;
 end if;
 


[PATCH] Fix false positive for -Walloc-size-larger-than (PR bootstrap/79132).

2017-01-19 Thread Martin Liška
Hello.

Following patch fixes asan bootstrap, as mentioned in the PR.

Ready to be installed?
Thanks,
Martin
>From 6a3d1b85e124751fdb804ae86596d30ea98b54af Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 19 Jan 2017 10:25:55 +0100
Subject: [PATCH] Fix false positive for -Walloc-size-larger-than (PR
 bootstrap/79132).

gcc/ChangeLog:

2017-01-19  Martin Liska  

	PR bootstrap/79132
	* tree-ssa-reassoc.c (rewrite_expr_tree_parallel): Insert assert
	that would prevent us to call alloca with -1 as argument.
---
 gcc/tree-ssa-reassoc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 503edd3870d..4a796f48864 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -4407,6 +4407,7 @@ rewrite_expr_tree_parallel (gassign *stmt, int width,
 {
   enum tree_code opcode = gimple_assign_rhs_code (stmt);
   int op_num = ops.length ();
+  gcc_assert (op_num > 0);
   int stmt_num = op_num - 1;
   gimple **stmts = XALLOCAVEC (gimple *, stmt_num);
   int op_index = op_num - 1;
-- 
2.11.0



[Ada] Missing invariant procedure body in SPARK mode

2017-01-19 Thread Arnaud Charlet
This patch modifies the generation of the invariant procedure body as follows:

   * The body of the "partial" invariant procedure is still generated at the
   end of the visible declarations.

   * The body of the "full" invariant procedure is generated when the related
   type is frozen or at the end of the private declarations.

The last case ensures that the assertion expression will be resolved in the
proper context even when freezing does not take place before the end of the
private declarations has been reached. This scenario arises in two ways:

   * A private type is declared within a nested package. Reaching the end of
   the private declarations does not cause freezing because the package is not
   library-level.

   * The compilation is subject to compilation switch -gnatd.F, which enables
   SPARK mode. This in turn suppresses freezing.

This patch also ensures that an invariant procedure is never treated as a
primitive of a tagged type.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-01-19  Hristian Kirtchev  

* sem_ch3.adb Add with and use clauses for Exp_Ch7.
(Analyze_Declarations): Create the DIC and Invariant
procedure bodies after all freezing has taken place.
(Build_Assertion_Bodies): New routine.
* sem_ch7.adb Remove the with and use clauses for Exp_Ch7
and Exp_Util.
(Analyze_Package_Specification): Remove the
generation of the DIC and Invariant procedure bodies. This is
now done by Analyze_Declarations.
* sem_disp.adb (Check_Dispatching_Operation): DIC and Invariant
procedures are never treated as primitives.

Index: sem_ch3.adb
===
--- sem_ch3.adb (revision 244622)
+++ sem_ch3.adb (working copy)
@@ -33,6 +33,7 @@
 with Errout;use Errout;
 with Eval_Fat;  use Eval_Fat;
 with Exp_Ch3;   use Exp_Ch3;
+with Exp_Ch7;   use Exp_Ch7;
 with Exp_Ch9;   use Exp_Ch9;
 with Exp_Disp;  use Exp_Disp;
 with Exp_Dist;  use Exp_Dist;
@@ -2153,6 +2154,17 @@
   --  (They have the sloc of the label as found in the source, and that
   --  is ahead of the current declarative part).
 
+  procedure Build_Assertion_Bodies (Decls : List_Id; Context : Node_Id);
+  --  Create the subprogram bodies which verify the run-time semantics of
+  --  the pragmas listed below for each eligible type found in declarative
+  --  list Decls. The pragmas are:
+  --
+  --Default_Initial_Condition
+  --Invariant
+  --Type_Invariant
+  --
+  --  Context denotes the owner of the declarative list.
+
   procedure Check_Entry_Contracts;
   --  Perform a pre-analysis of the pre- and postconditions of an entry
   --  declaration. This must be done before full resolution and creation
@@ -2195,6 +2207,85 @@
  end loop;
   end Adjust_Decl;
 
+  ----------------------------
+  -- Build_Assertion_Bodies --
+  ----------------------------
+
+  procedure Build_Assertion_Bodies (Decls : List_Id; Context : Node_Id) is
+ procedure Build_Assertion_Bodies_For_Type (Typ : Entity_Id);
+ --  Create the subprogram bodies which verify the run-time semantics
+ --  of the pragmas listed below for type Typ. The pragmas are:
+ --
+ --Default_Initial_Condition
+ --Invariant
+ --Type_Invariant
+
+ -------------------------------------
+ -- Build_Assertion_Bodies_For_Type --
+ -------------------------------------
+
+ procedure Build_Assertion_Bodies_For_Type (Typ : Entity_Id) is
+ begin
+--  Preanalyze and resolve the Default_Initial_Condition assertion
+--  expression at the end of the declarations to catch any errors.
+
+if Has_DIC (Typ) then
+   Build_DIC_Procedure_Body (Typ);
+end if;
+
+if Nkind (Context) = N_Package_Specification then
+
+   --  Preanalyze and resolve the invariants of a private type
+   --  at the end of the visible declarations to catch potential
+   --  errors. Inherited class-wide invariants are not included
+   --  because they have already been resolved.
+
+   if Decls = Visible_Declarations (Context)
+ and then Ekind_In (Typ, E_Limited_Private_Type,
+ E_Private_Type,
+ E_Record_Type_With_Private)
+ and then Has_Own_Invariants (Typ)
+   then
+  Build_Invariant_Procedure_Body
+(Typ   => Typ,
+ Partial_Invariant => True);
+
+   --  Preanalyze and resolve the invariants of a private type's
+   --  full view at the end of the private declarations to catch
+   --  potential errors.
+
+   elsif Decls = Pr

Re: transaction_safe exceptions prevent libstdc++ building for some targets

2017-01-19 Thread Joe Seymour
On 18/01/2017 19:24, DJ Delorie wrote:
> Joe Seymour  writes:
>>> the msp430 -mlarge multilib failing to build with...
 configure: error: Unknown underlying type for size_t
 make[1]: *** [configure-target-libstdc++-v3] Error 1
>>
>> This is still reproducible.
> 
> FYI the underlying type is uint20_t
> 
> I think I've complained that libstdc++ has a hard-coded list, rather
> than using a configure-time check, in the past...
> 

Thanks!

Here's the patch I'm proposing. I've tested it as follows:

- msp430-elf no longer encounters the error when configuring libstdc++-v3. Note
  that libstdc++-v3 doesn't build due to an ICE though.

- Configuring libstdc++-v3 for x86_64-unknown-linux-gnu produces:
  include/bits/c++config.h:#define _GLIBCXX_MANGLE_SIZE_T m
  Both with and without this patch.

- Configuring libstdc++-v3 for i686-unknown-linux-gnu produces:
  include/bits/c++config.h:#define _GLIBCXX_MANGLE_SIZE_T j
  Both with and without this patch.
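The mangling codes above follow the Itanium C++ ABI: built-in integer types get single-letter codes (`j` for `unsigned int`, `m` for `unsigned long`, `t` for `unsigned short`), while a vendor-extended type such as msp430's `__int20 unsigned` is mangled as `u` followed by the length and name of the vendor type, which is where `u6uint20` comes from. A small sketch (not GCC or libstdc++ code; the helper name is made up for illustration):

```python
# Itanium C++ ABI mangling codes for the candidate underlying types of size_t.
BUILTIN_CODES = {
    "unsigned short": "t",
    "unsigned int": "j",
    "unsigned long": "m",
    "unsigned long long": "y",
}

def size_t_mangling(underlying):
    """Return the code configure would record in _GLIBCXX_MANGLE_SIZE_T."""
    try:
        return BUILTIN_CODES[underlying]
    except KeyError:
        # Vendor-extended builtin type: 'u' <name length> <name>
        return "u%d%s" % (len(underlying), underlying)

print(size_t_mangling("unsigned long"))  # x86_64-unknown-linux-gnu -> m
print(size_t_mangling("unsigned int"))   # i686-unknown-linux-gnu   -> j
print(size_t_mangling("uint20"))         # msp430-elf -mlarge       -> u6uint20
```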

If it's acceptable I would appreciate it if someone would commit it on my
behalf.

Thanks,

2017-01-19  Joe Seymour  

libstdc++-v3/
* acinclude.m4 (GLIBCXX_CHECK_SIZE_T_MANGLING): Support uint20_t.
* configure: Regenerate.
---
 libstdc++-v3/acinclude.m4 |8 ++--
 libstdc++-v3/configure|   18 ++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 4e04cce..d9859aa 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4460,8 +4460,12 @@ AC_DEFUN([GLIBCXX_CHECK_SIZE_T_MANGLING], [
[glibcxx_cv_size_t_mangling=y], [
   AC_TRY_COMPILE([],
  [extern __SIZE_TYPE__ x; extern unsigned short x;],
- [glibcxx_cv_size_t_mangling=t],
- [glibcxx_cv_size_t_mangling=x])
+ [glibcxx_cv_size_t_mangling=t], [
+AC_TRY_COMPILE([],
+   [extern __SIZE_TYPE__ x; extern __int20 unsigned x;],
+   [glibcxx_cv_size_t_mangling=u6uint20],
+   [glibcxx_cv_size_t_mangling=x])
+  ])
 ])
   ])
 ])
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 219a6a3..9bb9862 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -80707,6 +80707,21 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"; then :
   glibcxx_cv_size_t_mangling=t
 else
+
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+extern __SIZE_TYPE__ x; extern __int20 unsigned x;
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  glibcxx_cv_size_t_mangling=u6uint20
+else
   glibcxx_cv_size_t_mangling=x
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
@@ -80721,6 +80736,9 @@ fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 
 fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+
+fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $glibcxx_cv_size_t_mangling" >&5
 $as_echo "$glibcxx_cv_size_t_mangling" >&6; }
   if test $glibcxx_cv_size_t_mangling = x; then
-- 
1.7.1



Re: [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754).

2017-01-19 Thread Bin.Cheng
On Thu, Jan 19, 2017 at 11:22 AM, Richard Biener
 wrote:
> On Thu, Jan 19, 2017 at 11:25 AM, Bin.Cheng  wrote:
>> On Thu, Jan 19, 2017 at 9:42 AM, Richard Biener
>>  wrote:
>>> On Wed, Jan 18, 2017 at 4:32 PM, Bin.Cheng  wrote:
 On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener
  wrote:
> On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška  wrote:
>> Hello.
>>
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression 
>> tests.
>>
>> Ready to be installed?
>
> I'm not sure.  If we have such zero distance refs in the IL at the
> time pcom runs then not handling
> them will pessimize code-gen for cases where they are part of a larger
> chain.  Esp. I don't like
 Do you mean different chains distributed because of MAX_DISTANCE by
 "larger chain"?  With the patch, such chain of refs would still be
 pred-commoned, just the arithmetic operation not combined, which could
 be handled by later DOM?
> another stmt_dominates_stmt_p call and thus rather not handle length
> == 0 at all...
 Not handle length == 0 chains at all may be sub-optimal.  As you said,
 such chain of refs at the point may simply because previous dom/cse
 fail to analyze the references.
>
> We already seem to go great length in associating stuff when combining
> stuff thus isn't this
> maybe an artifact of this association?  Maybe we simply need to sort
> the new chain after
> combining it so the root stmt comes last?
>
> Note that there seems to be only a single length per chain but not all
> refs in a chain need to
> have the same distance.  This means your fix is likely incomplete?
> What prevents the situation
> to arise for distance != 0?
 Yes, it's possible for two refs have the same distance in a chain with
 length > 0.  But that should not be a problem, because existing uses
 are replaced by newly generated PHI variables which always dominate
 the uses, right?
>>>
>>> I must admit I don't know predcom in such detail but then can we handle
>>> distance == 0 by simply inserting a PHI for those as well (a degenerate
>>> one of course)?  Or can for distance == 0 the ref be not loop invariant?
>> Not sure if I understand the question correctly.  Distance is
>> difference of niter between one ref and the root ref of the chain, so
>> 0 distance/length doesn't mean a loop invariant, it's very likely two
>> (exactly the same) references in each loop iteration, the address of
>> reference is still a SCEV.  OTOH, invariant chain has invariant
>> address, and is handled separately.  For the first question, it's
>> length, rather than distance that decides how the chain is handled.
>> For length > 0 chain, we have to insert PHIs to pass carried result of
>> memory reference, even some refs may have 0 distance to the root ref.
>>>
>>> Note that for length == 0 all refs in the chain will have a dependence 
>>> distance
>>> of zero.  So my first argument probably doesn't hold and we could simply
>>> remove handling of length == 0 chains and rely on CSE?
>> I am not sure, that CSE opportunity of references exists at this point
>> means previous cse pass failed for some reason.
>
> Or a later pass introduced it (in this case, the vectorizer).
>
>> Predcom could be the
>> only pass that can handle such case as it understands data reference
>> better.  Note Martin's patch is not to skip handling of length == 0
>> chain, later ref will still be CSEed with result of root ref, only the
>> combination operation like chain1 + chain2 is skipped.  In this case,
>> following dom should be able to handle such (loop independent) cse
>> opportunities.
>
> I must admit I don't completely understand the consequences of this
> disabling but of course DOM should also be able to handle the CSE
> (ok, DOM is known to be quite weak with memory equivalence but
> it's not that data-dependence is really better in all cases).
>
> Can't we simply re-order refs in new_chain appropriately or handle
> this case in combinable_refs_p instead?
It's not that refs need to be reordered; the root ref always dominates the others.
But yes, we need to find a dominator insertion place for combined
operation.  Looking at function reassociate_to_the_same_stmt, it
simply inserts new_stmt at root_stmt of root ref, which causes ICE in
this case.  The new_stmt needs to be inserted at a place also
dominating combination of later refs.  We can either compute the
information in place, or compute and pass the information from
previous combinable_refs_p.  This should be the real fix.
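To make the chain/distance terminology above concrete, here is a small Python sketch (GCC's predcom works on GIMPLE in C; this only illustrates the idea) of what predictive commoning does with a length-1 chain: the load `a[i+1]` from iteration `i` is carried over in a PHI-like variable and reused as `a[i]` in iteration `i+1`, so only one load remains in the loop body. A length-0 chain, by contrast, is two identical references in the same iteration, which is plain CSE.

```python
def sum_adjacent_naive(a):
    # Two reads per iteration: a[i] and a[i+1] form one chain with distance 1.
    return sum(a[i] + a[i + 1] for i in range(len(a) - 1))

def sum_adjacent_predcom(a):
    # After predictive commoning: the value loaded as a[i+1] is kept in a
    # carried variable and reused as a[i] on the next iteration.
    if len(a) < 2:
        return 0
    carried = a[0]          # peeled initial load
    total = 0
    for i in range(len(a) - 1):
        cur = a[i + 1]      # the only load left in the loop body
        total += carried + cur
        carried = cur       # "PHI" carrying the value to the next iteration
    return total
```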

Thanks,
bin
>
> That is, I understand the patch as a hack as it should be always
> possible to find dominating refs?
>
> In fact the point of the assert tells us a simpler fix may be
>
> Index: tree-predcom.c
> ===
> --- tree-predcom.c  (revision 244519)
> +++ tree-predcom.c  (working copy)
> @@ -2330

Re: [PATCH, Fortran, pr70696, v2] [Coarray] ICE on EVENT POST of host-associated EVENT_TYPE coarray

2017-01-19 Thread Andre Vehreschild
Hi all,

unfortunately this patch triggered a regression in the OpenCoarrays testsuite,
which also occurs outside of OpenCoarrays when a caf function is used in a
block in the main program. This patch fixes the error and adds a testcase.

Bootstrapped and regtested ok on x86_64-linux/f25. Ok for trunk?

Regards,
Andre

On Wed, 18 Jan 2017 19:35:59 +0100
Andre Vehreschild  wrote:

> Hi Jerry,
> 
> thanks for the fast review. Committed as r244587.
> 
> Regards,
>   Andre
> 
> On Wed, 18 Jan 2017 09:38:40 -0800
> Jerry DeLisle  wrote:
> 
> > On 01/18/2017 04:26 AM, Andre Vehreschild wrote:  
> > > Hi all,
> > >
> > > the patch I proposed for this pr unfortunately did not catch all errors.
> > > Dominique figured, that the original testcase was not resolved (thanks for
> > > that).
> > >
> > > This patch resolves the linker problem by putting the static token into
> > > the parent function's decl list. Furthermore does the patch beautify the
> > > retrieval of the symbol in gfc_get_tree_for_caf_expr () and remove the
> > > following assert which is unnecessary then, because the symbol is either
> > > already present or created. And gfc_get_symbol_decl () can not return
> > > NULL.
> > >
> > > Bootstrapped and regtested ok on x86_64-linux/f25 and x86-linux/f25  for
> > > trunk. Bootstrapped and regtested ok on x86_64-linux/f25 for gcc-6
> > > (x86-linux has not been tested, because the VM is not that fast).
> > >
> > > Ok for trunk and gcc-6?
> > >
> > > Regards,
> > >   Andre
> > >
> > 
> > This one is OK, thanks.
> > 
> > Jerry  
> 
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
gcc/fortran/ChangeLog:

2017-01-19  Andre Vehreschild  

PR fortran/70696
* trans-decl.c (gfc_build_qualified_array): Add static decl to parent
function only when the decl-context is not the translation unit.

gcc/testsuite/ChangeLog:

2017-01-19  Andre Vehreschild  

* gfortran.dg/coarray_43.f90: New test.


diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 51c23e8..5d246cd 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -971,7 +971,9 @@ gfc_build_qualified_array (tree decl, gfc_symbol * sym)
 	  DECL_CONTEXT (token) = sym->ns->proc_name->backend_decl;
 	  gfc_module_add_decl (cur_module, token);
 	}
-  else if (sym->attr.host_assoc)
+  else if (sym->attr.host_assoc
+	   && TREE_CODE (DECL_CONTEXT (current_function_decl))
+	   != TRANSLATION_UNIT_DECL)
 	gfc_add_decl_to_parent_function (token);
   else
 	gfc_add_decl_to_function (token);
diff --git a/gcc/testsuite/gfortran.dg/coarray_43.f90 b/gcc/testsuite/gfortran.dg/coarray_43.f90
new file mode 100644
index 000..d5ee4e1
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray_43.f90
@@ -0,0 +1,13 @@
+! { dg-do link }
+! { dg-options "-fcoarray=lib -lcaf_single" }
+
+program coarray_43
+  implicit none
+  integer, parameter :: STR_LEN = 50
+  character(len=STR_LEN) :: str[*]
+  integer :: pos
+  write(str,"(2(a,i2))") "Greetings from image ",this_image()," of ",num_images()
+  block
+pos = scan(str[5], set="123456789")
+  end block
+end program


Re: [PATCH] Introduce --with-gcc-major-version-only configure option (take 2)

2017-01-19 Thread Franz Sirl

Am 2017-01-12 um 21:16 schrieb Jakub Jelinek:

libmpx/
* configure.ac: Add GCC_BASE_VER.
* Makefile.am (gcc_version): Use @get_gcc_base_ver@ instead of cat to
get version from BASE-VER file.
* configure: Regenerated.


Hi,

it seems libmpx/configure.ac is missing the acx.m4 include, because 
there is now a bare GCC_BASE_VER in the regenerated libmpx/configure.


The attached patch seems to fix it, but I'm not good with autoconf.

Franz

Index: libmpx/configure.ac
===
--- libmpx/configure.ac (revision 244613)
+++ libmpx/configure.ac (working copy)
@@ -1,6 +1,8 @@
 # -*- Autoconf -*-
 # Process this file with autoconf to produce a configure script.
 
+sinclude(../config/acx.m4)
+
 AC_PREREQ([2.64])
 AC_INIT(package-unused, version-unused, libmpx)
 


[RS6000] PR79144, cmpstrnsi optimization breaks glibc

2017-01-19 Thread Alan Modra
glibc compiled with current gcc-7 fails one test due to strcmp and
strncmp appearing in the PLT.  This is because the inline expansion of
those functions falls back to a function call, but not using the asm
name for the call.  Fixed by retrieving the asm name from the builtin
decl.  I used the builtin decl simply because it is in a handy table.

Bootstrapped and regression tested powerpc64le-linux.  OK to apply?

* config/rs6000/rs6000.c (expand_strn_compare): Get the asm name
for strcmp and strncmp from corresponding builtin decl.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 44d18e9..4c6bada 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -19869,10 +19869,13 @@ expand_strn_compare (rtx operands[], int no_length)
}
 
   if (no_length)
-   emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strcmp"),
-target, LCT_NORMAL, GET_MODE (target), 2,
-force_reg (Pmode, XEXP (src1, 0)), Pmode,
-force_reg (Pmode, XEXP (src2, 0)), Pmode);
+   {
+ tree fun = builtin_decl_explicit (BUILT_IN_STRCMP);
+ emit_library_call_value (XEXP (DECL_RTL (fun), 0),
+  target, LCT_NORMAL, GET_MODE (target), 2,
+  force_reg (Pmode, XEXP (src1, 0)), Pmode,
+  force_reg (Pmode, XEXP (src2, 0)), Pmode);
+   }
   else
{
  /* -m32 -mpowerpc64 results in word_mode being DImode even
@@ -19886,7 +19889,8 @@ expand_strn_compare (rtx operands[], int no_length)
 
  emit_move_insn (len_rtx, bytes_rtx);
 
- emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"),
+ tree fun = builtin_decl_explicit (BUILT_IN_STRNCMP);
+ emit_library_call_value (XEXP (DECL_RTL (fun), 0),
   target, LCT_NORMAL, GET_MODE (target), 3,
   force_reg (Pmode, XEXP (src1, 0)), Pmode,
   force_reg (Pmode, XEXP (src2, 0)), Pmode,
@@ -20131,10 +20135,13 @@ expand_strn_compare (rtx operands[], int no_length)
 
   /* Construct call to strcmp/strncmp to compare the rest of the string.  */
   if (no_length)
-   emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strcmp"),
-target, LCT_NORMAL, GET_MODE (target), 2,
-force_reg (Pmode, XEXP (src1, 0)), Pmode,
-force_reg (Pmode, XEXP (src2, 0)), Pmode);
+   {
+ tree fun = builtin_decl_explicit (BUILT_IN_STRCMP);
+ emit_library_call_value (XEXP (DECL_RTL (fun), 0),
+  target, LCT_NORMAL, GET_MODE (target), 2,
+  force_reg (Pmode, XEXP (src1, 0)), Pmode,
+  force_reg (Pmode, XEXP (src2, 0)), Pmode);
+   }
   else
{
  rtx len_rtx;
@@ -20144,7 +20151,8 @@ expand_strn_compare (rtx operands[], int no_length)
len_rtx = gen_reg_rtx (SImode);
 
  emit_move_insn (len_rtx, GEN_INT (bytes - compare_length));
- emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"),
+ tree fun = builtin_decl_explicit (BUILT_IN_STRNCMP);
+ emit_library_call_value (XEXP (DECL_RTL (fun), 0),
   target, LCT_NORMAL, GET_MODE (target), 3,
   force_reg (Pmode, XEXP (src1, 0)), Pmode,
   force_reg (Pmode, XEXP (src2, 0)), Pmode,

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Don't expand strcmp and strncmp inline when -Os

2017-01-19 Thread Alan Modra
The inline expansions are non-trivial, so aren't really appropriate
for -Os.

Bootstrapped and regression tested powerpc64le-linux.  OK to apply?

* config/rs6000/rs6000.md (cmpstrnsi, cmpstrsi): Fail if
optimizing for size.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 9ef3b11..d729f40 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9102,7 +9102,8 @@ (define_expand "cmpstrnsi"
  (use (match_operand:SI 4))])]
   "TARGET_CMPB && (BYTES_BIG_ENDIAN || TARGET_LDBRX)"
 {
-  if (expand_strn_compare (operands, 0))
+  if (!optimize_insn_for_size_p ()
+  && expand_strn_compare (operands, 0))
 DONE;
   else 
 FAIL;
@@ -9121,7 +9122,8 @@ (define_expand "cmpstrsi"
  (use (match_operand:SI 3))])]
   "TARGET_CMPB && (BYTES_BIG_ENDIAN || TARGET_LDBRX)"
 {
-  if (expand_strn_compare (operands, 1))
+  if (!optimize_insn_for_size_p ()
+  && expand_strn_compare (operands, 1))
 DONE;
   else 
 FAIL;

-- 
Alan Modra
Australia Development Lab, IBM


[PR middle-end/79123] cast false positive in -Walloca-larger-than=

2017-01-19 Thread Aldy Hernandez
In the attached testcase, we have a clearly bounded case of alloca which 
is being incorrectly reported:


void g (int *p, int *q)
{
   size_t n = (size_t)(p - q);

   if (n < 10)
 f (__builtin_alloca (n));
}

The problem is that VRP gives us an anti-range for `n' which may be out 
of range:


  # RANGE ~[2305843009213693952, 16140901064495857663]
   n_9 = (long unsigned int) _4;

We do a less than stellar job with casts and VR_ANTI_RANGE's, mostly 
because we're trying various heuristics to make up for the fact that we 
have crappy range info from VRP.  More specifically, we're basically 
punting on a VR_ANTI_RANGE and ignoring that the casted result (n_9) 
has a bound later on.
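The anti-range in the dump falls straight out of the cast: `p - q` for `int *` on an LP64 target yields an element count in [-2**61, 2**61 - 1], and casting a signed range that straddles zero to unsigned wraps the negative half to the top of the unsigned domain, leaving exactly the complement of one interval. A small sketch of the computation (the helper name is made up; this is not VRP code):

```python
def unsigned_range_of_cast(lo, hi, bits=64):
    """Value range of (uint<bits>)x for a signed x known to lie in [lo, hi].

    Returns ("range", a, b) for [a, b], or ("anti", a, b) for ~[a, b].
    """
    m = 1 << bits
    if lo >= 0:                       # no wrap-around: range unchanged
        return ("range", lo, hi)
    if hi < 0:                        # entire range wraps to the top
        return ("range", lo + m, hi + m)
    # Straddles zero: [0, hi] U [m + lo, m - 1]  ==  ~[hi + 1, m + lo - 1]
    return ("anti", hi + 1, m + lo - 1)

# p - q for int* on LP64: element count in [-2**61, 2**61 - 1]
kind, a, b = unsigned_range_of_cast(-(2 ** 61), 2 ** 61 - 1)
print(kind, a, b)  # anti 2305843009213693952 16140901064495857663
```

The output matches the `~[2305843009213693952, 16140901064495857663]` range VRP attaches to `n_9`, even though the later `n < 10` test bounds the value.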


Luckily, we already have code to check simple ranges coming into the 
alloca by looking into all the basic blocks feeding it.  The attached 
patch delays the final decision on anti ranges until we have examined 
the basic blocks and determined that we are definitely out of range.


I expect all this to disappear with Andrew's upcoming range info overhaul.

OK for trunk?

commit 07677ba03a01cbd1f1c4747b4df333b35d0d3afd
Author: Aldy Hernandez 
Date:   Thu Jan 19 05:44:58 2017 -0500

PR middle-end/79123
* gimple-ssa-warn-alloca.c (alloca_call_type): Make sure
casts from signed to unsigned really don't have a range.

diff --git a/gcc/gimple-ssa-warn-alloca.c b/gcc/gimple-ssa-warn-alloca.c
index a27eea1..d553a34 100644
--- a/gcc/gimple-ssa-warn-alloca.c
+++ b/gcc/gimple-ssa-warn-alloca.c
@@ -272,6 +272,7 @@ static struct alloca_type_and_limit
 alloca_call_type (gimple *stmt, bool is_vla, tree *invalid_casted_type)
 {
   gcc_assert (gimple_alloca_call_p (stmt));
+  bool tentative_cast_from_signed = false;
   tree len = gimple_call_arg (stmt, 0);
   tree len_casted = NULL;
   wide_int min, max;
@@ -352,8 +353,26 @@ alloca_call_type (gimple *stmt, bool is_vla, tree *invalid_casted_type)
  // with this heuristic.  Hopefully, this VR_ANTI_RANGE
  // nonsense will go away, and we won't have to catch the
  // sign conversion problems with this crap.
+ //
+ // This is here to catch things like:
+ // void foo(signed int n) {
+ //   if (n < 100)
+ // alloca(n);
+ //   ...
+ // }
  if (cast_from_signed_p (len, invalid_casted_type))
-   return alloca_type_and_limit (ALLOCA_CAST_FROM_SIGNED);
+   {
+ // Unfortunately this also triggers:
+ //
+ // __SIZE_TYPE__ n = (__SIZE_TYPE__)blah;
+ // if (n < 100)
+ //   alloca(n);
+ //
+ // ...which is clearly bounded.  So, double check that
+ // the paths leading up to the size definitely don't
+ // have a bound.
+ tentative_cast_from_signed = true;
+   }
}
   // No easily determined range and try other things.
 }
@@ -371,10 +390,12 @@ alloca_call_type (gimple *stmt, bool is_vla, tree *invalid_casted_type)
  ret = alloca_call_type_by_arg (len, len_casted,
 EDGE_PRED (bb, ix), max_size);
  if (ret.type != ALLOCA_OK)
-   return ret;
+   break;
}
 }
 
+  if (tentative_cast_from_signed && ret.type != ALLOCA_OK)
+return alloca_type_and_limit (ALLOCA_CAST_FROM_SIGNED);
   return ret;
 }
 
diff --git a/gcc/testsuite/gcc.dg/Walloca-13.c b/gcc/testsuite/gcc.dg/Walloca-13.c
new file mode 100644
index 000..f9bdcef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Walloca-13.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-Walloca-larger-than=100 -O2" } */
+
+void f (void*);
+
+void g (int *p, int *q)
+{
+  __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
+  if (n < 100)
+f (__builtin_alloca (n));
+}


[PATCH][AArch64] Purge leftover occurrences of aarch64_nopcrelative_literal_loads

2017-01-19 Thread Kyrill Tkachov

Hi all,

The patch that renamed all uses of aarch64_nopcrelative_literal_loads into 
aarch64_pcrelative_literal_loads missed out
its extern declaration in aarch64-protos.h and a couple of its uses in 
aarch64.md.
The aarch64_nopcrelative_literal_loads doesn't get initialised anywhere (since 
it's unlinked from the
command-line option handling machinery) so the code that uses it is bogus.

In any case, its use in the aarch64_reload_movcp and 
aarch64_reload_movcp
expanders is redundant since they are only ever called through 
aarch64_secondary_reload which gates their use
on !aarch64_pcrelative_literal_loads already. Since these are not standard 
names, their conditions don't actually
matter in any way or get checked at any point in the compilation AFAICS.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-01-19  Kyrylo Tkachov  

* config/aarch64/aarch64-protos.h (aarch64_nopcrelative_literal_loads):
Delete.
* config/aarch64/aarch64.md
(aarch64_reload_movcp): Delete reference to
aarch64_nopcrelative_literal_loads.
(aarch64_reload_movcp): Likewise.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 29a3bd71151aa4fb7c6728f0fb52e2f3f233f41d..17d8a89ef0ce58b28fc8fc4713edcc4b194bbc90 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -453,7 +453,6 @@ int aarch64_ccmp_mode_to_code (enum machine_mode mode);
 bool extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset);
 bool aarch64_operands_ok_for_ldpstp (rtx *, bool, enum machine_mode);
 bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, enum machine_mode);
-extern bool aarch64_nopcrelative_literal_loads;
 
 extern void aarch64_asm_output_pool_epilogue (FILE *, const char *,
 	  tree, HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 7aaebd230ddb702447dd4a5d1ba4ab05441cb10a..2b0c2cc01e72d635f85ce4c56be1407986377ab3 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5044,7 +5044,7 @@ (define_expand "aarch64_reload_movcp"
  [(set (match_operand:GPF_TF 0 "register_operand" "=w")
(mem:GPF_TF (match_operand 1 "aarch64_constant_pool_symref" "S")))
   (clobber (match_operand:P 2 "register_operand" "=&r"))]
- "TARGET_FLOAT && aarch64_nopcrelative_literal_loads"
+ "TARGET_FLOAT"
  {
aarch64_expand_mov_immediate (operands[2], XEXP (operands[1], 0));
emit_move_insn (operands[0], gen_rtx_MEM (mode, operands[2]));
@@ -5057,7 +5057,7 @@ (define_expand "aarch64_reload_movcp"
  [(set (match_operand:VALL 0 "register_operand" "=w")
(mem:VALL (match_operand 1 "aarch64_constant_pool_symref" "S")))
   (clobber (match_operand:P 2 "register_operand" "=&r"))]
- "TARGET_FLOAT && aarch64_nopcrelative_literal_loads"
+ "TARGET_FLOAT"
  {
aarch64_expand_mov_immediate (operands[2], XEXP (operands[1], 0));
emit_move_insn (operands[0], gen_rtx_MEM (mode, operands[2]));


Re: [PATCH] Introduce --with-gcc-major-version-only configure option (take 2)

2017-01-19 Thread Jakub Jelinek
On Thu, Jan 19, 2017 at 01:13:14PM +0100, Franz Sirl wrote:
> Am 2017-01-12 um 21:16 schrieb Jakub Jelinek:
> > libmpx/
> > * configure.ac: Add GCC_BASE_VER.
> > * Makefile.am (gcc_version): Use @get_gcc_base_ver@ instead of cat to
> > get version from BASE-VER file.
> > * configure: Regenerated.
> 
> Hi,
> 
> it seems libmpx/configure.ac is missing the acx.m4 include, because there is
> now a bare GCC_BASE_VER in the regenerated libmpx/configure.
> 
> The attached patch seems to fix it, but I'm not good with autoconf.

Oops, sorry, dunno how this went unnoticed in my testing.
The include belongs to aclocal.m4 IMHO, I've committed this as obvious:

2017-01-19  Jakub Jelinek  

PR other/79046
* aclocal.m4: Include ../config/acx.m4.
* configure: Regenerated.
* Makefile.in: Regenerated.
* mpxrt/Makefile.in: Regenerated.
* mpxwrap/Makefile.in: Regenerated.

--- libmpx/aclocal.m4.jj2015-05-13 18:57:48.0 +0200
+++ libmpx/aclocal.m4   2017-01-19 13:17:35.260138792 +0100
@@ -705,6 +705,7 @@ AC_SUBST([am__tar])
 AC_SUBST([am__untar])
 ]) # _AM_PROG_TAR
 
+m4_include([../config/acx.m4])
 m4_include([../config/lead-dot.m4])
 m4_include([../config/multi.m4])
 m4_include([../config/override.m4])
--- libmpx/configure.jj 2017-01-17 10:28:41.0 +0100
+++ libmpx/configure2017-01-19 13:18:03.142784322 +0100
@@ -603,6 +603,7 @@ ac_subst_vars='am__EXEEXT_FALSE
 am__EXEEXT_TRUE
 LTLIBOBJS
 LIBOBJS
+get_gcc_base_ver
 XCFLAGS
 enable_static
 enable_shared
@@ -731,6 +732,7 @@ with_pic
 enable_fast_install
 with_gnu_ld
 enable_libtool_lock
+with_gcc_major_version_only
 '
   ac_precious_vars='build_alias
 host_alias
@@ -1377,6 +1379,8 @@ Optional Packages:
   --with-pic  try to use only PIC/non-PIC objects [default=use
   both]
   --with-gnu-ld   assume the C compiler uses GNU ld [default=no]
+  --with-gcc-major-version-only
+  use only GCC major number in filesystem paths
 
 Some influential environment variables:
   CC  C compiler command
@@ -11230,7 +11234,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11233 "configure"
+#line 11237 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11336,7 +11340,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11339 "configure"
+#line 11343 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11586,7 +11590,19 @@ else
 fi
 
 # Determine what GCC version number to use in filesystem paths.
-GCC_BASE_VER
+
+  get_gcc_base_ver="cat"
+
+# Check whether --with-gcc-major-version-only was given.
+if test "${with_gcc_major_version_only+set}" = set; then :
+  withval=$with_gcc_major_version_only; if test x$with_gcc_major_version_only 
= xyes ; then
+get_gcc_base_ver="sed -e 's/^\([0-9]*\).*\$\$/\1/'"
+  fi
+
+fi
+
+
+
 
 ac_config_files="$ac_config_files Makefile libmpx.spec"
 
--- libmpx/Makefile.in.jj   2017-01-17 10:28:41.0 +0100
+++ libmpx/Makefile.in  2017-01-19 13:18:09.693701040 +0100
@@ -59,7 +59,8 @@ DIST_COMMON = ChangeLog $(srcdir)/Makefi
$(srcdir)/config.h.in $(srcdir)/../mkinstalldirs \
$(srcdir)/libmpx.spec.in
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/../config/lead-dot.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
+   $(top_srcdir)/../config/lead-dot.m4 \
$(top_srcdir)/../config/multi.m4 \
$(top_srcdir)/../config/override.m4 \
$(top_srcdir)/../ltoptions.m4 $(top_srcdir)/../ltsugar.m4 \
@@ -216,6 +217,7 @@ dvidir = @dvidir@
 enable_shared = @enable_shared@
 enable_static = @enable_static@
 exec_prefix = @exec_prefix@
+get_gcc_base_ver = @get_gcc_base_ver@
 host = @host@
 host_alias = @host_alias@
 host_cpu = @host_cpu@
--- libmpx/mpxrt/Makefile.in.jj (revision 244570)
+++ libmpx/mpxrt/Makefile.in(working copy)
@@ -55,7 +55,8 @@ target_triplet = @target@
 subdir = mpxrt
 DIST_COMMON = $(srcdir)/Makefile.in $(srcdir)/Makefile.am
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/../config/lead-dot.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
+   $(top_srcdir)/../config/lead-dot.m4 \
$(top_srcdir)/../config/multi.m4 \
$(top_srcdir)/../config/override.m4 \
$(top_srcdir)/../ltoptions.m4 $(top_srcdir)/../ltsugar.m4 \
@@ -210,6 +211,7 @@ dvidir = @dvidir@
 enable_shared = @enable_shared@
 enable_static = @enable_static@
 exec_prefix = @exec_prefix@
+get_gcc_base_ver = @get_gcc_base_ver@
 host = @host@
 host_alias = @host_alias@
 host_cpu = @host_cpu@
--- libmpx/mpxwrap/Makefile.in.jj   (revision 244570)
+++ libmpx/mpxwrap/Makefile.in  (working copy)
@@ -55,7 +55,8 @@ target_triplet = @target@
 subdir = mpxwrap
 DIST_COMMON = $(srcdir)/Makefile.in $(srcdi

Re: [PR middle-end/79123] cast false positive in -Walloca-larger-than=

2017-01-19 Thread Richard Biener
On Thu, Jan 19, 2017 at 1:17 PM, Aldy Hernandez  wrote:
> In the attached testcase, we have a clearly bounded case of alloca which is
> being incorrectly reported:
>
> void g (int *p, int *q)
> {
>size_t n = (size_t)(p - q);
>
>if (n < 10)
>  f (__builtin_alloca (n));
> }
>
> The problem is that VRP gives us an anti-range for `n' which may be out of
> range:
>
>   # RANGE ~[2305843009213693952, 16140901064495857663]
>n_9 = (long unsigned int) _4;
>
> We do a less than stellar job with casts and VR_ANTI_RANGE's, mostly because
> we're trying various heuristics to make up for the fact that we have crappy
> range info from VRP.  More specifically, we're basically punting on an
> VR_ANTI_RANGE and ignoring that the casted result (n_9) has a bound later
> on.
>
> Luckily, we already have code to check simple ranges coming into the alloca
> by looking into all the basic blocks feeding it.  The attached patch delays
> the final decision on anti ranges until we have examined the basic blocks
> and determined that we are definitely out of range.
>
> I expect all this to disappear with Andrew's upcoming range info overhaul.
>
> OK for trunk?

I _really_ wonder why all the range consuming warnings are not emitted
from VRP itself (like we do for -Warray-bounds).  There we'd still see
a range for the argument derived from the if () rather than needing to
do our own mini-VRP from the needessly "incomplete" range-info on
SSA vars.

Richard.

>


Re: Fortran, committed: Forall-with-temporary problems (pr 50069 and pr 55086).

2017-01-19 Thread Christophe Lyon
Hi,


On 18 January 2017 at 22:45, Louis Krupp  wrote:
> Fixed in revision 244601.
>

I've noticed a new failure on arm/aarch64:
  compiler driver --help=fortran option(s): "^ +-.*[^:.]$" absent from
output: "  -ftest-forall-temp  Force creation of temporary to
test infrequently-executed forall code"
when testing gcc. (I mean the error appears in gcc.sum/gcc.log, not
gfortran.sum)

The output above the failure does contain:
[...]
  -fstack-arrays  Put all local arrays on stack.
  -ftest-forall-temp  Force creation of temporary to test
infrequently-executed forall code
  -funderscoring  Append underscores to externally visible names.
[...]
so I'm not sure why there is a failure?

Christophe


Re: [PR middle-end/79123] cast false positive in -Walloca-larger-than=

2017-01-19 Thread Aldy Hernandez

On 01/19/2017 07:45 AM, Richard Biener wrote:
> On Thu, Jan 19, 2017 at 1:17 PM, Aldy Hernandez  wrote:
>> In the attached testcase, we have a clearly bounded case of alloca which is
>> being incorrectly reported:
>>
>> void g (int *p, int *q)
>> {
>>    size_t n = (size_t)(p - q);
>>
>>    if (n < 10)
>>  f (__builtin_alloca (n));
>> }
>>
>> The problem is that VRP gives us an anti-range for `n' which may be out of
>> range:
>>
>>   # RANGE ~[2305843009213693952, 16140901064495857663]
>>    n_9 = (long unsigned int) _4;
>>
>> We do a less than stellar job with casts and VR_ANTI_RANGE's, mostly because
>> we're trying various heuristics to make up for the fact that we have crappy
>> range info from VRP.  More specifically, we're basically punting on a
>> VR_ANTI_RANGE and ignoring that the casted result (n_9) has a bound later
>> on.
>>
>> Luckily, we already have code to check simple ranges coming into the alloca
>> by looking into all the basic blocks feeding it.  The attached patch delays
>> the final decision on anti ranges until we have examined the basic blocks
>> and determined that we are definitely out of range.
>>
>> I expect all this to disappear with Andrew's upcoming range info overhaul.
>>
>> OK for trunk?
>
> I _really_ wonder why all the range consuming warnings are not emitted
> from VRP itself (like we do for -Warray-bounds).  There we'd still see
> a range for the argument derived from the if () rather than needing to
> do our own mini-VRP from the needlessly "incomplete" range-info on
> SSA vars.
>
> Richard.


My original implementation was within VRP itself, using the 
ASSERT_EXPR's, but Jeff suggested doing it in its own pass.  I can't 
remember the details (I could look it up), but I think it had to do with 
getting better information post-inlining about the call to alloca (not 
sure).


Also, with Andrew's GCC8 work on ranges, it seemed pointless to bend 
over backwards handling ASSERT_EXPR's etc.  And IIRC, implementing it in 
VRP was ugly.


Perhaps Jeff remembers the details better.

Aldy


Re: [PATCH] Fix libgfortran bootstrap error on x86_64-mingw32 (PR target/79127)

2017-01-19 Thread FX
> 2017-01-19  Jakub Jelinek  
> 
>   PR target/79127
>   * acinclude.m4 (LIBGFOR_CHECK_AVX512F): Ensure the test clobbers
>   some zmm16+ registers to verify they are handled by unwind info
>   properly if needed.
>   * configure: Regenerated.

OK to commit, with an additional comment and a link to the PR.
Thanks!

FX


Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> > Inlining needs to do just like omp-low; if we take the current framework, it
> > would need to collect addressable locals into one struct, replace 
> > references to
> > those locals by field references in the inlined body.  Then it needs to
> > appropriately increase allocation size/alignment in SIMT_ENTER() call 
> > arguments.
> > And finally it would need to initialize the pointer to structure, either
> > immediately after SIMT_ENTER, or in a more fine-grained manner by a
> > __builtin_alloca_with_align-like function (__b_a_w_a is not usable for that
> > itself, because currently for known sizes gcc can make it a local variable).
> 
> One of the problems with that is that it means that you can't easily turn
> addressable private variables into non-addressable ones once you force them
> into such struct that can't be easily SRA split.
> In contrast, if you can get the variable flags/attributes work, if they
> become non-addressable (which is especially important to get rid of C++
> abstraction penalties), you simply don't add them into the specially
> allocated block.

I agree; I'd like to implement the approach with per-variable attributes once
it's clear how it ought to work (right now I'm not sure if placing CLOBBERs on
both entry and exit would be enough; if I understood correctly, Richard is
saying they might be moved, unless the middle-end is changed to prevent it).

Do you want me to pursue that during the early days of stage 4?  I think it
would be nice to have the issue addressed in some way for the upcoming release.

Thanks.
Alexander
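To make the transformation being debated concrete, here is a rough before/after sketch of collecting an addressable local into a per-lane frame structure. All names here (`simt_enter`, `struct simt_frame`) are illustrative placeholders for the SIMT_ENTER-style allocation discussed above, not actual GCC internal functions:

```c
#include <assert.h>
#include <stddef.h>

/* Before the transformation: a local whose address is taken.  */
static int
before (void)
{
  int x = 40;        /* addressable: its address escapes below */
  int *p = &x;
  *p += 2;
  return x;
}

/* After: the addressable local lives in a frame struct whose storage comes
   from a stubbed SIMT_ENTER-style allocator; references to the local become
   field references, as described in the quoted message.  */
struct simt_frame { int x; };

static void *
simt_enter (size_t size, size_t align)
{
  /* Stand-in for the real per-warp allocation; just static storage here.  */
  static _Alignas (16) unsigned char backing[64];
  (void) size; (void) align;
  return backing;
}

static int
after (void)
{
  struct simt_frame *f = simt_enter (sizeof *f, _Alignof (struct simt_frame));
  f->x = 40;
  int *p = &f->x;    /* the former &x becomes a field reference */
  *p += 2;
  return f->x;
}
```

Jakub's objection is visible in the sketch: once `x` is a field of `simt_frame`, SRA can no longer split it back into a register even if it later becomes non-addressable.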


Re: [PR middle-end/79123] cast false positive in -Walloca-larger-than=

2017-01-19 Thread Richard Biener
On Thu, Jan 19, 2017 at 2:02 PM, Aldy Hernandez  wrote:
> On 01/19/2017 07:45 AM, Richard Biener wrote:
>>
>> On Thu, Jan 19, 2017 at 1:17 PM, Aldy Hernandez  wrote:
>>>
>>> In the attached testcase, we have a clearly bounded case of alloca which
>>> is
>>> being incorrectly reported:
>>>
>>> void g (int *p, int *q)
>>> {
>>>size_t n = (size_t)(p - q);
>>>
>>>if (n < 10)
>>>  f (__builtin_alloca (n));
>>> }
>>>
>>> The problem is that VRP gives us an anti-range for `n' which may be out
>>> of
>>> range:
>>>
>>>   # RANGE ~[2305843009213693952, 16140901064495857663]
>>>n_9 = (long unsigned int) _4;
>>>
>>> We do a less than stellar job with casts and VR_ANTI_RANGE's, mostly
>>> because
>>> we're trying various heuristics to make up for the fact that we have
>>> crappy
>>> range info from VRP.  More specifically, we're basically punting on an
>>> VR_ANTI_RANGE and ignoring that the casted result (n_9) has a bound later
>>> on.
>>>
>>> Luckily, we already have code to check simple ranges coming into the
>>> alloca
>>> by looking into all the basic blocks feeding it.  The attached patch
>>> delays
>>> the final decision on anti ranges until we have examined the basic blocks
>>> and determined for that we are definitely out of range.
>>>
>>> I expect all this to disappear with Andrew's upcoming range info
>>> overhaul.
>>>
>>> OK for trunk?
>>
>>
>> I _really_ wonder why all the range consuming warnings are not emitted
>> from VRP itself (like we do for -Warray-bounds).  There we'd still see
>> a range for the argument derived from the if () rather than needing to
>> do our own mini-VRP from the needessly "incomplete" range-info on
>> SSA vars.
>>
>> Richard.
>
>
> My original implementation was within VRP itself, using the ASSERT_EXPR's,
> but Jeff suggested doing it in it's own pass.  I can't remember the details
> (I could look it up), but I think it had to do with getting better
> information post-inlining about the call to alloca (not sure).
>
> Also, with Andrew's GCC8 work on ranges, it seemed pointless to bend over
> backwards handling ASSERT_EXPR's etc.  And IIRC, implementing it in VRP was
> ugly.

You don't need to handle ASSERT_EXPRs; you simply use the VRP lattice after
propagation (or during propagation, if you hook into EVRP, of course).  And VRP runs after
inlining.

Richard.

> Perhaps Jeff remembers the details better.
>
> Aldy


Re: [2/5][DWARF] Generate dwarf information for -msign-return-address by introducing new DWARF mapping hook

2017-01-19 Thread Richard Earnshaw (lists)
On 17/01/17 15:11, Jiong Wang wrote:
> 
> 
> On 17/01/17 13:57, Richard Earnshaw (lists) wrote:
>> On 16/01/17 14:29, Jiong Wang wrote:
>>>
 I can see the reason for doing this is if you want to separate the
 interpretation
 of GCC CFA reg-note and the final DWARF CFA operation.  My
 understanding is all
 reg notes defined in gcc/reg-note.def should have general meaning,
 even the
 CFA_WINDOW_SAVE.  For those which are architecture specific we might
 need a
 mechanism to define them in backend only.
 For general reg-notes in gcc/reg-note.def, they do not always have
 the
 corresponding standard DWARF CFA operation, for example
 CFA_WINDOW_SAVE,
 therefore if we want to achieve what you described, I think we also
 need to
 define a new target hook which maps a GCC CFA reg-note into
 architecture DWARF
 CFA operation.

 Regards,
 Jiong


>>> Here is the patch.
>>>
>> Hmm, I really wasn't expecting any more than something like the
>> following in dwarf2cfi.c:
>>
>> @@ -2098,7 +2098,9 @@ dwarf2out_frame_debug (rtx_insn *insn)
>>  handled_one = true;
>>  break;
>>
>> +  case REG_CFA_TOGGLE_RA_MANGLE:
>> case REG_CFA_WINDOW_SAVE:
>> +   /* We overload both of these operations onto the same DWARF
>> opcode.  */
>>  dwarf2out_frame_debug_cfa_window_save ();
>>  handled_one = true;
>>  break;
>>
>> This keeps the two reg notes separate within the compiler, but emits the
>> same dwarf operation during final output.  This avoids the need for new
>> hooks or anything more complicated.
> 
> This was my initial thought and the patch would be very small as you've
> demonstrated.  I later moved to this more complex patch as I think it's
> better to treat notes in reg-notes.def as having fully generic meaning and
> map them to a standard DWARF CFA operation where one exists, otherwise map
> them to a target-private DWARF CFA operation through this new hook.  This
> gives other targets a chance to map, for example, REG_CFA_TOGGLE_RA_MANGLE
> to their architecture's DWARF number.
> 
> The introduction of new hook looks be very low risk in this stage, the only
> painful thing is the header file needs to be reorganized as we need to
> use some
> DWARF type and reg-note type in targhooks.c.
> 
> Anyway, if the new hook patch is too heavy, I have attached the the
> simplified
> version which simply defines the new REG_CFA_TOGGLE_RA_MANGLE and maps
> to same
> code of REG_CFA_WINDOW_SAVE.
> 
> 

Yes, this is much more like what I had in mind.

OK.

R.

> gcc/
> 
> 2017-01-17  Jiong Wang  
> 
> * reg-notes.def (CFA_TOGGLE_RA_MANGLE): New reg-note.
> * combine-stack-adj.c (no_unhandled_cfa): Handle
> REG_CFA_TOGGLE_RA_MANGLE.
> * dwarf2cfi.c
> (dwarf2out_frame_debug): Handle REG_CFA_TOGGLE_RA_MANGLE.
> * config/aarch64/aarch64.c (aarch64_expand_prologue): Generates
> DWARF
> info for return address signing.
> (aarch64_expand_epilogue): Likewise.
> 
> 
> k.patch
> 
> 
> diff --git a/gcc/combine-stack-adj.c b/gcc/combine-stack-adj.c
> index 
> 20cd59ad08329e9f4f834bfc01d6f9ccc4485283..9ec14a3e44363f35f6419c38233ce5eebddd3458
>  100644
> --- a/gcc/combine-stack-adj.c
> +++ b/gcc/combine-stack-adj.c
> @@ -208,6 +208,7 @@ no_unhandled_cfa (rtx_insn *insn)
>case REG_CFA_SET_VDRAP:
>case REG_CFA_WINDOW_SAVE:
>case REG_CFA_FLUSH_QUEUE:
> +  case REG_CFA_TOGGLE_RA_MANGLE:
>   return false;
>}
>  
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 3bcad76b68b6ea7c9d75d150d79c45fb74d6bf0d..6451b08191cf1a44aed502930da8603111f6e8ca
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3553,7 +3553,11 @@ aarch64_expand_prologue (void)
>  
>/* Sign return address for functions.  */
>if (aarch64_return_address_signing_enabled ())
> -emit_insn (gen_pacisp ());
> +{
> +  insn = emit_insn (gen_pacisp ());
> +  add_reg_note (insn, REG_CFA_TOGGLE_RA_MANGLE, const0_rtx);
> +  RTX_FRAME_RELATED_P (insn) = 1;
> +}
>  
>if (flag_stack_usage_info)
>  current_function_static_stack_size = frame_size;
> @@ -3707,7 +3711,11 @@ aarch64_expand_epilogue (bool for_sibcall)
>  */
>if (aarch64_return_address_signing_enabled ()
>&& (for_sibcall || !TARGET_ARMV8_3 || crtl->calls_eh_return))
> -emit_insn (gen_autisp ());
> +{
> +  insn = emit_insn (gen_autisp ());
> +  add_reg_note (insn, REG_CFA_TOGGLE_RA_MANGLE, const0_rtx);
> +  RTX_FRAME_RELATED_P (insn) = 1;
> +}
>  
>/* Stack adjustment for exception handler.  */
>if (crtl->calls_eh_return)
> diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
> index 
> 2748e2fa48e4794181496b26df9b51b7e51e7b84..2a527c9fecab091dccb417492e5dbb2ade244be2
>  100644
> --- a/gcc/dwarf2cfi.c
> +++ b/gcc/dwarf2cfi.c
> @@ -2098,7 +2098,9 @@ dwarf2out_fra

Re: [Ping~]Re: [5/5][libgcc] Runtime support for AArch64 return address signing (needs new target macros)

2017-01-19 Thread Richard Earnshaw (lists)
On 18/01/17 17:07, Jiong Wang wrote:
> On 12/01/17 18:10, Jiong Wang wrote:
>> On 06/01/17 11:47, Jiong Wang wrote:
>>> This is the update on libgcc unwinder support according to new DWARF
>>> proposal.
>>>
>>> As Joseph commented, duplication of unwind-dw2.c is not encouraged in
>>> libgcc,
>>> But from this patch, you can see there are a few places we need to
>>> modify for
>>> AArch64 in unwind-aarch64.c, so the file duplication approach is
>>> acceptable?
>>>
>>>
>>> libgcc/
>>>
>>> 2017-01-06  Jiong Wang  
>>>
>>> * config/aarch64/unwind-aarch64.c
>>> (DWARF_REGNUM_AARCH64_RA_STATE,
>>> RA_A_SIGNED_BIT): New macros.
>>> (execute_cfa_program): Multiplex DW_CFA_GNU_window_save on
>>> AArch64.
>>> (uw_frame_state_for): Clear bit[0] of
>>> DWARF_REGNUM_AARCH64_RA_STATE.
>>> (uw_update_context): Authenticate return address according to
>>> DWARF_REGNUM_AARCH64_RA_STATE.
>>> (uw_init_context_1): Strip signature of seed address.
>>> (uw_install_context): Re-authenticate EH handler's address.
>>>
>> Ping~
>>
>> For comparision, I have also attached the patch using the target macros.
>>
>> Four new target macros are introduced:
>>
>>   MD_POST_EXTRACT_ROOT_ADDR
>>   MD_POST_EXTRACT_FRAME_ADDR
>>   MD_POST_FROB_EH_HANDLER_ADDR
>>   MD_POST_INIT_CONTEXT
>>
>> MD_POST_EXTRACT_ROOT_ADDR is to do target private post processing on
>> the address
>> inside _Unwind* functions, they are serving as root address to start the
>> unwinding.  MD_POST_EXTRACT_FRAME_ADDR is to do target private post
>> processing
>> on the address inside the real user program which throws the exceptions.
>>
>> MD_POST_FROB_EH_HANDLER_ADDR is to do target private frob on the EH
>> handler's
>> address before we install it into current context.
>>
>> MD_POST_INIT_CONTEXT it to do target private initialization on the
>> context
>> structure after common initialization.
>>
>> One "__aarch64__" macro check is needed to multiplex DW_CFA_window_save.
> 
> Ping ~
> 
> Could global reviewers or libgcc maintainers please give a review on the
> generic
> part change?
> 
> One small change is I removed MD_POST_INIT_CONTEXT as I found there is
> MD_FROB_UPDATE_CONTEXT which serve the same purpose.  I still need to
> define
> 
>MD_POST_EXTRACT_ROOT_ADDR
>MD_POST_EXTRACT_FRAME_ADDR
>MD_POST_FROB_EH_HANDLER_ADDR
> 
> And do one __aarch64__ check to multiplexing DW_CFA_GNU_window_save.
> 
> Thanks.
> 
> libgcc/ChangeLog:
> 
> 2017-01-18  Jiong Wang  
> 
> * config/aarch64/aarch64-unwind.h: New file.
> (DWARF_REGNUM_AARCH64_RA_STATE): Define.
> (MD_POST_EXTRACT_ROOT_ADDR): Define.
> (MD_POST_EXTRACT_FRAME_ADDR): Define.
> (MD_POST_FROB_EH_HANDLER_ADDR): Define.
> (MD_FROB_UPDATE_CONTEXT): Define.
> (aarch64_post_extract_frame_addr): New function.
> (aarch64_post_frob_eh_handler_addr): New function.
> (aarch64_frob_update_context): New function.
> * config/aarch64/linux-unwind.h: Include aarch64-unwind.h
> * config.host (aarch64*-*-elf, aarch64*-*-rtems*,
> aarch64*-*-freebsd*):
> Initialize md_unwind_header to include aarch64-unwind.h.
> * unwind-dw2.c (struct _Unwind_Context): Define "RA_A_SIGNED_BIT".
> (execute_cfa_program): Multiplex DW_CFA_GNU_window_save for
> __aarch64__.
> (uw_update_context): Honor MD_POST_EXTRACT_FRAME_ADDR.
> (uw_init_context_1): Honor MD_POST_EXTRACT_ROOT_ADDR.
> (uw_frob_return_addr): New function.
> (_Unwind_DebugHook): Use uw_frob_return_addr.
> 
> 

Comments inline.

> 1.patch
> 
> 
> diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
> index 
> 8085a42ace15d53f4cb0c6681717012d906a6d47..cf640135275deb76b820f8209fa51eacfd64c4a2
>  100644
> --- a/libgcc/unwind-dw2.c
> +++ b/libgcc/unwind-dw2.c
> @@ -136,6 +136,8 @@ struct _Unwind_Context
>  #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1)
>/* Context which has version/args_size/by_value fields.  */
>  #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1)
> +  /* Bit reserved on AArch64, return address has been signed with A key.  */
> +#define RA_A_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1)

Why is this here?   It appears to only be used within the
AArch64-specific header file.

>_Unwind_Word flags;
>/* 0 for now, can be increased when further fields are added to
>   struct _Unwind_Context.  */
> @@ -1185,6 +1187,11 @@ execute_cfa_program (const unsigned char *insn_ptr,
> break;
>  
>   case DW_CFA_GNU_window_save:
> +#ifdef __aarch64__
> +   /* This CFA is multiplexed with Sparc.  On AArch64 it's used to toggle
> +  return address signing status.  */
> +   fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1;
> +#else
> /* ??? Hardcoded for SPARC register window configuration.  */
> if (__LIBGCC_DWARF_FRAME_REGISTERS__ >= 32)
>   for (reg = 16; reg < 32; ++reg)
> @@ -1192,6 

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Jakub Jelinek
On Thu, Jan 19, 2017 at 04:36:25PM +0300, Alexander Monakov wrote:
> On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> > > Inlining needs to do just like omp-low; if we take the current framework, 
> > > it
> > > would need to collect addressable locals into one struct, replace 
> > > references to
> > > those locals by field references in the inlined body.  Then it needs to
> > > appropriately increase allocation size/alignment in SIMT_ENTER() call 
> > > arguments.
> > > And finally it would need to initialize the pointer to structure, either
> > > immediately after SIMT_ENTER, or in a more fine-grained manner by a
> > > __builtin_alloca_with_align-like function (__b_a_w_a is not usable for 
> > > that
> > > itself, because currently for known sizes gcc can make it a local 
> > > variable).
> > 
> > One of the problems with that is that it means that you can't easily turn
> > addressable private variables into non-addressable ones once you force them
> > into such struct that can't be easily SRA split.
> > In contrast, if you can get the variable flags/attributes work, if they
> > become non-addressable (which is especially important to get rid of C++
> > abstraction penalties), you simply don't add them into the specially
> > allocated block.
> 
> I agree; I'd like to implement the approach with per-variable attributes once
> it's clear how it ought to work (right now I'm not sure if placing CLOBBERs on
> both entry and exit would be enough; if I understood correctly, Richard is
> saying they might be moved, unless the middle-end is changed to prevent it).

I think we drop CLOBBERs in certain cases, though primarily those with
MEM_REF on the lhs rather than just VAR_DECL, or even with VAR_DECL in EH
optimizations if the clobbers are the sole thing in the EH pad.
I think adding the abnormal edges would look safest to me; after all, before
it is fully lowered it is kind of like a loop, and some threads in the warp might
bypass it.  We also use abnormal edges for vfork etc.

> Do you want me to pursue that during the early days of stage 4?  I think it
> would be nice to have the issue addressed in some way for the upcoming 
> release.

Yeah, I'd appreciate if this could be resolved during stage 4, especially if
the changes will have as few changes to the non-GOMP_USE_SIMT paths as
possible.

Jakub


Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

2017-01-19 Thread Tamar Christina
Hi Joseph & Jeff,

Thanks for the feedback!

> > Generally, I don't see tests added that these new functions are correct
> > for float, double and long double, which would detect such issues if run
> > for a target with IBM long double.
>
> Specifically, I think gcc.dg/tg-tests.h should have tests added for
> __builtin_issubnormal and __builtin_iszero (i.e. have the existing test
> inputs also tested with the new functions).

Right, sorry I missed these, I had tests for them in the aarch64 specific
backend but not in the general area.

> Also, I don't think the call to perform_ibm_extended_fixups in
> is_subnormal is correct.  Subnormal for IBM long double is *not* the same
> as subnormal double high part.  Likewise it's incorrect in is_normal as
> well.

The calls to is_zero and is_subnormal were indeed incorrect. I've corrected them
by not calling the fixup code and instead making sure it falls through into the
old FP-based code, which performs normal floating-point operations on the number.
This is the same code as was in fpclassify before, so it should work.

As for is_normal, the code is almost identical to the code that used to be in
fold_builtin_interclass_mathfn in BUILT_IN_ISNORMAL, with the exception that I
don't check <= max_value but instead < infinity, so I can reuse the same
constant.

The code was mostly just copy pasted from that procedure (with now a bug fix in
that it uses the abs value and not the original one).
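The is_normal test being described — compare the absolute value against the smallest normal and against infinity — can be sketched in plain C. This uses libm's `fabs`/`INFINITY` on `double` rather than GCC's internal real.c machinery, purely for illustration:

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Sketch of the is_normal check discussed above: x is normal iff
   min_normal <= |x| < infinity.  Using |x| (the bug fix mentioned) makes
   negative normals classify correctly, and comparing against infinity
   rather than <= max_value lets the same infinity constant be shared
   with an is_infinity check.  */
static int
my_is_normal (double x)
{
  double a = fabs (x);
  return a >= DBL_MIN && a < INFINITY;
}
```

Note the NaN case falls out for free: `fabs (NAN) >= DBL_MIN` is false, so no explicit isnan test is needed.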

> Since the patch adds new built-in functions __builtin_issubnormal and
> __builtin_iszero, it also needs to update c-typeck.c:convert_arguments to
> make those functions remove excess precision.  This is mentioned in the
> PRs 77925 and 77926 for addition of those functions (which as I noted in
>  should be
> included in the ChangeLog entry for the patch).

Hmm, I had missed this detail when I looked at the tickets. I've added it now.

> > +#include "stor-layout.h"
> > +#include "target.h"
> Presumably these are for the endianness & mode checking?  It's a bit of
> a wart checking those in gimple-low, but I can live with it.  We might
> consider ways to avoid this layering violation in the future.

Yes, one way to possibly avoid it is to just pass them along as arguments to the
top level function. I can make a ticket to resolve this when stage1 opens again
if you'd like.

Are you ok with the changes Joseph?

The changelog has also been updated:

gcc/
2017-01-19  Tamar Christina  

PR middle-end/77925
PR middle-end/77926
PR middle-end/66462

* gcc/builtins.c (fold_builtin_fpclassify): Removed.
(fold_builtin_interclass_mathfn): Removed.
(expand_builtin): Added builtins to lowering list.
(fold_builtin_n): Removed fold_builtin_varargs.
(fold_builtin_varargs): Removed.
* gcc/builtins.def (BUILT_IN_ISZERO, BUILT_IN_ISSUBNORMAL): Added.
* gcc/real.h (get_min_float): Added.
(real_format): Added is_ieee_compatible field.
* gcc/real.c (get_min_float): Added.
(ieee_single_format): Set is_ieee_compatible flag.
* gcc/gimple-low.c (lower_stmt): Define BUILT_IN_FPCLASSIFY,
CASE_FLT_FN (BUILT_IN_ISINF), BUILT_IN_ISINFD32, BUILT_IN_ISINFD64,
BUILT_IN_ISINFD128, BUILT_IN_ISNAND32, BUILT_IN_ISNAND64,
BUILT_IN_ISNAND128, BUILT_IN_ISNAN, BUILT_IN_ISNORMAL, BUILT_IN_ISZERO,
BUILT_IN_ISSUBNORMAL, CASE_FLT_FN (BUILT_IN_FINITE), BUILT_IN_FINITED32
BUILT_IN_FINITED64, BUILT_IN_FINITED128, BUILT_IN_ISFINITE.
(lower_builtin_fpclassify, is_nan, is_normal, is_infinity): Added.
(is_zero, is_subnormal, is_finite, use_ieee_int_mode): Likewise.
(lower_builtin_isnan, lower_builtin_isinfinite): Likewise.
(lower_builtin_isnormal, lower_builtin_iszero): Likewise.
(lower_builtin_issubnormal, lower_builtin_isfinite): Likewise.
(emit_tree_cond, get_num_as_int, emit_tree_and_return_var): Added.
(mips_single_format): Likewise.
(motorola_single_format): Likewise.
(spu_single_format): Likewise.
(ieee_double_format): Likewise.
(mips_double_format): Likewise.
(motorola_double_format): Likewise.
(ieee_extended_motorola_format): Likewise.
(ieee_extended_intel_128_format): Likewise.
(ieee_extended_intel_96_round_53_format): Likewise.
(ibm_extended_format): Likewise.
(mips_extended_format): Likewise.
(ieee_quad_format): Likewise.
(mips_quad_format): Likewise.
(vax_f_format): Likewise.
(vax_d_format): Likewise.
(vax_g_format): Likewise.
(decimal_single_format): Likewise.
(decimal_quad_format): Likewise.
(ieee_half_format): Likewise.
(mips_single_format): Likewise.
(arm_half_format): Likewise.
(real_internal_format): Likewise.
* gcc/doc/extend.texi: Added documentation for built-ins.
* gcc/c/c-ty

C++ PATCH for c++/79130 (direct-initialization of arrays with decomposition)

2017-01-19 Thread Jason Merrill
Jakub pointed out that parenthesized decomposition of an array wasn't
properly using direct-initialization.  Rather than pass the flags down
into build_vec_init at this point in GCC 7 development, let's turn the
initializer into something that build_vec_init recognizes as
direct-initialization.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit ea395e328d0f3e3d7800ddb319f0e25b6c363ce9
Author: Jason Merrill 
Date:   Wed Jan 18 16:04:31 2017 -0500

PR c++/79130 - decomposition and direct-initialization

* init.c (build_aggr_init): Communicate direct-initialization to
build_vec_init.
(build_vec_init): Check for array copy sooner.
* parser.c (cp_parser_decomposition_declaration): Remove call to
build_x_compound_expr_from_list.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 8f68c88b..15388b1 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -1574,20 +1574,24 @@ build_aggr_init (tree exp, tree init, int flags, 
tsubst_flags_t complain)
   TREE_READONLY (exp) = 0;
   TREE_THIS_VOLATILE (exp) = 0;
 
-  if (init && init != void_type_node
-  && TREE_CODE (init) != TREE_LIST
-  && !(TREE_CODE (init) == TARGET_EXPR
-  && TARGET_EXPR_DIRECT_INIT_P (init))
-  && !DIRECT_LIST_INIT_P (init))
-flags |= LOOKUP_ONLYCONVERTING;
-
   if (TREE_CODE (type) == ARRAY_TYPE)
 {
   tree itype = init ? TREE_TYPE (init) : NULL_TREE;
   int from_array = 0;
 
   if (VAR_P (exp) && DECL_DECOMPOSITION_P (exp))
-   from_array = 1;
+   {
+ from_array = 1;
+ if (init && DECL_P (init)
+ && !(flags & LOOKUP_ONLYCONVERTING))
+   {
+ /* Wrap the initializer in a CONSTRUCTOR so that build_vec_init
+recognizes it as direct-initialization.  */
+ init = build_constructor_single (init_list_type_node,
+  NULL_TREE, init);
+ CONSTRUCTOR_IS_DIRECT_INIT (init) = true;
+   }
+   }
   else
{
  /* An array may not be initialized use the parenthesized
@@ -1621,6 +1625,13 @@ build_aggr_init (tree exp, tree init, int flags, 
tsubst_flags_t complain)
   return stmt_expr;
 }
 
+  if (init && init != void_type_node
+  && TREE_CODE (init) != TREE_LIST
+  && !(TREE_CODE (init) == TARGET_EXPR
+  && TARGET_EXPR_DIRECT_INIT_P (init))
+  && !DIRECT_LIST_INIT_P (init))
+flags |= LOOKUP_ONLYCONVERTING;
+
   if ((VAR_P (exp) || TREE_CODE (exp) == PARM_DECL)
   && !lookup_attribute ("warn_unused", TYPE_ATTRIBUTES (type)))
 /* Just know that we've seen something for this node.  */
@@ -3825,6 +3836,18 @@ build_vec_init (tree base, tree maxindex, tree init,
   && from_array != 2)
 init = TARGET_EXPR_INITIAL (init);
 
+  bool direct_init = false;
+  if (from_array && init && BRACE_ENCLOSED_INITIALIZER_P (init)
+  && CONSTRUCTOR_NELTS (init) == 1)
+{
+  tree elt = CONSTRUCTOR_ELT (init, 0)->value;
+  if (TREE_CODE (TREE_TYPE (elt)) == ARRAY_TYPE)
+   {
+ direct_init = DIRECT_LIST_INIT_P (init);
+ init = elt;
+   }
+}
+
   /* If we have a braced-init-list, make sure that the array
  is big enough for all the initializers.  */
   bool length_check = (init && TREE_CODE (init) == CONSTRUCTOR
@@ -3905,18 +3928,6 @@ build_vec_init (tree base, tree maxindex, tree init,
   base = get_temp_regvar (ptype, rval);
   iterator = get_temp_regvar (ptrdiff_type_node, maxindex);
 
-  bool direct_init = false;
-  if (from_array && init && BRACE_ENCLOSED_INITIALIZER_P (init)
-  && CONSTRUCTOR_NELTS (init) == 1)
-{
-  tree elt = CONSTRUCTOR_ELT (init, 0)->value;
-  if (TREE_CODE (TREE_TYPE (elt)) == ARRAY_TYPE)
-   {
- direct_init = DIRECT_LIST_INIT_P (init);
- init = elt;
-   }
-}
-
   /* If initializing one array from another, initialize element by
  element.  We rely upon the below calls to do the argument
  checking.  Evaluate the initializer before entering the try block.  */
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 6d3b877..29dcfea 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -13026,9 +13026,6 @@ cp_parser_decomposition_declaration (cp_parser *parser,
   *init_loc = cp_lexer_peek_token (parser->lexer)->location;
   tree initializer = cp_parser_initializer (parser, &is_direct_init,
&non_constant_p);
-  if (TREE_CODE (initializer) == TREE_LIST)
-   initializer = build_x_compound_expr_from_list (initializer, ELK_INIT,
-  tf_warning_or_error);
 
   if (decl != error_mark_node)
{
diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp6.C 
b/gcc/testsuite/g++.dg/cpp1z/decomp6.C
index ed6fce4..7a8a239 100644
--- a/gcc/testsuite/g++.dg/cpp1z/decomp6.C
+++ b/gcc/testsuite/g++.dg/cpp1z/decomp6.C
@@ -89,4 +89,40 @@ main ()
   }
   

Re: [expand] Fix for PR rtl-optimization/79121 incorrect expansion of extend plus left shift

2017-01-19 Thread Jeff Law

On 01/19/2017 03:37 AM, Richard Earnshaw (lists) wrote:

On 18/01/17 21:07, Jeff Law wrote:

On 01/18/2017 11:08 AM, Richard Earnshaw (lists) wrote:

PR 79121 is a silent wrong code regression where, when generating a
shift from an extended value moving from one to two machine registers,
the type of the right shift is for the most significant word should be
determined by the signedness of the inner type, not the signedness of
the result type.

gcc:
PR rtl-optimization/79121
* expr.c (expand_expr_real_2, case LSHIFT_EXPR): Look at the
signedness
of the inner type when shifting an extended value.

testsuite:
* gcc.c-torture/execute/pr79121.c: New test.

Bootstrapped on x86_64 and cross-tested on ARM.

I had to refamiliarize myself with this code and nearly got the analysis
wrong (again).

Due to the copying of the low word into the high word we have to select
the type of shift based on the type of the object that was the source of
the NOP conversion.  The code currently makes that determination based
on the signedness of the shift, which is wrong.
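The signedness distinction can be shown directly: when a 32-bit value is widened and shifted left into a two-register pair, the high word is produced by right-shifting the source, and that right shift must be arithmetic or logical according to the *inner* type. A minimal C model of the expansion (my own illustration, not the expr.c code; note that right-shifting a negative `int32_t` is implementation-defined in ISO C, though arithmetic on the compilers at issue):

```c
#include <assert.h>
#include <stdint.h>

/* Model of expanding (int64_t) x << n on a 32-bit target, for 0 < n < 32:
     low word  = x << n
     high word = x >> (32 - n)
   where the kind of the right shift must follow the signedness of the
   inner 32-bit type, not the signedness of the 64-bit result.  */
static int64_t
shift_signed_inner (int32_t x, int n)
{
  uint32_t lo = (uint32_t) x << n;
  uint32_t hi = (uint32_t) (x >> (32 - n));   /* arithmetic: inner type signed */
  return (int64_t) (((uint64_t) hi << 32) | lo);
}

static int64_t
shift_wrong_logical (int32_t x, int n)
{
  uint32_t lo = (uint32_t) x << n;
  uint32_t hi = (uint32_t) x >> (32 - n);     /* logical: loses the sign bits */
  return (int64_t) (((uint64_t) hi << 32) | lo);
}
```

For a negative input the two variants diverge, which is exactly the silent wrong code the PR describes.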


OK for the trunk.

jeff




Thanks, Jeff.  I made some minor tweaks to the comments (adding a bit
more about signed vs unsigned) and committed the following.

What about gcc-6?
I'd think it should be fine for gcc-6 as well.  I don't think that
code has changed since it first went in.  So the analysis and patch 
should both still apply.


jeff


Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

2017-01-19 Thread Joseph Myers
On Thu, 19 Jan 2017, Tamar Christina wrote:

> > Also, I don't think the call to perform_ibm_extended_fixups in
> > is_subnormal is correct.  Subnormal for IBM long double is *not* the same
> > as subnormal double high part.  Likewise it's incorrect in is_normal as
> > well.
> 
> The calls to is_zero and is_subnormal were incorrect indeed. I've 
> corrected them by not calling the fixup code and to instead make sure it 
> falls through into the old fp based code which did normal floating point 
> operations on the number. This is the same code as was before in 
> fpclassify so it should work.

For is_zero it's fine to test based on the high part for IBM long double; 
an IBM long double is (zero, infinite, NaN, finite) if and only if the 
high part is.  The problem is the different threshold between normal and 
subnormal.
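Joseph's point can be illustrated with a toy model of the IBM double-double representation, where a value is the sum of two doubles. Zero/infinite/NaN classification may follow the high part alone, but the normal/subnormal boundary may not, because the format's exponent range and precision differ from plain double. The struct and helpers below are a simplification for illustration, not libgcc code:

```c
#include <assert.h>
#include <math.h>

/* Toy model of IBM long double: value = hi + lo, with |lo| <= ulp(hi)/2.
   Per the message above, the value is zero/infinite/NaN if and only if
   the high part is, so these three tests can look only at hi.  The same
   is NOT true of a subnormal test: the normal/subnormal threshold of the
   combined format differs from double's, so checking hi alone is wrong.  */
struct ibm_ld { double hi, lo; };

static int ibm_is_zero (struct ibm_ld x) { return x.hi == 0.0; }
static int ibm_is_inf  (struct ibm_ld x) { return isinf (x.hi); }
static int ibm_is_nan  (struct ibm_ld x) { return isnan (x.hi); }
```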

> As for is_normal, the code is almost identical as the code that used to 
> be in fold_builtin_interclass_mathfn in BUILT_IN_ISNORMAL, with the 
> exception that I don't check <= max_value but instead < inifity, so I 
> can reuse the same constant.

The old code set orig_arg before converting IBM long double to double.  
Your code sets it after the conversion.  The old code set min_exp based on 
a string set from REAL_MODE_FORMAT (orig_mode)->emin - 1; your code uses 
the adjusted mode.  Both of those are incorrect for IBM long double.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Ping~]Re: [5/5][libgcc] Runtime support for AArch64 return address signing (needs new target macros)

2017-01-19 Thread Jiong Wang

Thanks for the review.

On 19/01/17 14:18, Richard Earnshaw (lists) wrote:





diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
index 
8085a42ace15d53f4cb0c6681717012d906a6d47..cf640135275deb76b820f8209fa51eacfd64c4a2
 100644
--- a/libgcc/unwind-dw2.c
+++ b/libgcc/unwind-dw2.c
@@ -136,6 +136,8 @@ struct _Unwind_Context
 #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1)
   /* Context which has version/args_size/by_value fields.  */
 #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1)
+  /* Bit reserved on AArch64, return address has been signed with A key.  */
+#define RA_A_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1)


Why is this here?   It appears to only be used within the
AArch64-specific header file.


I was putting it here so that when we allocate the next general-purpose bit, we
know clearly that bit 3 is already allocated to AArch64, and the new general bit
needs to go to the next one.  This avoids bit collisions.




...

+/* Frob exception handler's address kept in TARGET before installing into
+   CURRENT context.  */
+
+static void *
+uw_frob_return_addr (struct _Unwind_Context *current,
+ struct _Unwind_Context *target)
+{
+  void *ret_addr = __builtin_frob_return_addr (target->ra);
+#ifdef MD_POST_FROB_EH_HANDLER_ADDR
+  ret_addr = MD_POST_FROB_EH_HANDLER_ADDR (current, target, ret_addr);
+#endif
+  return ret_addr;
+}
+


I think this function should be marked inline.  The optimizers would
probably inline it anyway, but it seems wrong for us to rely on that.


Thanks, fixed.

Does the updated patch look OK to you now?

libgcc/

2017-01-19  Jiong Wang  

* config/aarch64/aarch64-unwind.h: New file.
(DWARF_REGNUM_AARCH64_RA_STATE): Define.
(MD_POST_EXTRACT_ROOT_ADDR): Define.
(MD_POST_EXTRACT_FRAME_ADDR): Define.
(MD_POST_FROB_EH_HANDLER_ADDR): Define.
(MD_FROB_UPDATE_CONTEXT): Define.
(aarch64_post_extract_frame_addr): New function.
(aarch64_post_frob_eh_handler_addr): New function.
(aarch64_frob_update_context): New function.
* config/aarch64/linux-unwind.h: Include aarch64-unwind.h
* config.host (aarch64*-*-elf, aarch64*-*-rtems*, aarch64*-*-freebsd*):
Initialize md_unwind_header to include aarch64-unwind.h.
* unwind-dw2.c (struct _Unwind_Context): Define "RA_A_SIGNED_BIT".
(execute_cfa_program): Multiplex DW_CFA_GNU_window_save for __aarch64__.
(uw_update_context): Honor MD_POST_EXTRACT_FRAME_ADDR.
(uw_init_context_1): Honor MD_POST_EXTRACT_ROOT_ADDR.
(uw_frob_return_addr): New function.
(_Unwind_DebugHook): Use uw_frob_return_addr.

diff --git a/libgcc/config.host b/libgcc/config.host
index 6f2e458e74e776a6b7a310919558bcca76389232..540bfa9635802adabb36a2d1b7cf3416462c59f3 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -331,11 +331,13 @@ aarch64*-*-elf | aarch64*-*-rtems*)
 	extra_parts="$extra_parts crtfastmath.o"
 	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
+	md_unwind_header=aarch64/aarch64-unwind.h
 	;;
 aarch64*-*-freebsd*)
 	extra_parts="$extra_parts crtfastmath.o"
 	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
+	md_unwind_header=aarch64/aarch64-unwind.h
 	;;
 aarch64*-*-linux*)
 	extra_parts="$extra_parts crtfastmath.o"
diff --git a/libgcc/config/aarch64/aarch64-unwind.h b/libgcc/config/aarch64/aarch64-unwind.h
new file mode 100644
index ..a43d965b358f3e830b85fc42c7bceacf7d41a671
--- /dev/null
+++ b/libgcc/config/aarch64/aarch64-unwind.h
@@ -0,0 +1,87 @@
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+#ifndef AARCH64_UNWIND_H
+#define AARCH64_UNWIND_H
+
+#define DWARF_REGNUM_AARCH64_RA_STATE 34
+
+#define MD_POST_EXTRACT_ROOT_ADDR(addr)  __builtin_aarch64_xpaclri (addr)
+#define MD_POST_EXTRACT_FRAME_ADDR(context, fs, addr) \
+  aarch64_post_extract_frame_addr (context, fs,

Re: [PATCH, Fortran, pr70696, v2] [Coarray] ICE on EVENT POST of host-associated EVENT_TYPE coarray

2017-01-19 Thread Steve Kargl
On Thu, Jan 19, 2017 at 01:07:50PM +0100, Andre Vehreschild wrote:
> Hi all,
> 
> unfortunately this patch triggered a regression in the OpenCoarrays
> testsuite, which also occurs outside of OpenCoarrays, when a caf-function
> is used in a block in the main program. This patch fixes the error and
> adds a testcase.
> 
> Bootstrapped and regtested ok on x86_64-linux/f25. Ok for trunk?
> 

Yes.

-- 
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow


Re: [PATCH] Fix IPA CP where it forgot to add a reference in cgraph

2017-01-19 Thread Martin Liška
On 01/18/2017 11:18 PM, Jan Hubicka wrote:
>>
>> 2016-12-19  Martin Liska  
>>
>>  * cgraphclones.c (cgraph_node::create_virtual_clone):
>>  Create either IPA_REF_LOAD or IPA_REF_ADDR depending on
>>  whether new_tree is a VAR_DECL or an ADDR_EXPR.
>>  * ipa-cp.c (create_specialized_node): Add reference just for
>>  ADDR_EXPRs.
>>  * symtab.c (symtab_node::maybe_create_reference): Remove guard
>>  as it's guarded in callers.
>> ---
>>  gcc/cgraphclones.c | 6 +-
>>  gcc/ipa-cp.c   | 3 ++-
>>  gcc/symtab.c   | 2 --
>>  3 files changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
>> index 349892dab67..6c8fe156f23 100644
>> --- a/gcc/cgraphclones.c
>> +++ b/gcc/cgraphclones.c
>> @@ -624,7 +624,11 @@ cgraph_node::create_virtual_clone (vec 
>> redirect_callers,
>>|| in_lto_p)
>>  new_node->unique_name = true;
>>FOR_EACH_VEC_SAFE_ELT (tree_map, i, map)
>> -new_node->maybe_create_reference (map->new_tree, IPA_REF_ADDR, NULL);
>> +{
>> +  ipa_ref_use use_type
>> += TREE_CODE (map->new_tree) == ADDR_EXPR ? IPA_REF_ADDR : IPA_REF_LOAD;
>> +  new_node->maybe_create_reference (map->new_tree, use_type, NULL);
>> +}
>>  
>>if (ipa_transforms_to_apply.exists ())
>>  new_node->ipa_transforms_to_apply
>> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
>> index d3b50524457..fd312b56fde 100644
>> --- a/gcc/ipa-cp.c
>> +++ b/gcc/ipa-cp.c
>> @@ -3787,7 +3787,8 @@ create_specialized_node (struct cgraph_node *node,
>>   args_to_skip, "constprop");
>>ipa_set_node_agg_value_chain (new_node, aggvals);
>>for (av = aggvals; av; av = av->next)
>> -new_node->maybe_create_reference (av->value, IPA_REF_ADDR, NULL);
>> +if (TREE_CODE (av->value) == ADDR_EXPR)
>> +  new_node->maybe_create_reference (av->value, IPA_REF_ADDR, NULL);
>>  
>>if (dump_file && (dump_flags & TDF_DETAILS))
>>  {
>> diff --git a/gcc/symtab.c b/gcc/symtab.c
>> index 73168a8db09..562a4a2f6a6 100644
>> --- a/gcc/symtab.c
>> +++ b/gcc/symtab.c
>> @@ -598,8 +598,6 @@ symtab_node::maybe_create_reference (tree val, enum 
>> ipa_ref_use use_type,
>>   gimple *stmt)
>>  {
>>STRIP_NOPS (val);
>> -  if (TREE_CODE (val) != ADDR_EXPR)
>> -return NULL;
> 
> Perhaps maybe_create_reference should drop the use_type argument (it is used
> with IPA_REF_ADDR only anyway) and should do the parsing itself?
> I.e. if there is reference do IPA_REF_LOAD and if there is ADDR_EXPR do
> IPA_REF_ADDR.  Why one can not have handled component refs in there?
> 
> Honza

OK, this is the updated version that I've been testing. I've added a reference
to the PR we just identified, which is also caused by IPA CP and will be fixed
by this patch.

Thanks,
Martin

>>val = get_base_var (val);
>>if (val && VAR_OR_FUNCTION_DECL_P (val))
>>  {
>> -- 
>> 2.11.0
>>
> 

From f42c5ea2ce09ecbf02e472ce31add53189115d66 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 19 Dec 2016 11:03:34 +0100
Subject: [PATCH] Fix IPA CP where it forgot to add a reference in cgraph (PR
 ipa/71190).

gcc/ChangeLog:

2017-01-19  Martin Liska  

	PR ipa/71190
	* cgraph.h (maybe_create_reference): Remove argument and
	update comment.
	* cgraphclones.c (cgraph_node::create_virtual_clone): Remove one
	argument.
	* ipa-cp.c (create_specialized_node): Likewise.
	* symtab.c (symtab_node::maybe_create_reference): Handle
	VAR_DECLs and ADDR_EXPRs and select ipa_ref_use type.
---
 gcc/cgraph.h   |  6 ++
 gcc/cgraphclones.c |  2 +-
 gcc/ipa-cp.c   |  2 +-
 gcc/symtab.c   | 24 +++-
 4 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index db2915c5751..5410a71176a 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -131,11 +131,9 @@ public:
 			 enum ipa_ref_use use_type, gimple *stmt);
 
   /* If VAL is a reference to a function or a variable, add a reference from
- this symtab_node to the corresponding symbol table node.  USE_TYPE specify
- type of the use and STMT the statement (if it exists).  Return the new
+ this symtab_node to the corresponding symbol table node.  Return the new
  reference or NULL if none was created.  */
-  ipa_ref *maybe_create_reference (tree val, enum ipa_ref_use use_type,
-   gimple *stmt);
+  ipa_ref *maybe_create_reference (tree val, gimple *stmt);
 
   /* Clone all references from symtab NODE to this symtab_node.  */
   void clone_references (symtab_node *node);
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index a17663519a9..c2337e84553 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -624,7 +624,7 @@ cgraph_node::create_virtual_clone (vec redirect_callers,
   || in_lto_p)
 new_node->unique_name = true;
   FOR_EACH_VEC_SAFE_ELT (tree_map, i, map)
-new_node->maybe_create_reference (map->new_tree, IPA_REF_ADDR, NULL);
+new_node->maybe_create_reference (map

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-19 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> On Wed, Jan 18, 2017 at 10:52:32PM +0300, Alexander Monakov wrote:
> > Sorry for not noticing this earlier, but ...
> > 
> > > +#ifdef __LP64__
> > > +typedef unsigned long long CUdeviceptr;
> > > +#else
> > > +typedef unsigned CUdeviceptr;
> > > +#endif
> > 
> > I think this #ifdef doesn't do the right thing on MinGW.
> > Would it be fine to simplify it?  In my code I have
> > 
> >   typedef uintptr_t CUdeviceptr;
> 
> I think it depends on if we want to use CUdeviceptr typed variables
> in printf like format strings, or C++ overloading (then the exact
> type is significant and we should go for probably
> 
> -#ifdef __LP64__
> +#if defined(__LP64__) || defined(_WIN64)
> 
> (that is the right define for 64-bit MinGW, right?)

Yes, _WIN64; libsanitizer has a similar test.  Alternatively, I guess,

  #if __SIZEOF_POINTER__ == 8

> Otherwise, I think using uintptr_t is a problem, because we'd need to
> #include  (the header only includes ).

Note that plugin-nvptx.c already includes .  But, anyway, I agree that
there's value in defining the exact type via the #if.

Alexander


Re: [PATCH] Fix IPA CP where it forgot to add a reference in cgraph

2017-01-19 Thread Jan Hubicka
> >> 2016-12-19  Martin Liska  
> >>
> >>* cgraphclones.c (cgraph_node::create_virtual_clone):
> >>Create either IPA_REF_LOAD or IPA_REF_ADDR depending on
> >>whether new_tree is a VAR_DECL or an ADDR_EXPR.
> >>* ipa-cp.c (create_specialized_node): Add reference just for
> >>ADDR_EXPRs.
> >>* symtab.c (symtab_node::maybe_create_reference): Remove guard
> >>as it's guarded in callers.

Patch is OK.
>  ipa_ref *
> -symtab_node::maybe_create_reference (tree val, enum ipa_ref_use use_type,
> -  gimple *stmt)
> +symtab_node::maybe_create_reference (tree val, gimple *stmt)
>  {
>STRIP_NOPS (val);
> -  if (TREE_CODE (val) != ADDR_EXPR)
> -return NULL;
> +  ipa_ref_use use_type;
> +
> +  switch (TREE_CODE (val))
> +{
> +case VAR_DECL:
> +  use_type = IPA_REF_LOAD;
> +  break;
> +case ADDR_EXPR:
> +  use_type = IPA_REF_ADDR;
> +  break;
> +default:
> +  return NULL;
> +}

I would add an assert into the default case that we don't get a
handled_component_ref here, so we are sure we don't miss any declarations
(because the bug leads to quite esoteric issues, it is better to be safe
than sorry).

Honza


Re: Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Cesar Philippidis
On 01/18/2017 06:22 AM, Richard Biener wrote:
> On Wed, Jan 18, 2017 at 3:11 PM, Alexander Monakov  wrote:
>> On Wed, 18 Jan 2017, Richard Biener wrote:
 After OpenMP lowering, inlining might break this by inlining functions with
 address-taken locals into SIMD regions.  For now, such inlining is 
 disallowed
 (this penalizes only SIMT code), but eventually that can be handled by
 collecting those locals into an allocated struct in a similar manner.
>>>
>>> Can you do the allocation fully after inlining instead?
>>
>> Hm.  I'm not really sure what you mean, because I may not emit GIMPLE that 
>> takes
>> addresses of an incomplete struct's fields, and changing layout of an 
>> existing
>> completely layed out struct is not trivial either.  But I have an idea, see 
>> below.
>>
>> Let's consider what the last patch implements; starting from
>>
>>   #pragma omp simd private(tmp)
>>   for (int i = n1; i < n2; i++)
>> foo (&tmp);
>>
>> it emits GIMPLE that looks like this:
>>
>>   struct {
>> int tmp;
>>   } *omp_simt = IFN_GOMP_SIMT_ENTER (sizeof *omp_simt);
>>
>>   /* This temporary is needed because we populate the struct and (re)gimplify
>>  references to privatized variables in one pass; replacing 'tmp' directly
>>  with '&omp_simt->tmp' wouldn't work, because struct layout is not known
>>  until all fields are added, and gimplification wouldn't be able to emit
>>  the corresponding MEM_REF.  */
>>   int *tmp_ptr = &omp_simt->tmp;
>>
>>   for (int i = n1; i < n2; i++)
>> foo (tmp_ptr);
>>
>>   *.omp_simt = {CLOBBER};
>>   IFN_GOMP_SIMT_EXIT (.omp_simt);
>>
>>
>> So I guess a way to keep allocation layout implicit until after inlining is
>> this: instead of exposing the helper struct in the IR immediately, somehow 
>> keep
>> it on the side, associated only with the SIMT region, and not finalized.  
>> This
>> would allow to populate it as needed during inlining, but the downside is 
>> that
>> references to privatized vars would get weirder: they would need to be via 
>> IFNs
>> that track association with the loop and the privatized variable.  Like this:
>>
>>   void *omp_simt = IFN_GOMP_SIMT_ENTER_BY_UID (simduid);
>>
>>   int *tmp_ptr = IFN_GOMP_SIMT_VAR_REF (omp_simt, simduid, uid_for_tmp);
>>
>>   for (...)
>> foo (tmp_ptr);
>>
>>   *tmp_ptr = {CLOBBER}; /* ??? for each privatized variable? */
>>   IFN_GOMP_SIMT_EXIT (.omp_simt);
>>
>> (note how in this scheme we'd need to emit separate CLOBBERs for each field)
>>
>> But absence of explicit struct would hurt alias analysis I'm afraid: it 
>> wouldn't
>> be able to deduce that references to different privatized variable do not 
>> alias
>> until after calls to SIMT_VAR_REF are replaced.  Or is that not an issue?
> 
> It probably is.
> 
> But I guess I was asking whether you could initially emit
> 
>  void *omp_simt = IFN_GOMP_SIMT_ENTER (0);
> 
>   for (int i = n1; i < n2; i++)
>  foo (&tmp);
> 
>   IFN_GOMP_SIMT_EXIT (omp_simt);
> 
> and only after inlining do liveness / use analysis of everything between
> SIMT_ENTER and SIMT_EXIT doing the rewriting only at that point.

We're doing something similar to this in OpenACC. However, all of the
variable broadcasting happens in the backend. One explicit limitation
(not by design, but rather a simplification) of our implementation is
because private variables are only broadcasted, private arrays don't
quite behave as expected. E.g.

  #pragma acc parallel loop
  {
int array[N];

#pragma acc loop
for (...)
  array[] = ...

// use array here
  }

Here, only the values of thread 0 get updated after the inner loop
terminates. For the most part, local variables are generally expected to
be private inside parallel loops, because any write to those variables
creates dependencies.

I have seen a couple of strategies on how to resolve this private array
problem. But as of right now, the behavior of private arrays in OpenACC
is undefined, so we're going to leave it as-is.

How many levels of parallelism does OpenMP have? OpenACC has three,
gang, worker and vector. On nvptx targets, gangs are mapped to CTA
blocks, workers to warps, and vectors to individual threads.

Alex, are you only planning on supporting two levels of parallelism? If
so, maybe it would be more straightforward to move those private/local
variables into nvptx .shared memory.

Another thing that you are going to need to consider is barriers and
synchronization. Part of the reason for using those function markers is
to explicitly form SESE regions, so that we can insert barriers as
necessary. Synchronization on nvptx is one of those things where 99% of
the time the code runs fine without any explicit barriers, but the other
1% it will cause mysterious failures.

Cesar


Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

2017-01-19 Thread Tamar Christina

> > The calls to is_zero and is_subnormal were indeed incorrect. I've
> > corrected them by not calling the fixup code and instead making sure it
> > falls through into the old FP-based code, which did normal floating-point
> > operations on the number. This is the same code as was in fpclassify
> > before, so it should work.
>
> For is_zero it's fine to test based on the high part for IBM long double;
> an IBM long double is (zero, infinite, NaN, finite) if and only if the
> high part is.  The problem is the different threshold between normal and
> subnormal.

Ah ok, I've added it back in for is_zero then.

> > As for is_normal, the code is almost identical to the code that used to
> > be in fold_builtin_interclass_mathfn in BUILT_IN_ISNORMAL, with the
> > exception that I don't check <= max_value but instead < infinity, so I
> > can reuse the same constant.
>
> The old code set orig_arg before converting IBM long double to double.
> Your code sets it after the conversion.  The old code set min_exp based on
> a string set from REAL_MODE_FORMAT (orig_mode)->emin - 1; your code uses
> the adjusted mode.  Both of those are incorrect for IBM long double.

Hmm, this is correct; I've made the change to be like it was before. But
there's something I don't quite get about the old code: if it builds rmin in
orig_mode, which is larger than mode, but then creates the real using
build_real (type, rmin) with the adjusted type, shouldn't it be getting
truncated?

I've updated and attached the patch.

Tamar


diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3ac2d44148440b124559ba7cd3de483b7a74b72d..d8ff9c70ae6b9e72e09b8cbd9a0bd41b6830b83e 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -160,7 +160,6 @@ static tree fold_builtin_0 (location_t, tree);
 static tree fold_builtin_1 (location_t, tree, tree);
 static tree fold_builtin_2 (location_t, tree, tree, tree);
 static tree fold_builtin_3 (location_t, tree, tree, tree, tree);
-static tree fold_builtin_varargs (location_t, tree, tree*, int);
 
 static tree fold_builtin_strpbrk (location_t, tree, tree, tree);
 static tree fold_builtin_strstr (location_t, tree, tree, tree);
@@ -2202,19 +2201,8 @@ interclass_mathfn_icode (tree arg, tree fndecl)
   switch (DECL_FUNCTION_CODE (fndecl))
 {
 CASE_FLT_FN (BUILT_IN_ILOGB):
-  errno_set = true; builtin_optab = ilogb_optab; break;
-CASE_FLT_FN (BUILT_IN_ISINF):
-  builtin_optab = isinf_optab; break;
-case BUILT_IN_ISNORMAL:
-case BUILT_IN_ISFINITE:
-CASE_FLT_FN (BUILT_IN_FINITE):
-case BUILT_IN_FINITED32:
-case BUILT_IN_FINITED64:
-case BUILT_IN_FINITED128:
-case BUILT_IN_ISINFD32:
-case BUILT_IN_ISINFD64:
-case BUILT_IN_ISINFD128:
-  /* These builtins have no optabs (yet).  */
+  errno_set = true;
+  builtin_optab = ilogb_optab;
   break;
 default:
   gcc_unreachable ();
@@ -2233,8 +2221,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
 }
 
 /* Expand a call to one of the builtin math functions that operate on
-   floating point argument and output an integer result (ilogb, isinf,
-   isnan, etc).
+   floating point argument and output an integer result (ilogb, etc).
Return 0 if a normal call should be emitted rather than expanding the
function in-line.  

[PATCH] Fix further -fdebug-types-section bugs (PR debug/79129)

2017-01-19 Thread Jakub Jelinek
Hi!

This is on top of the PR78835 patch I've posted yesterday.
We ICE on the following testcase reduced from libstdc++ ios_failure.cc.
If there are nested comdat types, as B (outer type) and A (inner type) in the
testcase, we first move A to a type unit and add there copies of its
children, and preserve a skeleton copy of A in the main CU with the
originals of their children (i.e. something other DIEs may refer to).
Then we try to move B into another type unit, but mishandle the children,
the DIE for foo subprogram ends up in the B type unit rather than in the
main CU where it belongs.  Of course nothing can refer to that there from
outside of that type unit.

Either hunk in dwarf2out.c is actually enough to fix the ICE.  The second
hunk fixes the bug that after the moving of the children etc. we end up with
node.old_die == node.new_die, which is obviously wrong.  We always want
old_die to be the DIE that will be in the type unit while new_die one that
will remain in the main CU.  The first hunk is an optimization, my
understanding is that the creation of the skeleton DIEs is so that uses
outside of the type unit (i.e. from the main CU) can actually refer to those
DIEs (when we lookup_decl_die etc.; from main CU to type units we can only
refer using DW_FORM_ref_sig8 to the type unit's single chosen type die).
As break_out_comdat_types and its helpers don't actually adjust random
references, for nested types I actually think it is enough to have in the
type unit of the outer type just a skeleton unit of the inner type with
DW_AT_signature of the type residing in yet another type unit, I think there
is no way to refer to its children from that type unit.  So there is no need
to duplicate the children and we can just move them to the main CU.

Bootstrapped/regtested on i686-linux with -fdebug-types-section hacked in
common.opt to be the default, ok for trunk?

Note Ada bootstrap on x86_64-linux with -fdebug-types-section is still
broken, but that is preexisting, not introduced by this patch:
.../gcc/gnat1 -gnatwa -quiet -nostdinc -dumpbase g-pehage.adb -auxbase-strip 
g-pehage.o -O2 -Wextra -Wall -g -fpic -gnatpg -mtune=generic -march=x86-64 
-gnatO g-pehage.o g-pehage.adb -o /tmp/ccrAYuNB.s -fdebug-types-section
raised STORAGE_ERROR : stack overflow or erroneous memory access

2017-01-19  Jakub Jelinek  

PR debug/79129
* dwarf2out.c (generate_skeleton_bottom_up): For children with
comdat_type_p set, just clone them, but keep the children in the
original DIE.

* g++.dg/debug/dwarf2/pr79129.C: New test.

--- gcc/dwarf2out.c.jj  2017-01-18 13:40:09.0 +0100
+++ gcc/dwarf2out.c 2017-01-19 10:46:36.297776725 +0100
@@ -7918,6 +7918,19 @@ generate_skeleton_bottom_up (skeleton_ch
add_child_die (parent->new_die, c);
c = prev;
  }
+   else if (c->comdat_type_p)
+ {
+   /* This is the skeleton of earlier break_out_comdat_types
+  type.  Clone the existing DIE, but keep the children
+  under the original (which is in the main CU).  */
+   dw_die_ref clone = clone_die (c);
+
+   replace_child (c, clone, prev);
+   generate_skeleton_ancestor_tree (parent);
+   add_child_die (parent->new_die, c);
+   c = clone;
+   continue;
+ }
else
  {
/* Clone the existing DIE, move the original to the skeleton
@@ -7936,6 +7949,7 @@ generate_skeleton_bottom_up (skeleton_ch
replace_child (c, clone, prev);
generate_skeleton_ancestor_tree (parent);
add_child_die (parent->new_die, c);
+   node.old_die = clone;
node.new_die = c;
c = clone;
  }
--- gcc/testsuite/g++.dg/debug/dwarf2/pr79129.C.jj  2017-01-19 
11:15:01.876727938 +0100
+++ gcc/testsuite/g++.dg/debug/dwarf2/pr79129.C 2017-01-19 11:14:25.0 
+0100
@@ -0,0 +1,12 @@
+/* PR debug/79129 */
+/* { dg-do compile } */
+/* { dg-options "-gdwarf-4 -O2 -fdebug-types-section" } */
+
+struct B
+{
+  struct A { void foo (int &); };
+  A *bar ();
+  ~B () { int a = 1; bar ()->foo (a); }
+};
+struct C { ~C (); B c; };
+C::~C () {}

Jakub


Re: [WPA PATCH] Comdat group splitting

2017-01-19 Thread Jan Hubicka
> honza,
> this is the fix for the partitioned WPA bug I was tracking down.
> 
> We have base and complete dtors sharing a comdat group (one's an alias for
> the other).  The linker tells us the complete dtor is PREVAILING_DEF, as
> it's referenced from some other library.  The base dtor is UNKNOWN.
> 
> We therefore internalize the base dtor, making it PREVAILING_DEF_IRONLY and
> split the comdat group.  But the comdat group splitting also internalizes
> the complete dtor /and/ it does so inconsistently.
> 
> The bug manifested at runtime w/o link error as the complete dtor resolved
> to zero in the final link from wpa partitions that didn't contain its
> definition. (For extra fun, that was via a call to __cxa_at_exit registering
> a null function pointer, and getting the subsequent seg fault at program
> exit.)  When we created WPA partitions node->externally_visible was still
> set, so we thought the now-internalized complete dtor was still externally
> visible -- but varasm looks at the DECL itself and emits an internal one.
> Plus the references to it were weak (& now hidden), so resolved to zero,
> rather than link error.  And the external library either had its own
> definition which then prevailed for it.  All rather 'ew'.
> 
> Anyway, this patch does 3 things
> 1) Moves the next->unique_name adjustment to before make_decl_local for
> members of the comdat group -- that matches the behaviour of the decl of
> interest itself.

That part is OK.
> 
> 2) For LDPR_PREVAILING_DEF members we don't make_decl_local, but instead
> clear DECL_COMDAT and DECL_WEAK.  Thus forcing this decl to be the
> prevailing decl in the final link
> 
> 3) For decls we localize, we also clear node->externally_visible and
> node->force_by_abi.  That matches the behavior for the decl of interest too
> and will clue the wpa partitioning logic into knowing it needs to
> hidden-externalize the decl.

So at the moment it works this way:
 1) we walk the first symbol of the comdat; it is LDPR_PREVAILING_DEF and thus
we set the externally visible flag
 2) we walk the second symbol of the comdat; it is LDPR_PREVAILING_DEF_IRONLY
and thus we decide to privatize the whole comdat group, and during this
process we force the first symbol local and clear the externally_visible
flag?

I think at the time we decide on external visibility of a symbol in a comdat
group, we need to check that the comdat group as a whole is not exported (i.e.
no LDPR_PREVAILING_DEF_EXP or incremental linking).  Then we know we can
dissolve the comdat group (without actually affecting visibility) and then we
can handle each symbol independently.

Honza
> 
> nathan
> 
> -- 
> Nathan Sidwell

> 2017-01-18  Nathan Sidwell  
> 
>   * ipa-visibility.c (localize_node): Set comdat's unique name
>   before adjusting resolution. Make PREVAILING_DEF members strongly
>   public. Set visibility to false for localized decls.
> 
> Index: ipa-visibility.c
> ===
> --- ipa-visibility.c  (revision 244546)
> +++ ipa-visibility.c  (working copy)
> @@ -542,16 +542,32 @@ localize_node (bool whole_program, symta
>for (symtab_node *next = node->same_comdat_group;
>  next != node; next = next->same_comdat_group)
>   {
> -   next->set_comdat_group (NULL);
> -   if (!next->alias)
> - next->set_section (NULL);
> -   if (!next->transparent_alias)
> - next->make_decl_local ();
> next->unique_name
>   |= ((next->resolution == LDPR_PREVAILING_DEF_IRONLY
>|| next->resolution == LDPR_PREVAILING_DEF_IRONLY_EXP)
>   && TREE_PUBLIC (next->decl)
>   && !flag_incremental_link);
> +
> +   next->set_comdat_group (NULL);
> +   if (!next->alias)
> + next->set_section (NULL);
> +   if (next->transparent_alias)
> + /* Do nothing.  */;
> +   else if (next->resolution == LDPR_PREVAILING_DEF)
> + {
> +   /* Make this a strong defn, so the external
> +  users don't mistakenly choose some other
> +  instance.  */
> +   DECL_COMDAT (next->decl) = false;
> +   DECL_WEAK (next->decl) = false;
> + }
> +   else
> + {
> +   next->externally_visible = false;
> +   next->forced_by_abi = false;
> +   next->resolution = LDPR_PREVAILING_DEF_IRONLY;
> +   next->make_decl_local ();
> + }
>   }
>  
>/* Now everything's localized, the grouping has no meaning, and



Re: [PATCH][GCC][Aarch64] Add vectorize patten for copysign.

2017-01-19 Thread Tamar Christina
Hi All,

It seems the entry in config/aarch64/aarch64-builtins.c isn't needed, so I've
simplified the patch and the changelog.
and the changelog.

Ok for trunk?

Tamar


gcc/
2017-01-19  Tamar Christina  

* config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
Changed int to HOST_WIDE_INT.
* config/aarch64/aarch64-protos.h
(aarch64_simd_gen_const_vector_dup): Likewise.
* config/aarch64/aarch64-simd-builtins.def: Added copysign BINOP.
* config/aarch64/aarch64-simd.md: Added copysign<mode>3.

gcc/testsuite/
2017-01-19  Tamar Christina  

* gcc.target/arm/vect-copysignf.c: Move to...
* gcc.dg/vect/vect-copysignf.c: ... Here.

From: gcc-patches-ow...@gcc.gnu.org  on behalf 
of Tamar Christina 
Sent: Thursday, January 19, 2017 9:38:09 AM
To: GCC Patches; James Greenhalgh; Marcus Shawcroft; Richard Earnshaw
Cc: nd
Subject: Re: [PATCH][GCC][Aarch64] Add vectorize patten for copysign.

Hi All,

This is a slight modification of the earlier patch (Using a different constant 
in the mask creation.)

< +  HOST_WIDE_INT_M1 << bits));
---
> +  HOST_WIDE_INT_M1U << bits));

Kind Regards,
Tamar


From: gcc-patches-ow...@gcc.gnu.org  on behalf 
of Tamar Christina 
Sent: Tuesday, January 17, 2017 2:50:19 PM
To: GCC Patches; James Greenhalgh; Marcus Shawcroft; Richard Earnshaw
Cc: nd
Subject: [PATCH][GCC][Aarch64] Add vectorize patten for copysign.

Hi All,

This patch vectorizes the copysign builtin for AArch64
similar to how it is done for Arm.

AArch64 now generates:

...
.L4:
ldr q1, [x6, x3]
add w4, w4, 1
ldr q0, [x5, x3]
cmp w4, w7
bif v1.16b, v2.16b, v3.16b
fmul    v0.2d, v0.2d, v1.2d
str q0, [x5, x3]

for the input:

 x * copysign(1.0, y)

On 481.wrf in Spec2006 on AArch64 this gives us a speedup of 9.1%.
Regtested on  aarch64-none-linux-gnu and no regressions.

Ok for trunk?

gcc/
2017-01-17  Tamar Christina  

* config/aarch64/aarch64-builtins.c
(aarch64_builtin_vectorized_function): Added CASE_CFN_COPYSIGN.
* config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
Changed int to HOST_WIDE_INT.
* config/aarch64/aarch64-protos.h
(aarch64_simd_gen_const_vector_dup): Likewise.
* config/aarch64/aarch64-simd-builtins.def: Added copysign BINOP.
* config/aarch64/aarch64-simd.md: Added copysign<mode>3.

gcc/testsuite/
2017-01-17  Tamar Christina  

* gcc.target/arm/vect-copysignf.c: Move to...
* gcc.dg/vect/vect-copysignf.c: ... Here.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 29a3bd71151aa4fb7c6728f0fb52e2f3f233f41d..e75ba29f93e9e749791803ca3fa8d716ca261064 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -362,7 +362,7 @@ rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
-rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
+rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
 bool aarch64_simd_mem_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, bool);
 rtx aarch64_tls_get_addr (void);
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index d713d5d8b88837ec6f2dc51188fb252f8d5bc8bd..a67b7589e8badfbd0f13168557ef87e052eedcb1 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -151,6 +151,9 @@
   BUILTIN_VQN (TERNOP, raddhn2, 0)
   BUILTIN_VQN (TERNOP, rsubhn2, 0)
 
+  /* Implemented by copysign<mode>3.  */
+  BUILTIN_VHSDF (BINOP, copysign, 3)
+
   BUILTIN_VSQN_HSDI (UNOP, sqmovun, 0)
   /* Implemented by aarch64_<sur>qmovn<mode>.  */
   BUILTIN_VSQN_HSDI (UNOP, sqmovn, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index a12e2268ef9b023112f8d05db0a86957fee83273..b61f79a09462b8cecca7dd2cc4ac0eb4be2dbc79 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -338,6 +338,24 @@
   }
 )
 
+(define_expand "copysign<mode>3"
+  [(match_operand:VHSDF 0 "register_operand")
+   (match_operand:VHSDF 1 "register_operand")
+   (match_operand:VHSDF 2 "register_operand")]
+  "TARGET_FLOAT && TARGET_SIMD"
+{
+  rtx v_bitmask = gen_reg_rtx (mode);
+  int bits = GET_MODE_UNIT_BITSIZE (<MODE>mode) - 1;
+
+  emit_move_insn (v_bitmask,
+		  aarch64_simd_gen_const_vector_dup (mode,
+		 HOST_WIDE_INT_M1U << bits));
+  emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], v_bitmask,
+	 operands[2], operands[1]));
+  DONE;
+}
+)
+
 (define_insn "*aarch64_mul3_elt"
  [(set (match_operand:VMUL 0 "register_operand" "=w")
 (mult

Re: libgo patch committed: Update to Go1.8rc1

2017-01-19 Thread Rainer Orth
Hi Ian,

> On Mon, Jan 16, 2017 at 7:21 AM, Rainer Orth
>  wrote:
>>
>> I'm getting further on Solaris now, but the build still fails:
>
> I committed this patch to fix the remaining build problems on Solaris.
> Bootstrapped and ran some of the Go tests on i386-sun-solaris11 and
> x86_64-pc-linux-gnu.

worked fine, thanks.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, Fortran, pr70696, v2] [Coarray] ICE on EVENT POST of host-associated EVENT_TYPE coarray

2017-01-19 Thread Andre Vehreschild
Hi Steve,

thanks for the review. Committed as r244637.

Regards,
Andre

On Thu, 19 Jan 2017 06:51:19 -0800
Steve Kargl  wrote:

> On Thu, Jan 19, 2017 at 01:07:50PM +0100, Andre Vehreschild wrote:
> > Hi all,
> > 
> > unfortunately this patch triggered a regression in the OpenCoarrays
> > testsuite, which also occurs outside of OpenCoarrays when a caf-function is
> > used in a block in the main program. This patch fixes the error and adds a
> > testcase.
> > 
> > Bootstrapped and regtested ok on x86_64-linux/f25. Ok for trunk?
> >   
> 
> Yes.
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-19 Thread Jakub Jelinek
On Thu, Jan 19, 2017 at 06:09:35PM +0300, Alexander Monakov wrote:
> > -#ifdef __LP64__
> > +#if defined(__LP64__) || defined(_WIN64)
> > 
> > (is that the right define for 64-bit MingW, right?).
> 
> Yes, _WIN64; libsanitizer has a similar test.  Alternatively, I guess,
> 
>   #if __SIZEOF_POINTER__ == 8
> 
> > Otherwise, I think using uintptr_t is a problem, because we'd need to
> > #include  (the header only includes ).
> 
> Note that plugin-nvptx.c already includes <stdint.h>.  But, anyway, I agree that
> there's value in defining the exact type via the #if.

I've committed then.

2017-01-19  Jakub Jelinek  

* plugin/cuda/cuda.h (CUdeviceptr): Typedef to unsigned long long even
for _WIN64.

--- libgomp/plugin/cuda/cuda.h  (revision 244570)
+++ libgomp/plugin/cuda/cuda.h  (working copy)
@@ -35,7 +35,7 @@ libcuda.so.1 are not available.  */
 
 typedef void *CUcontext;
 typedef int CUdevice;
-#ifdef __LP64__
+#if defined(__LP64__) || defined(_WIN64)
 typedef unsigned long long CUdeviceptr;
 #else
 typedef unsigned CUdeviceptr;


Jakub


Re: [PATCH][GCC][Aarch64] Add vectorize pattern for copysign.

2017-01-19 Thread Tamar Christina

It seems I can drop even more:

gcc/
2017-01-19  Tamar Christina  

* config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
Changed int to HOST_WIDE_INT.
* config/aarch64/aarch64-protos.h
(aarch64_simd_gen_const_vector_dup): Likewise.
* config/aarch64/aarch64-simd.md: Added copysign<mode>3.

gcc/testsuite/
2017-01-19  Tamar Christina  

* gcc.target/arm/vect-copysignf.c: Move to...
* gcc.dg/vect/vect-copysignf.c: ... Here.

Tamar

From: Tamar Christina
Sent: Thursday, January 19, 2017 3:47:00 PM
To: GCC Patches; James Greenhalgh; Marcus Shawcroft; Richard Earnshaw
Cc: nd
Subject: Re: [PATCH][GCC][Aarch64] Add vectorize pattern for copysign.

Hi All,

It seems the entry in config/aarch64/aarch64-builtins.c isn't needed, as such 
I've simplified the patch
and the changelog.

Ok for trunk?

Tamar


gcc/
2017-01-19  Tamar Christina  

* config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
Changed int to HOST_WIDE_INT.
* config/aarch64/aarch64-protos.h
(aarch64_simd_gen_const_vector_dup): Likewise.
* config/aarch64/aarch64-simd-builtins.def: Added copysign BINOP.
* config/aarch64/aarch64-simd.md: Added copysign<mode>3.

gcc/testsuite/
2017-01-19  Tamar Christina  

* gcc.target/arm/vect-copysignf.c: Move to...
* gcc.dg/vect/vect-copysignf.c: ... Here.

From: gcc-patches-ow...@gcc.gnu.org  on behalf 
of Tamar Christina 
Sent: Thursday, January 19, 2017 9:38:09 AM
To: GCC Patches; James Greenhalgh; Marcus Shawcroft; Richard Earnshaw
Cc: nd
Subject: Re: [PATCH][GCC][Aarch64] Add vectorize pattern for copysign.

Hi All,

This is a slight modification of the earlier patch (Using a different constant 
in the mask creation.)

< +  HOST_WIDE_INT_M1 << bits));
---
> +  HOST_WIDE_INT_M1U << bits));

Kind Regards,
Tamar


From: gcc-patches-ow...@gcc.gnu.org  on behalf 
of Tamar Christina 
Sent: Tuesday, January 17, 2017 2:50:19 PM
To: GCC Patches; James Greenhalgh; Marcus Shawcroft; Richard Earnshaw
Cc: nd
Subject: [PATCH][GCC][Aarch64] Add vectorize pattern for copysign.

Hi All,

This patch vectorizes the copysign builtin for AArch64
similar to how it is done for Arm.

AArch64 now generates:

...
.L4:
ldr q1, [x6, x3]
add w4, w4, 1
ldr q0, [x5, x3]
cmp w4, w7
bif v1.16b, v2.16b, v3.16b
fmul    v0.2d, v0.2d, v1.2d
str q0, [x5, x3]

for the input:

 x * copysign(1.0, y)

On 481.wrf in Spec2006 on AArch64 this gives us a speedup of 9.1%.
Regtested on  aarch64-none-linux-gnu and no regressions.

Ok for trunk?

gcc/
2017-01-17  Tamar Christina  

* config/aarch64/aarch64-builtins.c
(aarch64_builtin_vectorized_function): Added CASE_CFN_COPYSIGN.
* config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
Changed int to HOST_WIDE_INT.
* config/aarch64/aarch64-protos.h
(aarch64_simd_gen_const_vector_dup): Likewise.
* config/aarch64/aarch64-simd-builtins.def: Added copysign BINOP.
* config/aarch64/aarch64-simd.md: Added copysign<mode>3.

gcc/testsuite/
2017-01-17  Tamar Christina  

* gcc.target/arm/vect-copysignf.c: Move to...
* gcc.dg/vect/vect-copysignf.c: ... Here.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 29a3bd71151aa4fb7c6728f0fb52e2f3f233f41d..e75ba29f93e9e749791803ca3fa8d716ca261064 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -362,7 +362,7 @@ rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
-rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
+rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
 bool aarch64_simd_mem_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, bool);
 rtx aarch64_tls_get_addr (void);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index a12e2268ef9b023112f8d05db0a86957fee83273..b61f79a09462b8cecca7dd2cc4ac0eb4be2dbc79 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -338,6 +338,24 @@
   }
 )
 
+(define_expand "copysign<mode>3"
+  [(match_operand:VHSDF 0 "register_operand")
+   (match_operand:VHSDF 1 "register_operand")
+   (match_operand:VHSDF 2 "register_operand")]
+  "TARGET_FLOAT && TARGET_SIMD"
+{
+  rtx v_bitmask = gen_reg_rtx (mode);
+  int bits = GET_MODE_UNIT_BITSIZE (<MODE>mode) - 1;
+
+  emit_move_insn (v_bitmask,
+		  aarch64_simd_gen_const_vector_dup (mode,
+		 HOST_WIDE_INT_M1U << bits));
+  emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], v_bitmask,
+	 operands[2], operands[1]));
+  DONE;
+}
+)

Re: C++ PATCH for c++/79130 (direct-initialization of arrays with decomposition)

2017-01-19 Thread Jason Merrill
On Thu, Jan 19, 2017 at 9:43 AM, Jason Merrill  wrote:
> Jakub pointed out that parenthesized decomposition of an array wasn't
> properly using direct-initialization.  Rather than pass the flags down
> into build_vec_init at this point in GCC 7 development, let's turn the
> initializer into something that build_vec_init recognizes as
> direct-initialization.

And another issue from Jakub's email.
commit 6f412aaccd57bcc7c9226516b008b088df2f86c6
Author: Jason Merrill 
Date:   Thu Jan 19 10:38:10 2017 -0500

Array decomposition fix.

* decl.c (check_initializer): Always use build_aggr_init for array
decomposition.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 75baf94..792ebcc 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6299,14 +6299,14 @@ check_initializer (tree decl, tree init, int flags, 
vec **cleanups)
   if (type == error_mark_node)
return NULL_TREE;
 
-  if ((type_build_ctor_call (type) || CLASS_TYPE_P (type)
-  || (DECL_DECOMPOSITION_P (decl) && TREE_CODE (type) == ARRAY_TYPE))
- && !(flags & LOOKUP_ALREADY_DIGESTED)
- && !(init && BRACE_ENCLOSED_INITIALIZER_P (init)
-  && CP_AGGREGATE_TYPE_P (type)
-  && (CLASS_TYPE_P (type)
-  || !TYPE_NEEDS_CONSTRUCTING (type)
-  || type_has_extended_temps (type
+  if (((type_build_ctor_call (type) || CLASS_TYPE_P (type))
+  && !(flags & LOOKUP_ALREADY_DIGESTED)
+  && !(init && BRACE_ENCLOSED_INITIALIZER_P (init)
+   && CP_AGGREGATE_TYPE_P (type)
+   && (CLASS_TYPE_P (type)
+   || !TYPE_NEEDS_CONSTRUCTING (type)
+   || type_has_extended_temps (type
+ || (DECL_DECOMPOSITION_P (decl) && TREE_CODE (type) == ARRAY_TYPE))
{
  init_code = build_aggr_init_full_exprs (decl, init, flags);
 
diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp21.C 
b/gcc/testsuite/g++.dg/cpp1z/decomp21.C
new file mode 100644
index 000..d046ed5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/decomp21.C
@@ -0,0 +1,16 @@
+// { dg-options -std=c++1z }
+
+int a[3];
+struct S { int b, c, d; } s;
+void
+foo ()
+{
+  auto [ b, c, d ] = a;
+  auto [ e, f, g ] = s;
+  auto [ h, i, j ] { s };
+  auto [ k, l, m ] { s, };
+  auto [ n, o, p ] { a };
+  auto [ q, r, t ] ( s );
+  auto [ u, v, w ] ( s, );  // { dg-error "expected primary-expression before '.' token" }
+  auto [ x, y, z ] ( a );   // { dg-error "expression list treated as compound expression in initializer" "" { target *-*-* } .-1 }
+}


[PATCH] Make LTO's implementation of LANG_HOOKS_TYPE_FOR_SIZE the default

2017-01-19 Thread David Malcolm
On Thu, 2017-01-19 at 10:36 +0100, Richard Biener wrote:
> On Wed, Jan 18, 2017 at 10:45 PM, David Malcolm 
> wrote:
> > The jit testcase test-nested-loops.c was crashing.
> > 
> > Root cause is that deep inside loop optimization we're now exposing
> > this call within fold-const.c which wasn't being hit before:
> > 
> > 4082  /* Compute the mask to access the bitfield.  */
> > 4083  unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1);
> > 
> > and the jit's implementation of LANG_HOOKS_TYPE_FOR_SIZE was a
> > placeholder that asserted it wasn't called.
> > 
> > This patch implements a proper LANG_HOOKS_TYPE_FOR_SIZE for jit,
> > by taking LTO's implementation.
> > 
> > Fixes test-nested-loops.c, along with the related failures in
> > test-combination.c and test-threads.c due to reusing the test.
> > 
> > This fixes all known failures in jit.sum, putting it at 8609
> > passes.
> > 
> > Committed to trunk as r244600.
> 
> I suppose we could instead make the lto hook the default (thus move
> it
> to langhooks.c as lhd_type_for_size).  Note similar issues may arise
> from type_for_mode?  Ah, I see you have that one...

The following patch does that.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.

OK for trunk? 

gcc/jit/ChangeLog:
* dummy-frontend.c (jit_langhook_type_for_size): Delete.
(LANG_HOOKS_TYPE_FOR_SIZE): Don't redefine.

gcc/ChangeLog:
* langhooks-def.h (lhd_type_for_size): New decl.
(LANG_HOOKS_TYPE_FOR_SIZE): Define as lhd_type_for_size.
* langhooks.c (lhd_type_for_size): New function, taken from
lto_type_for_size.

gcc/lto/ChangeLog:
* lto-lang.c (builtin_type_for_size): Convert call to
lto_type_for_size to one through the langhook.
(lto_type_for_size): move to langhooks.c and rename to
lhd_type_for_size.
(LANG_HOOKS_TYPE_FOR_SIZE): Don't redefine.
---
 gcc/jit/dummy-frontend.c | 52 
 gcc/langhooks-def.h  |  2 ++
 gcc/langhooks.c  | 50 ++
 gcc/lto/lto-lang.c   | 56 +---
 4 files changed, 53 insertions(+), 107 deletions(-)

diff --git a/gcc/jit/dummy-frontend.c b/gcc/jit/dummy-frontend.c
index 4c7932b..87f583f 100644
--- a/gcc/jit/dummy-frontend.c
+++ b/gcc/jit/dummy-frontend.c
@@ -207,55 +207,6 @@ jit_langhook_type_for_mode (enum machine_mode mode, int unsignedp)
   return NULL;
 }
 
-/* Return an integer type with PRECISION bits of precision,
-   that is unsigned if UNSIGNEDP is nonzero, otherwise signed.  */
-
-static tree
-jit_langhook_type_for_size (unsigned precision, int unsignedp)
-{
-  int i;
-
-  if (precision == TYPE_PRECISION (integer_type_node))
-    return unsignedp ? unsigned_type_node : integer_type_node;
-
-  if (precision == TYPE_PRECISION (signed_char_type_node))
-    return unsignedp ? unsigned_char_type_node : signed_char_type_node;
-
-  if (precision == TYPE_PRECISION (short_integer_type_node))
-    return unsignedp ? short_unsigned_type_node : short_integer_type_node;
-
-  if (precision == TYPE_PRECISION (long_integer_type_node))
-    return unsignedp ? long_unsigned_type_node : long_integer_type_node;
-
-  if (precision == TYPE_PRECISION (long_long_integer_type_node))
-    return unsignedp
-	   ? long_long_unsigned_type_node
-	   : long_long_integer_type_node;
-
-  for (i = 0; i < NUM_INT_N_ENTS; i ++)
-    if (int_n_enabled_p[i]
-	&& precision == int_n_data[i].bitsize)
-      return (unsignedp ? int_n_trees[i].unsigned_type
-	      : int_n_trees[i].signed_type);
-
-  if (precision <= TYPE_PRECISION (intQI_type_node))
-    return unsignedp ? unsigned_intQI_type_node : intQI_type_node;
-
-  if (precision <= TYPE_PRECISION (intHI_type_node))
-    return unsignedp ? unsigned_intHI_type_node : intHI_type_node;
-
-  if (precision <= TYPE_PRECISION (intSI_type_node))
-    return unsignedp ? unsigned_intSI_type_node : intSI_type_node;
-
-  if (precision <= TYPE_PRECISION (intDI_type_node))
-    return unsignedp ? unsigned_intDI_type_node : intDI_type_node;
-
-  if (precision <= TYPE_PRECISION (intTI_type_node))
-    return unsignedp ? unsigned_intTI_type_node : intTI_type_node;
-
-  return NULL_TREE;
-}
-
 /* Record a builtin function.  We just ignore builtin functions.  */
 
 static tree
@@ -295,9 +246,6 @@ jit_langhook_getdecls (void)
 #undef LANG_HOOKS_TYPE_FOR_MODE
 #define LANG_HOOKS_TYPE_FOR_MODE   jit_langhook_type_for_mode
 
-#undef LANG_HOOKS_TYPE_FOR_SIZE
-#define LANG_HOOKS_TYPE_FOR_SIZE   jit_langhook_type_for_size
-
 #undef LANG_HOOKS_BUILTIN_FUNCTION
 #define LANG_HOOKS_BUILTIN_FUNCTIONjit_langhook_builtin_function
 
diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
index db7587b..eb68084 100644
--- a/gcc/langhooks-def.h
+++ b/gcc/langhooks-def.h
@@ -52,6 +52,7 @@ extern void lhd_print_error_function (diagnostic_context *,
  cons

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Alexander Monakov
On Thu, 19 Jan 2017, Jakub Jelinek wrote:
> On Thu, Jan 19, 2017 at 04:36:25PM +0300, Alexander Monakov wrote:
> > > One of the problems with that is that it means that you can't easily turn
> > > addressable private variables into non-addressable ones once you force 
> > > them
> > > into such struct that can't be easily SRA split.
> > > In contrast, if you can get the variable flags/attributes work, if they
> > > become non-addressable (which is especially important to get rid of C++
> > > abstraction penalties), you simply don't add them into the specially
> > > allocated block.
> > 
> > I agree; I'd like to implement the approach with per-variable attributes 
> > once
> > it's clear how it ought to work (right now I'm not sure if placing CLOBBERs 
> > on
> > both entry and exit would be enough; if I understood correctly, Richard is
> > saying they might be moved, unless the middle-end is changed to prevent it).
> 
> I think we drop CLOBBERs in certain cases, though primarily those with
> MEM_REF on the lhs rather than just VAR_DECL, or even with VAR_DECL in EH
> optimizations if the clobbers are the sole thing in the EH pad.
> I think adding the abnormal edges would look safest to me, after all, before
> it is fully lowered it is kind like a loop, some threads in the warp might
> bypass it.  We also use abnormal edges for vfork etc.

In that case, Richard, Jakub, may I ask you to have another look at the
alternative approach I gave in response to Richard early in this subthread?  It
already enforces proper dependencies and doesn't commit data to one struct until
the very last moment.  So perhaps it would be simpler to integrate promotion of
non-addressable vars into there.  That is, if after inlining we have

  void *omp_simt = GOMP_SIMT_ENTER_BY_UID (simduid, dummy_size, dummy_align);

  // replacement for originally private or inlined 'int var1;'
  int *pvar1 = GOMP_SIMT_VAR_REF (omp_simt, simduid, uid_for_var1);

  for ( ... ) { ... }

  *pvar1 = CLOBBER;
  GOMP_SIMT_EXIT (omp_simt);

and we see that pvar1 is only used inside MEM_REFs, we can recreate 'int var1;'
(now that we're sure it's non-addressable) and replace *pvar1 accordingly.

Teaching the alias oracle that GOMP_SIMT_VAR_REF returns unique pointers should
be reasonable, right?

Thanks.
Alexander


RE: [PATCH, MIPS] Target flag and build option to disable indexed memory OPs.

2017-01-19 Thread Matthew Fortune
Hi Doug,

I've committed this on your behalf to get the testcases in and also
add the description of when this feature is required.  Thanks for the
patch.  Committed code inline below.

r244640

gcc/

PR target/78176
* config.gcc (supported_defaults): Add lxc1-sxc1.
(with_lxc1_sxc1): Add validation.
(all_defaults): Add lxc1-sxc1.
* config/mips/mips.opt (mlxc1-sxc1): New option.
* gcc/config/mips/mips.h (OPTION_DEFAULT_SPECS): Add a default for
mlxc1-sxc1.
(TARGET_CPU_CPP_BUILTINS): Add builtin_define for
__mips_no_lxc1_sxc1.
(ISA_HAS_LXC1_SXC1): Gate with mips_lxc1_sxc1.
* gcc/doc/invoke.texi (-mlxc1-sxc1): Document the new option.
* doc/install.texi (--with-lxc1-sxc1): Document the new option.

gcc/testsuite/

* gcc.target/mips/lxc1-sxc1-1.c: New file.
* gcc.target/mips/lxc1-sxc1-2.c: Likewise.
* gcc.target/mips/mips.exp (mips_option_groups): Add ghost option
HAS_LXC1.
(mips_option_groups): Add -m[no-]lxc1-sxc1.
(mips-dg-init): Detect default -mno-lxc1-sxc1.
(mips-dg-options): Handle HAS_LXC1 arch upgrade/downgrade.

Matthew

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244640 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog   | 15 
 gcc/config.gcc  | 19 -
 gcc/config/mips/mips.h  |  8 +++-
 gcc/config/mips/mips.opt|  4 ++
 gcc/doc/install.texi| 19 +
 gcc/doc/invoke.texi |  6 +++
 gcc/testsuite/ChangeLog | 11 ++
 gcc/testsuite/gcc.target/mips/lxc1-sxc1-1.c | 60 +
 gcc/testsuite/gcc.target/mips/lxc1-sxc1-2.c | 60 +
 gcc/testsuite/gcc.target/mips/mips.exp  | 12 +-
 10 files changed, 209 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/lxc1-sxc1-1.c
 create mode 100644 gcc/testsuite/gcc.target/mips/lxc1-sxc1-2.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 20b703f..f933e1ad 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@
+2017-01-19  Doug Gilmore  
+
+   PR target/78176
+   * config.gcc (supported_defaults): Add lxc1-sxc1.
+   (with_lxc1_sxc1): Add validation.
+   (all_defaults): Add lxc1-sxc1.
+   * config/mips/mips.opt (mlxc1-sxc1): New option.
+   * gcc/config/mips/mips.h (OPTION_DEFAULT_SPECS): Add a default for
+   mlxc1-sxc1.
+   (TARGET_CPU_CPP_BUILTINS): Add builtin_define for
+   __mips_no_lxc1_sxc1.
+   (ISA_HAS_LXC1_SXC1): Gate with mips_lxc1_sxc1.
+   * gcc/doc/invoke.texi (-mlxc1-sxc1): Document the new option.
+   * doc/install.texi (--with-lxc1-sxc1): Document the new option.
+
 2017-01-19  Richard Biener  
 
PR tree-optimization/72488
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 90308cd..dd8c08c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3940,7 +3940,7 @@ case "${target}" in
;;
 
mips*-*-*)
-	supported_defaults="abi arch arch_32 arch_64 float fpu nan fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci"
+	supported_defaults="abi arch arch_32 arch_64 float fpu nan fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1"
 
case ${with_float} in
"" | soft | hard)
@@ -4063,6 +4063,21 @@ case "${target}" in
exit 1
;;
esac
+
+   case ${with_lxc1_sxc1} in
+   yes)
+   with_lxc1_sxc1=lxc1-sxc1
+   ;;
+   no)
+   with_lxc1_sxc1=no-lxc1-sxc1
+   ;;
+   "")
+   ;;
+   *)
+   echo "Unknown lxc1-sxc1 type used in --with-lxc1-sxc1" 1>&2
+   exit 1
+   ;;
+   esac
;;
 
nds32*-*-*)
@@ -4496,7 +4511,7 @@ case ${target} in
 esac
 
 t=
-all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 tune_64 schedule float mode fpu nan fp_32 odd_spreg_32 divide llsc mips-plt synci tls"
+all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 tune_64 schedule float mode fpu nan fp_32 odd_spreg_32 divide llsc mips-plt synci tls lxc1-sxc1"
 for option in $all_defaults
 do
eval "val=\$with_"`echo $option | sed s/-/_/g`
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index fbd7011..4205589 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -637,6 +637,8 @@ struct mips_cpu_info {
       if (TARGET_CACHE_BUILTIN)					\
	builtin_define ("__GCC_HAVE_BUILTIN_MIPS_CACHE");		\

RE: [PATCH,gcc/MIPS] Make loongson3a use fused madd.d

2017-01-19 Thread Matthew Fortune
Hi Paul,

Your latest version of the patch is now committed.  I have been doing some
work on the recursive build failure but the issue is complex and involves
LRA so I went ahead with committing your change independently.  It also
turns out that (at least when targeting loongson3a) there are stage2/3
object comparison issues even after resolving the LRA bug with an initial
fix.

r244641

Thanks,
Matthew


Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

2017-01-19 Thread Joseph Myers
On Thu, 19 Jan 2017, Tamar Christina wrote:

> > The old code set orig_arg before converting IBM long double to double.
> > Your code sets it after the conversion.  The old code set min_exp based on
> > a string set from REAL_MODE_FORMAT (orig_mode)->emin - 1; your code uses
> > the adjusted mode.  Both of those are incorrect for IBM long double.
> 
> Hmm, this is correct, I've made the change to be like it was before, but
> there's something I don't quite get about the old code, if it's building rmin
> the orig_mode which is larger than mode, but then creating the real using
> build_real (type, rmin) using the adjusted type, shouldn't it not be getting
> truncated?

The value is a power of 2, which is *larger* than the minimum normal value 
for double (as they have the same least subnormal value).

Looking further at the code, my only remaining comments are for the cases 
where you aren't using the integer path: in is_normal you use LT_EXPR to 
compare with +Inf, but need to use __builtin_isless, likewise in 
is_subnormal again you should be using __builtin_isless and 
__builtin_isgreater, in is_finite you should be using 
__builtin_islessequal.  All the existing expressions will raise spurious 
"invalid" exceptions for quiet NaN arguments.  (I'm presuming that the 
output of these functions goes through folding that converts 
__builtin_isless to !UNGE_EXPR, etc.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Speed-up use-after-scope (re-writing to SSA) (version 2)

2017-01-19 Thread Jakub Jelinek
On Wed, Jan 18, 2017 at 04:34:48PM +0100, Martin Liška wrote:
> Hello.
> 
> During bootstrap, I came to following test-case:
> 
> struct A
> {
>   int regno;
> };
> struct
> {
>   A base;
> } typedef *df_ref;
> int *a;
> void
> fn1 (int N)
> {
>   for (int i = 0; i < N; i++)
> {
>   df_ref b;
>   a[(b)->base.regno]++;
> }
> }

Well, in this case it is UB too, just not actually out of bounds access,
but use of uninitialized variable.
Perhaps what we should do, in addition to turning ASAN_MARK (POISON, &b, ...)
into b = ASAN_POISON (), is turn ASAN_MARK (UNPOISON, &b, ...) into
b = b_YYY(D);
The following seems to do the job:
--- gcc/tree-ssa.c.jj   2017-01-19 17:20:15.0 +0100
+++ gcc/tree-ssa.c  2017-01-19 17:29:58.015356370 +0100
@@ -1911,7 +1911,16 @@ execute_update_addresses_taken (void)
gsi_replace (&gsi, call, GSI_SAME_STMT);
  }
else
- gsi_remove (&gsi, true);
+ {
+   /* In ASAN_MARK (UNPOISON, &b, ...) the variable
+  is uninitialized.  Avoid dependencies on
+  previous out of scope value.  */
+   tree clobber
+ = build_constructor (TREE_TYPE (var), NULL);
+   TREE_THIS_VOLATILE (clobber) = 1;
+   gimple *g = gimple_build_assign (var, clobber);
+   gsi_replace (&gsi, g, GSI_SAME_STMT);
+ }
continue;
  }
  }

Jakub


Re: [PATCH][AArch64] Purge leftover occurrences of aarch64_nopcrelative_literal_loads

2017-01-19 Thread James Greenhalgh
On Thu, Jan 19, 2017 at 12:17:40PM +, Kyrill Tkachov wrote:
> Hi all,
> 
> The patch that renamed all uses of aarch64_nopcrelative_literal_loads into
> aarch64_pcrelative_literal_loads missed out its extern declaration in
> aarch64-protos.h and a couple of its uses in aarch64.md.
> The aarch64_nopcrelative_literal_loads doesn't get initialised anywhere
> (since it's unlinked from the command-line option handling machinery) so the
> code that uses it is bogus.
> 
> In any case, its use in the aarch64_reload_movcp<GPF_TF:mode><P:mode> and
> aarch64_reload_movcp<VALL:mode><P:mode> expanders is redundant since they are
> only ever called through aarch64_secondary_reload which gates their use on
> !aarch64_pcrelative_literal_loads already. Since these are not standard
> names, their conditions don't actually matter in any way and aren't checked at
> any point in the compilation AFAICS.

But more importantly, as it stands if this condition were ever not
eliminated there would be a link time error - so the code is dead and
dangerous and removing it is not concerning even at this stage in
development.

This patch is OK.

Thanks,
James

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> 29a3bd71151aa4fb7c6728f0fb52e2f3f233f41d..17d8a89ef0ce58b28fc8fc4713edcc4b194bbc90
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -453,7 +453,6 @@ int aarch64_ccmp_mode_to_code (enum machine_mode mode);
>  bool extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset);
>  bool aarch64_operands_ok_for_ldpstp (rtx *, bool, enum machine_mode);
>  bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, enum machine_mode);
> -extern bool aarch64_nopcrelative_literal_loads;
>  
>  extern void aarch64_asm_output_pool_epilogue (FILE *, const char *,
> tree, HOST_WIDE_INT);
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 7aaebd230ddb702447dd4a5d1ba4ab05441cb10a..2b0c2cc01e72d635f85ce4c56be1407986377ab3
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5044,7 +5044,7 @@ (define_expand "aarch64_reload_movcp<GPF_TF:mode><P:mode>"
>   [(set (match_operand:GPF_TF 0 "register_operand" "=w")
> (mem:GPF_TF (match_operand 1 "aarch64_constant_pool_symref" "S")))
>(clobber (match_operand:P 2 "register_operand" "=&r"))]
> - "TARGET_FLOAT && aarch64_nopcrelative_literal_loads"
> + "TARGET_FLOAT"
>   {
> aarch64_expand_mov_immediate (operands[2], XEXP (operands[1], 0));
> emit_move_insn (operands[0], gen_rtx_MEM (<GPF_TF:MODE>mode, operands[2]));
> @@ -5057,7 +5057,7 @@ (define_expand "aarch64_reload_movcp<VALL:mode><P:mode>"
>   [(set (match_operand:VALL 0 "register_operand" "=w")
> (mem:VALL (match_operand 1 "aarch64_constant_pool_symref" "S")))
>(clobber (match_operand:P 2 "register_operand" "=&r"))]
> - "TARGET_FLOAT && aarch64_nopcrelative_literal_loads"
> + "TARGET_FLOAT"
>   {
> aarch64_expand_mov_immediate (operands[2], XEXP (operands[1], 0));
> emit_move_insn (operands[0], gen_rtx_MEM (<VALL:MODE>mode, operands[2]));



[PATCH, v2] (9e) Update "startwith" logic for pass-skipping to handle __RTL functions

2017-01-19 Thread David Malcolm
On Mon, 2017-01-16 at 14:42 -0700, Jeff Law wrote:
> On 01/09/2017 07:38 PM, David Malcolm wrote:
> > gcc/ChangeLog:
> > * passes.c: Include "insn-addr.h".
> > (should_skip_pass_p): Add logging.  Update logic for running
> > "expand" to be compatible with both __GIMPLE and __RTL.  Guard
> > property-provider override so it is only done for gimple
> > passes.
> > Don't skip dfinit.
> > (skip_pass): New function.
> > (execute_one_pass): Call skip_pass when skipping passes.
> > ---
> >  gcc/passes.c | 65
> > +---
> >  1 file changed, 58 insertions(+), 7 deletions(-)
> > 
> > diff --git a/gcc/passes.c b/gcc/passes.c
> > index 31262ed..6954d1e 100644
> > --- a/gcc/passes.c
> > +++ b/gcc/passes.c
> > @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not
> > see
> >  #include "cfgrtl.h"
> >  #include "tree-ssa-live.h"  /* For remove_unused_locals.  */
> >  #include "tree-cfgcleanup.h"
> > +#include "insn-addr.h" /* for INSN_ADDRESSES_ALLOC.  */
> insn-addr?  Yuk.
> 
> 
> > 
> >  using namespace gcc;
> > 
> > @@ -2315,26 +2316,73 @@ should_skip_pass_p (opt_pass *pass)
> >if (!cfun->pass_startwith)
> >  return false;
> > 
> > -  /* We can't skip the lowering phase yet -- ideally we'd
> > - drive that phase fully via properties.  */
> > -  if (!(cfun->curr_properties & PROP_ssa))
> > -return false;
> > +  /* For __GIMPLE functions, we have to at least start when we leave
> > +     SSA.  */
> > +  if (pass->properties_destroyed & PROP_ssa)
> > +{
> > +  if (!quiet_flag)
> > +   fprintf (stderr, "starting anyway when leaving SSA: %s\n", pass->name);
> > +  cfun->pass_startwith = NULL;
> > +  return false;
> > +}
> This seems to need a comment -- it's not obvious how destroying the
> SSA
> property maps to a pass that can not be skipped.

Added:

  /* For __GIMPLE functions, we have to at least start when we leave
 SSA.  Hence, we need to detect the "expand" pass, and stop skipping
     when we encounter it.  A cheap way to identify "expand" is to
 detect the destruction of PROP_ssa.
 For __RTL functions, we invoke "rest_of_compilation" directly, which
 is after "expand", and hence we don't reach this conditional.  */

> > -  /* And also run any property provider.  */
> > -  if (pass->properties_provided != 0)
> > +  /* Run any property provider.  */
> > +  if (pass->type == GIMPLE_PASS
> > +  && pass->properties_provided != 0)
> >  return false;
> So comment needed here too.  I read this as "if a gimple pass
> provides a
> property then it should not be skipped.  Which means that an RTL pass
> that provides a property can?

Added:

  /* For GIMPLE passes, run any property provider (but continue skipping
 afterwards).
 We don't want to force running RTL passes that are property providers:
 "expand" is covered above, and the only pass other than "expand" that
 provides a property is "into_cfglayout" (PROP_cfglayout), which does
 too much for a dumped __RTL function.  */

...the problem being that into_cfglayout's execute vfunc calls
cfg_layout_initialize, which does a lot more than just
cfg_layout_rtl_register_cfg_hooks (the skip hack does just the latter).

> > +  /* Don't skip df init; later RTL passes need it.  */
> > +  if (strstr (pass->name, "dfinit") != NULL)
> > +return false;
> Which seems like a failing in RTL passes saying they need DF init.

There isn't a "PROP_df"; should there be?
Or is this hack acceptable?

> > +/* Skip the given pass, for handling passes before "startwith"
> > +   in __GIMPLE- and __RTL-marked functions.
> > +   In theory, this ought to be a no-op, but some of the RTL passes
> > +   need additional processing here.  */
> > +
> > +static void
> > +skip_pass (opt_pass *pass)
> ...
> This all feels like a failing in how we handle state in the RTL
> world.
> And I suspect it's prone to error.  Imagine if I'm hacking on
> something
> in the RTL world and my code depends on something else being set up. 
>   I
> really ought to have a way within my pass to indicate what I depend
> on.
> Having it hidden away in passes.c makes it easy to miss/forget.

Indeed, it's a hack.  I preferred the vfunc idea, but Richi prefers
to keep it all in one place.

> > +{
> > +  /* Pass "reload" sets the global "reload_completed", and many
> > + things depend on this (e.g. instructions in .md files).  */
> > +  if (strcmp (pass->name, "reload") == 0)
> > +reload_completed = 1;
> Seems like this ought to be a property provided by LRA/reload.

If we have a __RTL function with a "startwith" of a pass after reload,
we don't want to run "reload" when iterating through the pass list to
reach the start pass, since presumably it could change the insns.  So
if LRA/reload provide a property, say PROP_reload_completed, we'd still
need a way to *not* run reload, whilst setting the reload_completed
global.  So I don't think that a pro

Re: [PATCH] Add AVX512 k-mask intrinsics

2017-01-19 Thread Andrew Senkevich
2017-01-19 13:39 GMT+03:00 Kirill Yukhin :
> Hi Andrew,
> On 18 Jan 15:45, Andrew Senkevich wrote:
>> 2017-01-17 16:51 GMT+03:00 Jakub Jelinek :
>> > On Tue, Jan 17, 2017 at 04:03:08PM +0300, Andrew Senkevich wrote:
>> >> > I've played a bit w/ SDE. And looks like operands are not early clobber:
>> >> > TID0: INS 0x004003ee AVX512VEX kmovd k0, eax
>> >> > TID0:   k0 := _
>> >> > ...
>> >> > TID0: INS 0x004003f4 AVX512VEX kshiftlw k0, k0, 0x3
>> >> > TID0:   k0 := _fff8
>> >> >
>> >> > You can see that same dest and source works just fine.
>> >>
>> >> Hmm, I looked only on what ICC generates, and it was not correct way.
>> >
>> > I've just tried
>> > int
>> > main ()
>> > {
>> >   unsigned int a = 0x;
>> >   asm volatile ("kmovw %1, %%k6; kshiftlw $1, %%k6, %%k6; kmovw %%k6, %0" 
>> > : "=r" (a) : "r" (a) : "k6");
>> >   __builtin_printf ("%x\n", a);
>> >   return 0;
>> > }
>> > on KNL and got 0x.
>> > Are you going to report to the SDM authors so that they fix it up?
>> > E.g. using TEMP <- SRC1[0:...] before DEST[...] <- 0 and using TEMP
>> > instead of SRC1[0:...] would fix it, or filling up TEMP first and only
>> > at the end assigning DEST <- TEMP etc. would do.
>>
>> Yes, we will work on it.
>>
>> Attached patch refactored in part of builtins declarations and tests, is it 
>> Ok?
>
> Could you please add runtime tests for new intrinsics as well?

Attached with runtime tests.

gcc/
* config/i386/avx512bwintrin.h: Add k-mask registers shift intrinsics.
* config/i386/avx512dqintrin.h: Ditto.
* config/i386/avx512fintrin.h: Ditto.
* config/i386/i386-builtin-types.def: Add new types.
* gcc/config/i386/i386.c: Handle new types.
* config/i386/i386-builtin.def (__builtin_ia32_kshiftliqi,
__builtin_ia32_kshiftlihi, __builtin_ia32_kshiftlisi,
__builtin_ia32_kshiftlidi, __builtin_ia32_kshiftriqi,
__builtin_ia32_kshiftrihi, __builtin_ia32_kshiftrisi,
__builtin_ia32_kshiftridi): New.
* config/i386/sse.md (k): Rename *k.

gcc/testsuite/
* gcc.target/i386/avx512bw-kshiftld-1.c: New test.
* gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
* gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
* gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
* gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
* gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
* gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
* gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.
* gcc.target/i386/avx512bw-kshiftld-2.c: Ditto.
* gcc.target/i386/avx512bw-kshiftlq-2.c: Ditto.
* gcc.target/i386/avx512bw-kshiftrd-2.c: Ditto.
* gcc.target/i386/avx512bw-kshiftrq-2.c: Ditto.
* gcc.target/i386/avx512dq-kshiftlb-2.c: Ditto.
* gcc.target/i386/avx512dq-kshiftrb-2.c: Ditto.
* gcc.target/i386/avx512f-kshiftlw-2.c: Ditto.
* gcc.target/i386/avx512f-kshiftrw-2.c: Ditto.
* gcc.target/i386/avx-1.c: Test new intrinsics.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.


--
WBR,
Andrew


avx512-kmask-intrin-part4.patch
Description: Binary data


Re: Implement -Wduplicated-branches (PR c/64279) (v3)

2017-01-19 Thread Marek Polacek
On Mon, Jan 09, 2017 at 02:39:30PM +0100, Marek Polacek wrote:
> On Mon, Jan 09, 2017 at 12:18:01PM +0100, Jakub Jelinek wrote:
> > On Mon, Jan 09, 2017 at 10:21:47AM +0100, Marek Polacek wrote:
> > > +/* Callback function to determine whether an expression TP or one of its
> > > +   subexpressions comes from macro expansion.  Used to suppress bogus
> > > +   warnings.  */
> > > +
> > > +static tree
> > > +expr_from_macro_expansion_r (tree *tp, int *, void *)
> > > +{
> > > +  if (CAN_HAVE_LOCATION_P (*tp)
> > > +  && from_macro_expansion_at (EXPR_LOCATION (*tp)))
> > > +return integer_zero_node;
> > > +
> > > +  return NULL_TREE;
> > > +}
> > 
> > I know this is hard issue, but won't it disable the warning way too often?
> > 
> > Perhaps it is good enough for the initial version (GCC 7), but doesn't it 
> > stop
> > whenever one uses NULL in the branches, or some other trivial macros like
> > that?  Perhaps we want to do the analysis if there is anything from macro
> > expansion side-by-side on both the expressions and if you find something
> > from a macro expansion, then still warn if both corresponding expressions
> > are from the same macro expansion (either only non-function like one, or
> > perhaps also function-like one with the same arguments, if it is possible
> > to figure out those somehow)?  And perhaps it would be nice to choose
> > warning level, whether you want to warn only under these rules (no macros
> > or something smarter if implemented) vs. some certainly non-default more
> > aggressive mode that will just warn no matter what macros there are.
> 
> I agree that not warning for 
>   if (foo)
> return NULL;
>   else
> return NULL;
> is bad.  But how can I compare those expressions side-by-side?  I'm not 
> finding
> anything. :(

Seems like ENOTIME to address this; will you be ok with the patch as-is
(modulo Jeff comments), if I open a PR about the above test case?

Thanks,

Marek


Re: [PR middle-end/79123] cast false positive in -Walloca-larger-than=

2017-01-19 Thread Martin Sebor

On 01/19/2017 05:45 AM, Richard Biener wrote:

On Thu, Jan 19, 2017 at 1:17 PM, Aldy Hernandez  wrote:

In the attached testcase, we have a clearly bounded case of alloca which is
being incorrectly reported:

void g (int *p, int *q)
{
   size_t n = (size_t)(p - q);

   if (n < 10)
 f (__builtin_alloca (n));
}

The problem is that VRP gives us an anti-range for `n' which may be out of
range:

  # RANGE ~[2305843009213693952, 16140901064495857663]
   n_9 = (long unsigned int) _4;

We do a less than stellar job with casts and VR_ANTI_RANGE's, mostly because
we're trying various heuristics to make up for the fact that we have crappy
range info from VRP.  More specifically, we're basically punting on a
VR_ANTI_RANGE and ignoring that the casted result (n_9) has a bound later
on.

Luckily, we already have code to check simple ranges coming into the alloca
by looking into all the basic blocks feeding it.  The attached patch delays
the final decision on anti-ranges until we have examined the basic blocks
and determined that we are definitely out of range.

I expect all this to disappear with Andrew's upcoming range info overhaul.

OK for trunk?


I _really_ wonder why all the range consuming warnings are not emitted
from VRP itself (like we do for -Warray-bounds).  There we'd still see
a range for the argument derived from the if () rather than needing to
do our own mini-VRP from the needlessly "incomplete" range-info on
SSA vars.


Can you explain why the range info is only available in VRP and
not outside, via the get_range_info() API?  It sounds as though
the API shouldn't be relied on (it was virtually unused before
GCC 7).

To answer your question, the gimple-ssa-sprintf pass that depends
heavily on ranges would also benefit from having access to the data
computed in the strlen pass that's not available outside it.

The -Wstringop-overflow and -Walloc-size-larger-than warnings depend
on both VRP and tree-object-size.

I have been thinking about how to integrate these passes in GCC 8
to improve the overall quality.  (By integrating I don't necessarily
mean merging the code but rather having them share data or be able
to make calls into one another.)

I'd be very interested in your thoughts on this.

Thanks
Martin


Re: Implement -Wduplicated-branches (PR c/64279) (v3)

2017-01-19 Thread Jakub Jelinek
On Thu, Jan 19, 2017 at 05:52:14PM +0100, Marek Polacek wrote:
> > I agree that not warning for 
> >   if (foo)
> > return NULL;
> >   else
> > return NULL;
> > is bad.  But how can I compare those expressions side-by-side?  I'm not 
> > finding
> > anything. :(
> 
> Seems like ENOTIME to address this; will you be ok with the patch as-is
> (modulo Jeff comments), if I open a PR about the above test case?

Yeah.

Jakub


Re: [PATCH][GCC][Aarch64] Add vectorize patten for copysign.

2017-01-19 Thread James Greenhalgh
On Thu, Jan 19, 2017 at 03:55:36PM +, Tamar Christina wrote:
> 
> It seems I can drop even more:

The AArch64 parts of this look fine to me, and based on benchmarking of the
patch they are low risk for high reward (and other targets have had a
vectorized copysign for a while without issue).

However, the testsuite change looks wrong.

> diff --git a/gcc/testsuite/gcc.target/arm/vect-copysignf.c 
> b/gcc/testsuite/gcc.dg/vect/vect-copysignf.c
> similarity index 91%
> rename from gcc/testsuite/gcc.target/arm/vect-copysignf.c
> rename to gcc/testsuite/gcc.dg/vect/vect-copysignf.c
> index 
> 425f1b78af7b07be6929f9e5bc1118ca901bc9ce..dc961d0223399c6e7ee8209d22ca77f6d22dbd70 100644
> --- a/gcc/testsuite/gcc.target/arm/vect-copysignf.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-copysignf.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-require-effective-target arm_neon_hw } */
> +/* { dg-require-effective-target arm_neon_hw { target { arm*-*-* } } } */
>  /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
>  /* { dg-add-options "arm_neon" } */

It is a bit redundant as add_options_for_arm_neon will check you are an ARM
target before doing anything, but this dg-add-options could be guarded by
{ target { arm*-*-* } } for clarity. Though, in the gcc.dg/vect/ directory
I'd be surprised if you needed this at all.

I see we have a check_effective_target test for 
{ target { vect_call_copysignf } } . It seems that would be most
appropriate for this test - then you can also drop the effective-target
arm_neon_hw test.

You might want to look at gcc.dg/vect/fast-math-bb-slp-call-1.c - It seems
that gives you most of what you are looking for from this test.

That will mean updating check_effective_target_vect_call_copysignf in
testsuite/lib/target-supports.exp .

Thanks,
James



Re: [committed] libitm: Disable TSX on processors on which it may be broken.

2017-01-19 Thread Uros Bizjak
On Wed, Jan 18, 2017 at 11:08 PM, Uros Bizjak  wrote:
> On Wed, Jan 18, 2017 at 10:48 PM, Uros Bizjak  wrote:
>> Hello!
>>
>>> This fix follows the same approach that glibc uses to disable TSX on
>>> processors on which it is broken.  TSX can also be disabled through a
>>> microcode update on these processors, but glibc consensus is that it
>>> cannot be detected reliably whether the microcode update has been
>>> applied.  Thus, we just look for affected models/steppings.
>>>
>>> Tested on x86_64-linux (but I don't have a machine with broken TSX
>>> available).
>>>
>>>libitm/ChangeLog
>>>
>>>* config/x86/target.h (htm_available): Add check for some processors
>>>on which TSX is broken.
>>
>> +  __cpuid (0, a, b, c, d);
>> +  if (b == 0x756e6547 && c == 0x6c65746e && d == 0x49656e69)
>>
>> You can use:
>>
>> #define signature_INTEL_ebx 0x756e6547
>> #define signature_INTEL_ecx 0x6c65746e
>> #define signature_INTEL_edx 0x49656e69
>>
>> defines from cpuid.h here.
>
> Actually, just provide a non-NULL second argument to __get_cpuid_max.
> It will return %ebx from cpuid, which should be enough to detect Intel
> processor.


Attached is the patch I have committed to mainline SVN after a short
off-line discussion with Torvald.

2017-01-19  Uros Bizjak  

* config/x86/target.h (htm_available): Determine vendor from
__get_cpuid_max return.  Use signature_INTEL_ebx.  Cleanup.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/x86/target.h
===
--- config/x86/target.h (revision 244636)
+++ config/x86/target.h (working copy)
@@ -75,31 +75,32 @@ static inline bool
 htm_available ()
 {
   const unsigned cpuid_rtm = bit_RTM;
-  if (__get_cpuid_max (0, NULL) >= 7)
+  unsigned vendor;
+
+  if (__get_cpuid_max (0, &vendor) >= 7)
 {
   unsigned a, b, c, d;
-  /* TSX is broken on some processors.  This can be fixed by microcode,
+  unsigned family;
+
+  __cpuid (1, a, b, c, d);
+  family = (a >> 8) & 0x0f;
+  /* TSX is broken on some processors.  TSX can be disabled by microcode,
 but we cannot reliably detect whether the microcode has been
 updated.  Therefore, do not report availability of TSX on these
 processors.  We use the same approach here as in glibc (see
 https://sourceware.org/ml/libc-alpha/2016-12/msg00470.html).  */
-  __cpuid (0, a, b, c, d);
-  if (b == 0x756e6547 && c == 0x6c65746e && d == 0x49656e69)
+  if (vendor == signature_INTEL_ebx && family == 0x06)
{
- __cpuid (1, a, b, c, d);
- if (((a >> 8) & 0x0f) == 0x06) // Family.
-   {
- unsigned model = ((a >> 4) & 0x0f) // Model.
- + ((a >> 12) & 0xf0); // Extended model.
- unsigned stepping = a & 0x0f;
- if ((model == 0x3c)
- || (model == 0x45)
- || (model == 0x46)
- /* Xeon E7 v3 has correct TSX if stepping >= 4.  */
- || ((model == 0x3f) && (stepping < 4)))
-   return false;
-   }
+ unsigned model = ((a >> 4) & 0x0f) + ((a >> 12) & 0xf0);
+ unsigned stepping = a & 0x0f;
+ if (model == 0x3c
+ /* Xeon E7 v3 has correct TSX if stepping >= 4.  */
+ || (model == 0x3f && stepping < 4)
+ || model == 0x45
+ || model == 0x46)
+   return false;
}
+
   __cpuid_count (7, 0, a, b, c, d);
   if (b & cpuid_rtm)
return true;


Re: [PATCH] Make LTO's implementation of LANG_HOOKS_TYPE_FOR_SIZE the default

2017-01-19 Thread Richard Biener
On January 19, 2017 5:37:09 PM GMT+01:00, David Malcolm  
wrote:
>On Thu, 2017-01-19 at 10:36 +0100, Richard Biener wrote:
>> On Wed, Jan 18, 2017 at 10:45 PM, David Malcolm 
>> wrote:
>> > The jit testcase test-nested-loops.c was crashing.
>> > 
>> > Root cause is that deep inside loop optimization we're now exposing
>> > this call within fold-const.c which wasn't being hit before:
>> > 
>> > 4082  /* Compute the mask to access the bitfield.  */
>> > 4083  unsigned_type = lang_hooks.types.type_for_size
>> > (*pbitsize, 1);
>> > 
>> > and the jit's implementation of LANG_HOOKS_TYPE_FOR_SIZE was a
>> > placeholder that asserted it wasn't called.
>> > 
>> > This patch implements a proper LANG_HOOKS_TYPE_FOR_SIZE for jit,
>> > by taking LTO's implementation.
>> > 
>> > Fixes test-nested-loops.c, along with the related failures in
>> > test-combination.c and test-threads.c due to reusing the test.
>> > 
>> > This fixes all known failures in jit.sum, putting it at 8609
>> > passes.
>> > 
>> > Committed to trunk as r244600.
>> 
>> I suppose we could instead make the lto hook the default (thus move
>> it
>> to langhooks.c as lhd_type_for_size).  Note similar issues may arise
>> from type_for_mode?  Ah, I see you have that one...
>
>The following patch does that.
>
>Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
>OK for trunk? 

OK.

Richard.

>gcc/jit/ChangeLog:
>   * dummy-frontend.c (jit_langhook_type_for_size): Delete.
>   (LANG_HOOKS_TYPE_FOR_SIZE): Don't redefine.
>
>gcc/ChangeLog:
>   * langhooks-def.h (lhd_type_for_size): New decl.
>   (LANG_HOOKS_TYPE_FOR_SIZE): Define as lhd_type_for_size.
>   * langhooks.c (lhd_type_for_size): New function, taken from
>   lto_type_for_size.
>
>gcc/lto/ChangeLog:
>   * lto-lang.c (builtin_type_for_size): Convert call to
>   lto_type_for_size to one through the langhook.
>   (lto_type_for_size): move to langhooks.c and rename to
>   lhd_type_for_size.
>   (LANG_HOOKS_TYPE_FOR_SIZE): Don't redefine.
>---
>gcc/jit/dummy-frontend.c | 52 
> gcc/langhooks-def.h  |  2 ++
>gcc/langhooks.c  | 50 ++
>gcc/lto/lto-lang.c   | 56 +---
> 4 files changed, 53 insertions(+), 107 deletions(-)
>
>diff --git a/gcc/jit/dummy-frontend.c b/gcc/jit/dummy-frontend.c
>index 4c7932b..87f583f 100644
>--- a/gcc/jit/dummy-frontend.c
>+++ b/gcc/jit/dummy-frontend.c
>@@ -207,55 +207,6 @@ jit_langhook_type_for_mode (enum machine_mode mode, int unsignedp)
>   return NULL;
> }
> 
>-/* Return an integer type with PRECISION bits of precision,
>-   that is unsigned if UNSIGNEDP is nonzero, otherwise signed.  */
>-
>-static tree
>-jit_langhook_type_for_size (unsigned precision, int unsignedp)
>-{
>-  int i;
>-
>-  if (precision == TYPE_PRECISION (integer_type_node))
>-return unsignedp ? unsigned_type_node : integer_type_node;
>-
>-  if (precision == TYPE_PRECISION (signed_char_type_node))
>-return unsignedp ? unsigned_char_type_node :
>signed_char_type_node;
>-
>-  if (precision == TYPE_PRECISION (short_integer_type_node))
>-return unsignedp ? short_unsigned_type_node :
>short_integer_type_node;
>-
>-  if (precision == TYPE_PRECISION (long_integer_type_node))
>-return unsignedp ? long_unsigned_type_node :
>long_integer_type_node;
>-
>-  if (precision == TYPE_PRECISION (long_long_integer_type_node))
>-return unsignedp
>- ? long_long_unsigned_type_node
>- : long_long_integer_type_node;
>-
>-  for (i = 0; i < NUM_INT_N_ENTS; i ++)
>-if (int_n_enabled_p[i]
>-  && precision == int_n_data[i].bitsize)
>-  return (unsignedp ? int_n_trees[i].unsigned_type
>-: int_n_trees[i].signed_type);
>-
>-  if (precision <= TYPE_PRECISION (intQI_type_node))
>-return unsignedp ? unsigned_intQI_type_node : intQI_type_node;
>-
>-  if (precision <= TYPE_PRECISION (intHI_type_node))
>-return unsignedp ? unsigned_intHI_type_node : intHI_type_node;
>-
>-  if (precision <= TYPE_PRECISION (intSI_type_node))
>-return unsignedp ? unsigned_intSI_type_node : intSI_type_node;
>-
>-  if (precision <= TYPE_PRECISION (intDI_type_node))
>-return unsignedp ? unsigned_intDI_type_node : intDI_type_node;
>-
>-  if (precision <= TYPE_PRECISION (intTI_type_node))
>-return unsignedp ? unsigned_intTI_type_node : intTI_type_node;
>-
>-  return NULL_TREE;
>-}
>-
> /* Record a builtin function.  We just ignore builtin functions.  */
> 
> static tree
>@@ -295,9 +246,6 @@ jit_langhook_getdecls (void)
> #undef LANG_HOOKS_TYPE_FOR_MODE
> #define LANG_HOOKS_TYPE_FOR_MODE  jit_langhook_type_for_mode
> 
>-#undef LANG_HOOKS_TYPE_FOR_SIZE
>-#define LANG_HOOKS_TYPE_FOR_SIZE  jit_langhook_type_for_size
>-
> #undef LANG_HOOKS_BUILTIN_FUNCTION
> #define LANG_HOOKS_BUILTIN_FUNCTION   jit_langhook_builtin_function
> 
>diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
>index 

RE: [patch mips/gcc] add build-time and runtime options to disable or set madd.fmt type

2017-01-19 Thread Matthew Fortune
Hi,

I've rewritten/simplified this patch as it provides far too much control
to end users who will undoubtedly shoot themselves in the foot so to
speak. The option I intend to support is simply --with-madd4 --without-madd4
and -mmadd4 -mno-madd4. This is a simple enable/disable on top of
architecture checks to use/not use the madd4 family of instructions.

We have to keep each of these unusual features simple so that we can somehow
reason about them in the future.

The patch below is still in test so may need further tweaks but I intend to
finish that and commit later on today.

Thanks,
Matthew

gcc/

* config.gcc (supported_defaults): Add madd4.
(with_madd4): Add validation.
(all_defaults): Add madd4.
* config/mips/mips.opt (mmadd4): New option.
* gcc/config/mips/mips.h (OPTION_DEFAULT_SPECS): Add a default for
mmadd4.
(TARGET_CPU_CPP_BUILTINS): Add builtin_define for
__mips_no_madd4.
(ISA_HAS_UNFUSED_MADD4): Gate with mips_madd4.
(ISA_HAS_FUSED_MADD4): Likewise.
* gcc/doc/invoke.texi (-mmadd4): Document the new option.
* doc/install.texi (--with-madd4): Document the new option.

gcc/testsuite/

* gcc.target/mips/madd4-1.c: New file.
* gcc.target/mips/madd4-2.c: Likewise.
* gcc.target/mips/mips.exp (mips_option_groups): Add ghost option
HAS_MADD4.
(mips_option_groups): Add -m[no-]madd4.
(mips-dg-init): Detect default -mno-madd4.
(mips-dg-options): Handle HAS_MADD4 arch upgrade/downgrade.
---
 gcc/ChangeLog   | 16 
 gcc/config.gcc  | 19 +--
 gcc/config/mips/mips.h  | 12 +---
 gcc/config/mips/mips.opt|  4 
 gcc/doc/install.texi| 14 ++
 gcc/doc/invoke.texi |  8 +++-
 gcc/testsuite/ChangeLog | 10 ++
 gcc/testsuite/gcc.target/mips/madd4-1.c | 14 ++
 gcc/testsuite/gcc.target/mips/madd4-2.c | 14 ++
 gcc/testsuite/gcc.target/mips/mips.exp  | 12 +++-
 gcc/testsuite/gcc.target/mips/nmadd-1.c |  2 +-
 gcc/testsuite/gcc.target/mips/nmadd-2.c |  2 +-
 gcc/testsuite/gcc.target/mips/nmadd-3.c |  2 +-
 13 files changed, 119 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/madd4-1.c
 create mode 100644 gcc/testsuite/gcc.target/mips/madd4-2.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e53f9e1..7496071 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,19 @@
+2017-01-19  Matthew Fortune  
+   Yunqiang Su  
+
+   * config.gcc (supported_defaults): Add madd4.
+   (with_madd4): Add validation.
+   (all_defaults): Add madd4.
+   * config/mips/mips.opt (mmadd4): New option.
+   * gcc/config/mips/mips.h (OPTION_DEFAULT_SPECS): Add a default for
+   mmadd4.
+   (TARGET_CPU_CPP_BUILTINS): Add builtin_define for
+   __mips_no_madd4.
+   (ISA_HAS_UNFUSED_MADD4): Gate with mips_madd4.
+   (ISA_HAS_FUSED_MADD4): Likewise.
+   * gcc/doc/invoke.texi (-mmadd4): Document the new option.
+   * doc/install.texi (--with-madd4): Document the new option.
+
 2017-01-19  Chenghua Xu  
 
* config/mips/mips.h (ISA_HAS_FUSED_MADD4): Enable for
diff --git a/gcc/config.gcc b/gcc/config.gcc
index dd8c08c..9e67d36 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3940,7 +3940,7 @@ case "${target}" in
;;
 
mips*-*-*)
-   supported_defaults="abi arch arch_32 arch_64 float fpu nan fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1"
+   supported_defaults="abi arch arch_32 arch_64 float fpu nan fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1 madd4"
 
case ${with_float} in
"" | soft | hard)
@@ -4078,6 +4078,21 @@ case "${target}" in
exit 1
;;
esac
+
+   case ${with_madd4} in
+   yes)
+   with_madd4=madd4
+   ;;
+   no)
+   with_madd4=no-madd4
+   ;;
+   "")
+   ;;
+   *)
+   echo "Unknown madd4 type used in --with-madd4" 1>&2
+   exit 1
+   ;;
+   esac
;;
 
nds32*-*-*)
@@ -4511,7 +4526,7 @@ case ${target} in
 esac
 
 t=
-all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 tune_64 schedule float mode fpu nan fp_32 odd_spreg_32 divide llsc mips-plt synci tls lxc1-sxc1"
+all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 tune_64 schedule float mode fpu nan fp_32 odd_spreg_32 divide llsc mips-plt synci tls lxc1-sxc1 madd4"
 for option in $all_defaults
 do
eval "val=\$with_

Re: [PATCH] BRIG frontend: request for a global review

2017-01-19 Thread Pekka Jääskeläinen
Hi Jakub and Richard,

Attached is an updated BRIG FE patch which adds the HSAIL-related
builtins only internally in the BRIG FE.  I didn't add LTO support, as I
believe it's not useful for the BRIG FE: it always inputs fully linked
BRIGs and doesn't mix with other frontends.

BR,
Pekka



On Mon, Jan 16, 2017 at 11:07 AM, Jakub Jelinek  wrote:
> On Mon, Jan 16, 2017 at 09:46:43AM +0100, Richard Biener wrote:
>> There are 187 of them (well, simple grep of DEF_HSAIL, so probably a bit 
>> less).
>> They aren't really documented but I guess that __hsail_bitmask_u64 for 
>> example
>> is really equivalent to sth like -1U >> n << m?  So I'm not sure why
>> you have builtins
>> like these represented as functions rather than as "expanded" code sequences?
>>
>> If that's the ones you are talking about having special target
>> specific expansion.
>>
>> Note that builtins add to GCC startup times and if you don't expect
>> people to enable
>> BRIG then I wonder why you are submitting it for inclusion ;)
>
> I guess the question is when the DEF_HSAIL* builtins are actually needed.
> If the FE is separate from the other FEs, I guess it would be enough to
> define those builtins
> 1) in the BRIG FE
> 2) in tree-core.h
> 3) in lto1 (only if any such builtin has been seen in the IL
>
> So, perhaps define DEF_HSAIL* just to DEF_BUILTIN_STUB in builtins.def
> unless already defined, and override it in the BRIG FE where you create its
> builtins, and then have some routine in the middle-end similar to
> initialize_sanitizer_builtins which lazily initializes the DEF_HSAIL*
> builtins during LTO reading if a call to any of the builtins in the hsail
> range is noticed?
>
> Jakub


002-brig-fe-gcc.patch.gz
Description: GNU Zip compressed data


005-diff-to-previous.patch.gz
Description: GNU Zip compressed data


[testsuite] Fix gcc.dg/attr-alloc_size-4.c on i?86 (PR testsuite/79051)

2017-01-19 Thread Rainer Orth
As described in the PR, gcc.dg/attr-alloc_size-4.c was FAILing on 32-bit
x86 targets.  Fixed as follows by removing the mismatch between #if
conditions and target selectors.

Tested with the appropriate runtest invocations on i386-pc-solaris2.12
and amd64-pc-solaris2.12, installed on mainline.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2017-01-19  Rainer Orth  

PR testsuite/79051
* gcc.dg/attr-alloc_size-4.c (test_int_range) [__i386__ ||
__x86_64__]: Allow for target i?86-*-*.

# HG changeset patch
# Parent  a83e32258f0450b5c0538ef71d8c6230eb72180f
Further fix for PR testsuite/79051

diff --git a/gcc/testsuite/gcc.dg/attr-alloc_size-4.c b/gcc/testsuite/gcc.dg/attr-alloc_size-4.c
--- a/gcc/testsuite/gcc.dg/attr-alloc_size-4.c
+++ b/gcc/testsuite/gcc.dg/attr-alloc_size-4.c
@@ -140,7 +140,7 @@ test_int_range (int n)
 
 #if __i386__ || __x86_64__
   /* Avoid failures described in bug 79051.  */
-  sink (f_int_1 (SAR (min + 2, 1235)));   /* { dg-warning "argument 1 range \\\[1236, \[0-9\]+\\\] exceeds maximum object size 1234" "" { target { x86_64-*-* } } } */
+  sink (f_int_1 (SAR (min + 2, 1235)));   /* { dg-warning "argument 1 range \\\[1236, \[0-9\]+\\\] exceeds maximum object size 1234" "" { target { i?86-*-* x86_64-*-* } } } */
 #endif
 
   sink (f_int_1 (SAR (0, max)));   /* { dg-warning "argument 1 range \\\[-\[0-9\]*, -1\\\] is negative" } */


Re: [PATCH] Add AVX512 k-mask intrinsics

2017-01-19 Thread Kirill Yukhin
On 19 Jan 19:42, Andrew Senkevich wrote:
> 2017-01-19 13:39 GMT+03:00 Kirill Yukhin :
> > Hi Andrew,
> > On 18 Jan 15:45, Andrew Senkevich wrote:
> >> 2017-01-17 16:51 GMT+03:00 Jakub Jelinek :
> >> > On Tue, Jan 17, 2017 at 04:03:08PM +0300, Andrew Senkevich wrote:
> >> >> > I've played a bit w/ SDE. And looks like operands are not early 
> >> >> > clobber:
> >> >> > TID0: INS 0x004003ee AVX512VEX kmovd k0, eax
> >> >> > TID0:   k0 := _
> >> >> > ...
> >> >> > TID0: INS 0x004003f4 AVX512VEX kshiftlw k0, k0, 
> >> >> > 0x3
> >> >> > TID0:   k0 := _fff8
> >> >> >
> >> >> > You can see that same dest and source works just fine.
> >> >>
> >> >> Hmm, I looked only on what ICC generates, and it was not correct way.
> >> >
> >> > I've just tried
> >> > int
> >> > main ()
> >> > {
> >> >   unsigned int a = 0x;
> >> >   asm volatile ("kmovw %1, %%k6; kshiftlw $1, %%k6, %%k6; kmovw %%k6, 
> >> > %0" : "=r" (a) : "r" (a) : "k6");
> >> >   __builtin_printf ("%x\n", a);
> >> >   return 0;
> >> > }
> >> > on KNL and got 0x.
> >> > Are you going to report to the SDM authors so that they fix it up?
> >> > E.g. using TEMP <- SRC1[0:...] before DEST[...] <- 0 and using TEMP
> >> > instead of SRC1[0:...] would fix it, or filling up TEMP first and only
> >> > at the end assigning DEST <- TEMP etc. would do.
> >>
> >> Yes, we will work on it.
> >>
> >> Attached patch refactored in part of builtins declarations and tests, is 
> >> it Ok?
> >
> > Could you please add runtime tests for new intrinsics as well?
>
> Attached with runtime tests.
Great! Thanks. Patch is OK for main trunk.

--
Thanks, K
>
> gcc/
> * config/i386/avx512bwintrin.h: Add k-mask registers shift intrinsics.
> * config/i386/avx512dqintrin.h: Ditto.
> * config/i386/avx512fintrin.h: Ditto.
> * config/i386/i386-builtin-types.def: Add new types.
> * gcc/config/i386/i386.c: Handle new types.
> * config/i386/i386-builtin.def (__builtin_ia32_kshiftliqi,
> __builtin_ia32_kshiftlihi, __builtin_ia32_kshiftlisi,
> __builtin_ia32_kshiftlidi, __builtin_ia32_kshiftriqi,
> __builtin_ia32_kshiftrihi, __builtin_ia32_kshiftrisi,
> __builtin_ia32_kshiftridi): New.
> * config/i386/sse.md (k): Rename *k.
>
> gcc/testsuite/
> * gcc.target/i386/avx512bw-kshiftld-1.c: New test.
> * gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
> * gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
> * gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
> * gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
> * gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
> * gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
> * gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.
> * gcc.target/i386/avx512bw-kshiftld-2.c: Ditto.
> * gcc.target/i386/avx512bw-kshiftlq-2.c: Ditto.
> * gcc.target/i386/avx512bw-kshiftrd-2.c: Ditto.
> * gcc.target/i386/avx512bw-kshiftrq-2.c: Ditto.
> * gcc.target/i386/avx512dq-kshiftlb-2.c: Ditto.
> * gcc.target/i386/avx512dq-kshiftrb-2.c: Ditto.
> * gcc.target/i386/avx512f-kshiftlw-2.c: Ditto.
> * gcc.target/i386/avx512f-kshiftrw-2.c: Ditto.
> * gcc.target/i386/avx-1.c: Test new intrinsics.
> * gcc.target/i386/sse-13.c: Ditto.
> * gcc.target/i386/sse-23.c: Ditto.
>
>
> --
> WBR,
> Andrew


Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

2017-01-19 Thread Tamar Christina
Hi Joseph,

I made the requested changes and did a quick pass over the rest
of the fp cases.

Regards,
Tamar


From: Joseph Myers 
Sent: Thursday, January 19, 2017 4:33:13 PM
To: Tamar Christina
Cc: Jeff Law; GCC Patches; Wilco Dijkstra; rguent...@suse.de; Michael Meissner; 
nd
Subject: Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers 
in GIMPLE.

On Thu, 19 Jan 2017, Tamar Christina wrote:

> > The old code set orig_arg before converting IBM long double to double.
> > Your code sets it after the conversion.  The old code set min_exp based on
> > a string set from REAL_MODE_FORMAT (orig_mode)->emin - 1; your code uses
> > the adjusted mode.  Both of those are incorrect for IBM long double.
>
> Hmm, you're right; I've made the change to match what was there before. But
> there's something I don't quite get about the old code: if it builds rmin in
> orig_mode, which is larger than mode, but then creates the real with
> build_real (type, rmin) using the adjusted type, shouldn't it be getting
> truncated?

The value is a power of 2, which is *larger* than the minimum normal value
for double (as they have the same least subnormal value).

Looking further at the code, my only remaining comments are for the cases
where you aren't using the integer path: in is_normal you use LT_EXPR to
compare with +Inf, but need to use __builtin_isless, likewise in
is_subnormal again you should be using __builtin_isless and
__builtin_isgreater, in is_finite you should be using
__builtin_islessequal.  All the existing expressions will raise spurious
"invalid" exceptions for quiet NaN arguments.  (I'm presuming that the
output of these functions goes through folding that converts
__builtin_isless to !UNGE_EXPR, etc.)

--
Joseph S. Myers
jos...@codesourcery.com
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3ac2d44148440b124559ba7cd3de483b7a74b72d..d8ff9c70ae6b9e72e09b8cbd9a0bd41b6830b83e 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -160,7 +160,6 @@ static tree fold_builtin_0 (location_t, tree);
 static tree fold_builtin_1 (location_t, tree, tree);
 static tree fold_builtin_2 (location_t, tree, tree, tree);
 static tree fold_builtin_3 (location_t, tree, tree, tree, tree);
-static tree fold_builtin_varargs (location_t, tree, tree*, int);
 
 static tree fold_builtin_strpbrk (location_t, tree, tree, tree);
 static tree fold_builtin_strstr (location_t, tree, tree, tree);
@@ -2202,19 +2201,8 @@ interclass_mathfn_icode (tree arg, tree fndecl)
   switch (DECL_FUNCTION_CODE (fndecl))
 {
 CASE_FLT_FN (BUILT_IN_ILOGB):
-  errno_set = true; builtin_optab = ilogb_optab; break;
-CASE_FLT_FN (BUILT_IN_ISINF):
-  builtin_optab = isinf_optab; break;
-case BUILT_IN_ISNORMAL:
-case BUILT_IN_ISFINITE:
-CASE_FLT_FN (BUILT_IN_FINITE):
-case BUILT_IN_FINITED32:
-case BUILT_IN_FINITED64:
-case BUILT_IN_FINITED128:
-case BUILT_IN_ISINFD32:
-case BUILT_IN_ISINFD64:
-case BUILT_IN_ISINFD128:
-  /* These builtins have no optabs (yet).  */
+  errno_set = true;
+  builtin_optab = ilogb_optab;
   break;
 default:
   gcc_unreachable ();
@@ -2233,8 +2221,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
 }
 
 /* Expand a call to one of the builtin math functions that operate on
-   floating point argument and output an integer result (ilogb, isinf,
-   isnan, etc).
+   floating point argument and output an integer result (ilogb, etc).
Return 0 if a normal call should be emitted rather than expanding the
function in-line.  EXP is the expression that is a call to the builtin
function; if convenient, the result should be placed in TARGET.  */
@@ -5997,11 +5984,7 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 CASE_FLT_FN (BUILT_IN_ILOGB):
   if (! flag_unsafe_math_optimizations)
 	break;
-  gcc_fallthrough ();
-CASE_FLT_FN (BUILT_IN_ISINF):
-CASE_FLT_FN (BUILT_IN_FINITE):
-case BUILT_IN_ISFINITE:
-case BUILT_IN_ISNORMAL:
+
   target = expand_builtin_interclass_mathfn (exp, target);
   if (target)
 	return target;
@@ -6281,8 +6264,25 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 	}
   break;
 
+CASE_FLT_FN (BUILT_IN_ISINF):
+case BUILT_IN_ISNAND32:
+case BUILT_IN_ISNAND64:
+case BUILT_IN_ISNAND128:
+case BUILT_IN_ISNAN:
+case BUILT_IN_ISINFD32:
+case BUILT_IN_ISINFD64:
+case BUILT_IN_ISINFD128:
+case BUILT_IN_ISNORMAL:
+case BUILT_IN_ISZERO:
+case BUILT_IN_ISSUBNORMAL:
+case BUILT_IN_FPCLASSIFY:
 case BUILT_IN_SETJMP:
-  /* This should have been lowered to the builtins below.  */
+CASE_FLT_FN (BUILT_IN_FINITE):
+case BUILT_IN_FINITED32:
+case BUILT_IN_FINITED64:
+case BUILT_IN_FINITED128:
+case BUILT_IN_ISFINITE:
+  /* These should have been lowered to the builtins in gimple-low.c.  */
   gcc_unreachable ();
 
 case

Re: [PATCH][GCC][Aarch64] Add vectorize pattern for copysign.

2017-01-19 Thread Tamar Christina
Hi James,

I have corrected the testsuite changes and attached is the new file and 
changelog.

Ok for trunk?

Tamar

Hi All,

This patch vectorizes the copysign builtin for AArch64,
similar to how it is done for Arm.

AArch64 now generates:

...
.L4:
ldr q1, [x6, x3]
add w4, w4, 1
ldr q0, [x5, x3]
cmp w4, w7
bif v1.16b, v2.16b, v3.16b
fmul    v0.2d, v0.2d, v1.2d
str q0, [x5, x3]

for the input:

 x * copysign(1.0, y)

On 481.wrf in Spec2006 on AArch64 this gives us a speedup of 9.1%.
Regtested on  aarch64-none-linux-gnu and arm-none-eabi and no regressions.

Ok for trunk?

gcc/
2017-01-19  Tamar Christina  

* config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
Change int to HOST_WIDE_INT.
* config/aarch64/aarch64-protos.h
(aarch64_simd_gen_const_vector_dup): Likewise.
* config/aarch64/aarch64-simd.md: Add copysign3.

gcc/testsuite/
2017-01-19  Tamar Christina  

* lib/target-supports.exp
(check_effective_target_vect_call_copysignf): Enable for AArch64.

From: James Greenhalgh 
Sent: Thursday, January 19, 2017 4:58:09 PM
To: Tamar Christina
Cc: GCC Patches; Marcus Shawcroft; Richard Earnshaw; nd
Subject: Re: [PATCH][GCC][Aarch64] Add vectorize pattern for copysign.

On Thu, Jan 19, 2017 at 03:55:36PM +, Tamar Christina wrote:
>
> It seems I can drop even more:

The AArch64 parts of this look fine to me, and based on benchmarking of the
patch they are low risk for high reward (and other targets have had a
vectorized copysign for a while without issue).

However, the testsuite change looks wrong.

> diff --git a/gcc/testsuite/gcc.target/arm/vect-copysignf.c 
> b/gcc/testsuite/gcc.dg/vect/vect-copysignf.c
> similarity index 91%
> rename from gcc/testsuite/gcc.target/arm/vect-copysignf.c
> rename to gcc/testsuite/gcc.dg/vect/vect-copysignf.c
> index 
> 425f1b78af7b07be6929f9e5bc1118ca901bc9ce..dc961d0223399c6e7ee8209d22ca77f6d22dbd70
>  100644
> --- a/gcc/testsuite/gcc.target/arm/vect-copysignf.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-copysignf.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-require-effective-target arm_neon_hw } */
> +/* { dg-require-effective-target arm_neon_hw { target { arm*-*-* } } } */
>  /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
>  /* { dg-add-options "arm_neon" } */

It is a bit redundant as add_options_for_arm_neon will check you are an ARM
target before doing anything, but this dg-add-options could be guarded by
{ target { arm*-*-* } } for clarity. Though, in the gcc.dg/vect/ directory
I'd be surprised if you needed this at all.

I see we have a check_effective_target test for
{ target { vect_call_copysignf } } . It seems that would be most
appropriate for this test - then you can also drop the effective-target
arm_neon_hw test.

You might want to look at gcc.dg/vect/fast-math-bb-slp-call-1.c - It seems
that gives you most of what you are looking for from this test.

That will mean updating check_effective_target_vect_call_copysignf in
testsuite/lib/target-supports.exp .

Thanks,
James

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 29a3bd71151aa4fb7c6728f0fb52e2f3f233f41d..e75ba29f93e9e749791803ca3fa8d716ca261064 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -362,7 +362,7 @@ rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
-rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
+rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
 bool aarch64_simd_mem_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, bool);
 rtx aarch64_tls_get_addr (void);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index a12e2268ef9b023112f8d05db0a86957fee83273..b61f79a09462b8cecca7dd2cc4ac0eb4be2dbc79 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -338,6 +338,24 @@
   }
 )
 
+(define_expand "copysign<mode>3"
+  [(match_operand:VHSDF 0 "register_operand")
+   (match_operand:VHSDF 1 "register_operand")
+   (match_operand:VHSDF 2 "register_operand")]
+  "TARGET_FLOAT && TARGET_SIMD"
+{
+  rtx v_bitmask = gen_reg_rtx (<V_cmp_result>mode);
+  int bits = GET_MODE_UNIT_BITSIZE (<MODE>mode) - 1;
+
+  emit_move_insn (v_bitmask,
+		  aarch64_simd_gen_const_vector_dup (<V_cmp_result>mode,
+						     HOST_WIDE_INT_M1U << bits));
+  emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], v_bitmask,
+					 operands[2], operands[1]));
+  DONE;
+}
+)
+
 (define_insn "*aarch64_mul3_elt"
  [(set (match_operand:VMUL 0 "register_operand" "=w")
 (mult:VMUL
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0cf7d12186af3e05ba8742af5a03425f61f51754..1a69605db5d2a4a0efb

Re: [PATCH] C++: fix fix-it hints for misspellings within explicit namespaces (v2)

2017-01-19 Thread Jason Merrill
On Wed, Jan 18, 2017 at 5:29 PM, David Malcolm  wrote:
> Here's a version of the patch which simply tweaks
> cp_parser_primary_expression's call to finish_id_expression so that
> it passes the location of the id_expression, rather than that of
> id_expr_token.
>
> The id_expression in question came from cp_parser_id_expression,
> whereas the id_expr_token is the first token within the id-expression.
>
> The location passed here to finish_id_expression only affects:
> the location used for name-lookup errors, and for the resulting
> decl cp_expr.  Given that the following code immediately does this:
> decl.set_location (id_expr_token->location);

What happens if we use id_expression.get_location() here, too?

OK.

Jason


Re: [PATCH][GCC][Aarch64] Add vectorize pattern for copysign.

2017-01-19 Thread James Greenhalgh
On Thu, Jan 19, 2017 at 06:05:52PM +, Tamar Christina wrote:
> Hi James,
> 
> I have corrected the testsuite changes and attached is the new file and 
> changelog.
> 
> Ok for trunk?
> 
> Tamar
> 
> Hi All,
> 
> This patch vectorizes the copysign builtin for AArch64,
> similar to how it is done for Arm.
> 
> AArch64 now generates:
> 
> ...
> .L4:
> ldr q1, [x6, x3]
> add w4, w4, 1
> ldr q0, [x5, x3]
> cmp w4, w7
> bif v1.16b, v2.16b, v3.16b
> fmul    v0.2d, v0.2d, v1.2d
> str q0, [x5, x3]
> 
> for the input:
> 
>  x * copysign(1.0, y)
> 
> On 481.wrf in Spec2006 on AArch64 this gives us a speedup of 9.1%.
> Regtested on  aarch64-none-linux-gnu and arm-none-eabi and no regressions.
> 
> Ok for trunk?

OK. I think this is now suitably minimal (and safe) for the last
day of Stage 3.

Thanks,
James

> gcc/
> 2017-01-19  Tamar Christina  
> 
> * config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
> Change int to HOST_WIDE_INT.
> * config/aarch64/aarch64-protos.h
> (aarch64_simd_gen_const_vector_dup): Likewise.
> * config/aarch64/aarch64-simd.md: Add copysign3.
> 
> gcc/testsuite/
> 2017-01-19  Tamar Christina  
> 
> * lib/target-supports.exp
> (check_effective_target_vect_call_copysignf): Enable for AArch64.

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> 29a3bd71151aa4fb7c6728f0fb52e2f3f233f41d..e75ba29f93e9e749791803ca3fa8d716ca261064
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -362,7 +362,7 @@ rtx aarch64_final_eh_return_addr (void);
>  rtx aarch64_mask_from_zextract_ops (rtx, rtx);
>  const char *aarch64_output_move_struct (rtx *operands);
>  rtx aarch64_return_addr (int, rtx);
> -rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
> +rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
>  bool aarch64_simd_mem_operand_p (rtx);
>  rtx aarch64_simd_vect_par_cnst_half (machine_mode, bool);
>  rtx aarch64_tls_get_addr (void);
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> a12e2268ef9b023112f8d05db0a86957fee83273..b61f79a09462b8cecca7dd2cc4ac0eb4be2dbc79
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -338,6 +338,24 @@
>}
>  )
>  
> +(define_expand "copysign<mode>3"
> +  [(match_operand:VHSDF 0 "register_operand")
> +   (match_operand:VHSDF 1 "register_operand")
> +   (match_operand:VHSDF 2 "register_operand")]
> +  "TARGET_FLOAT && TARGET_SIMD"
> +{
> +  rtx v_bitmask = gen_reg_rtx (<V_cmp_result>mode);
> +  int bits = GET_MODE_UNIT_BITSIZE (<MODE>mode) - 1;
> +
> +  emit_move_insn (v_bitmask,
> +		  aarch64_simd_gen_const_vector_dup (<V_cmp_result>mode,
> +						     HOST_WIDE_INT_M1U << bits));
> +  emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], v_bitmask,
> +					 operands[2], operands[1]));
> +  DONE;
> +}
> +)
> +
>  (define_insn "*aarch64_mul3_elt"
>   [(set (match_operand:VMUL 0 "register_operand" "=w")
>  (mult:VMUL
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 0cf7d12186af3e05ba8742af5a03425f61f51754..1a69605db5d2a4a0efb8c9f97a019de9dded40eb
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11244,14 +11244,16 @@ aarch64_mov_operand_p (rtx x, machine_mode mode)
>  
>  /* Return a const_int vector of VAL.  */
>  rtx
> -aarch64_simd_gen_const_vector_dup (machine_mode mode, int val)
> +aarch64_simd_gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
>  {
>int nunits = GET_MODE_NUNITS (mode);
>rtvec v = rtvec_alloc (nunits);
>int i;
>  
> +  rtx cache = GEN_INT (val);
> +
>for (i=0; i < nunits; i++)
> -RTVEC_ELT (v, i) = GEN_INT (val);
> +RTVEC_ELT (v, i) = cache;
>  
>return gen_rtx_CONST_VECTOR (mode, v);
>  }
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 
> b88d13c13f277e8cdb88b5dc8545ffa01408a0fa..12dbf475e31933cff781c2f9e9c1cfbe2ce108bb
>  100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -6158,7 +6158,8 @@ proc check_effective_target_vect_call_copysignf { } {
>  } else {
>   set et_vect_call_copysignf_saved($et_index) 0
>   if { [istarget i?86-*-*] || [istarget x86_64-*-*]
> -  || [istarget powerpc*-*-*] } {
> +  || [istarget powerpc*-*-*]
> +  || [istarget aarch64*-*-*] } {
>  set et_vect_call_copysignf_saved($et_index) 1
>   }
>  }



Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

2017-01-19 Thread Joseph Myers
On Thu, 19 Jan 2017, Tamar Christina wrote:

> Hi Joseph,
> 
> I made the requested changes and did a quick pass over the rest
> of the fp cases.

I've no further comments, but watch out for any related test failures 
being reported.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] PR67085 move comparison functions in heap operations

2017-01-19 Thread Jonathan Wakely

This turns lots and lots of copies into moves, which can make a huge
difference to performance (consider a std::function which has to
allocate on every copy).

PR libstdc++/67085
* include/bits/stl_heap.h (push_heap, __adjust_heap, __pop_heap)
(pop_heap, __make_heap, make_heap, __sort_heap, sort_heap): Use
_GLIBCXX_MOVE when passing comparison function to other functions.
(is_heap_until, is_heap): Use std::move when passing comparison
function.
* testsuite/23_containers/priority_queue/67085.cc: New test.

Tested powerpc64le-linux, committed to trunk.

commit 761493c31343d47689150fc22d029adc0c568caa
Author: Jonathan Wakely 
Date:   Thu Jan 19 17:58:54 2017 +

PR67085 move comparison functions in heap operations

PR libstdc++/67085
* include/bits/stl_heap.h (push_heap, __adjust_heap, __pop_heap)
(pop_heap, __make_heap, make_heap, __sort_heap, sort_heap): Use
_GLIBCXX_MOVE when passing comparison function to other functions.
(is_heap_until, is_heap): Use std::move when passing comparison
function.
* testsuite/23_containers/priority_queue/67085.cc: New test.

diff --git a/libstdc++-v3/include/bits/stl_heap.h 
b/libstdc++-v3/include/bits/stl_heap.h
index 7d1d6f2..c82ce77 100644
--- a/libstdc++-v3/include/bits/stl_heap.h
+++ b/libstdc++-v3/include/bits/stl_heap.h
@@ -200,7 +200,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _ValueType __value = _GLIBCXX_MOVE(*(__last - 1));
   std::__push_heap(__first, _DistanceType((__last - __first) - 1),
   _DistanceType(0), _GLIBCXX_MOVE(__value),
-  __gnu_cxx::__ops::__iter_comp_val(__comp));
+  __gnu_cxx::__ops::
+  __iter_comp_val(_GLIBCXX_MOVE(__comp)));
 }
 
   template
@@ -246,7 +248,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   *__result = _GLIBCXX_MOVE(*__first);
   std::__adjust_heap(__first, _DistanceType(0),
 _DistanceType(__last - __first),
-_GLIBCXX_MOVE(__value), __comp);
+_GLIBCXX_MOVE(__value), _GLIBCXX_MOVE(__comp));
 }
 
   /**
@@ -310,7 +312,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  --__last;
  std::__pop_heap(__first, __last, __last,
- __gnu_cxx::__ops::__iter_comp_iter(__comp));
+ __gnu_cxx::__ops::
+ __iter_comp_iter(_GLIBCXX_MOVE(__comp)));
}
 }
 
@@ -333,7 +336,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  _ValueType __value = _GLIBCXX_MOVE(*(__first + __parent));
  std::__adjust_heap(__first, __parent, __len, _GLIBCXX_MOVE(__value),
-__comp);
+_GLIBCXX_MOVE(__comp));
  if (__parent == 0)
return;
  __parent--;
@@ -386,7 +389,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_requires_irreflexive_pred(__first, __last, __comp);
 
   std::__make_heap(__first, __last,
-  __gnu_cxx::__ops::__iter_comp_iter(__comp));
+  __gnu_cxx::__ops::
+  __iter_comp_iter(_GLIBCXX_MOVE(__comp)));
 }
 
   template
@@ -397,7 +401,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   while (__last - __first > 1)
{
  --__last;
- std::__pop_heap(__first, __last, __last, __comp);
+ std::__pop_heap(__first, __last, __last, _GLIBCXX_MOVE(__comp));
}
 }
 
@@ -449,7 +453,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_requires_heap_pred(__first, __last, __comp);
 
   std::__sort_heap(__first, __last,
-  __gnu_cxx::__ops::__iter_comp_iter(__comp));
+  __gnu_cxx::__ops::
+  __iter_comp_iter(_GLIBCXX_MOVE(__comp)));
 }
 
 #if __cplusplus >= 201103L
@@ -504,7 +509,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   return __first
+ std::__is_heap_until(__first, std::distance(__first, __last),
-  __gnu_cxx::__ops::__iter_comp_iter(__comp));
+  __gnu_cxx::__ops::
+  __iter_comp_iter(std::move(__comp)));
 }
 
   /**
@@ -531,7 +537,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline bool
 is_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
_Compare __comp)
-{ return std::is_heap_until(__first, __last, __comp) == __last; }
+{
+  return std::is_heap_until(__first, __last, std::move(__comp))
+   == __last;
+}
 #endif
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/testsuite/23_containers/priority_queue/67085.cc 
b/libstdc++-v3/testsuite/23_containers/priority_queue/67085.cc
new file mode 100644
index 000..5a3ca32
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/priority_queue/67085.cc
@@ -0,0 +1,46 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+/

Re: [PATCH] PR67085 move comparison functions in heap operations

2017-01-19 Thread Jonathan Wakely

On 19/01/17 18:28 +, Jonathan Wakely wrote:

@@ -397,7 +401,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  while (__last - __first > 1)
{
  --__last;
- std::__pop_heap(__first, __last, __last, __comp);
+ std::__pop_heap(__first, __last, __last, _GLIBCXX_MOVE(__comp));


Oops, we can't move from the functor in a loop, it might be invalid
after the first move. Fix coming asap.


Re: [Ping~]Re: [5/5][libgcc] Runtime support for AArch64 return address signing (needs new target macros)

2017-01-19 Thread Richard Earnshaw (lists)
On 19/01/17 14:46, Jiong Wang wrote:
> Thanks for the review.
> 
> On 19/01/17 14:18, Richard Earnshaw (lists) wrote:
>>
>>>
>>>
>>> diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
>>> index
>>> 8085a42ace15d53f4cb0c6681717012d906a6d47..cf640135275deb76b820f8209fa51eacfd64c4a2
>>> 100644
>>> --- a/libgcc/unwind-dw2.c
>>> +++ b/libgcc/unwind-dw2.c
>>> @@ -136,6 +136,8 @@ struct _Unwind_Context
>>>  #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1)
>>>/* Context which has version/args_size/by_value fields.  */
>>>  #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1)
>>> +  /* Bit reserved on AArch64, return address has been signed with A
>>> key.  */
>>> +#define RA_A_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1)
>>
>> Why is this here?   It appears to only be used within the
>> AArch64-specific header file.
> 
> I put it here so that when we allocate the next general-purpose bit, it is
> clear that bit 3 is already taken by AArch64 and the new bit needs to go to
> the next position.  This avoids a bit collision.
> 

Fair enough.
>>
>>> ...
>>>
>>> +/* Frob exception handler's address kept in TARGET before installing
>>> into
>>> +   CURRENT context.  */
>>> +
>>> +static void *
>>> +uw_frob_return_addr (struct _Unwind_Context *current,
>>> + struct _Unwind_Context *target)
>>> +{
>>> +  void *ret_addr = __builtin_frob_return_addr (target->ra);
>>> +#ifdef MD_POST_FROB_EH_HANDLER_ADDR
>>> +  ret_addr = MD_POST_FROB_EH_HANDLER_ADDR (current, target, ret_addr);
>>> +#endif
>>> +  return ret_addr;
>>> +}
>>> +
>>
>> I think this function should be marked inline.  The optimizers would
>> probably inline it anyway, but it seems wrong for us to rely on that.
> 
> Thanks, fixed.
> 
> Does the updated patch look OK to you now?
> 
> libgcc/
> 
> 2017-01-19  Jiong Wang  
> 
> * config/aarch64/aarch64-unwind.h: New file.
> (DWARF_REGNUM_AARCH64_RA_STATE): Define.
> (MD_POST_EXTRACT_ROOT_ADDR): Define.
> (MD_POST_EXTRACT_FRAME_ADDR): Define.
> (MD_POST_FROB_EH_HANDLER_ADDR): Define.
> (MD_FROB_UPDATE_CONTEXT): Define.
> (aarch64_post_extract_frame_addr): New function.
> (aarch64_post_frob_eh_handler_addr): New function.
> (aarch64_frob_update_context): New function.
> * config/aarch64/linux-unwind.h: Include aarch64-unwind.h
> * config.host (aarch64*-*-elf, aarch64*-*-rtems*,
> aarch64*-*-freebsd*):
> Initialize md_unwind_header to include aarch64-unwind.h.
> * unwind-dw2.c (struct _Unwind_Context): Define "RA_A_SIGNED_BIT".
> (execute_cfa_program): Multiplex DW_CFA_GNU_window_save for
> __aarch64__.
> (uw_update_context): Honor MD_POST_EXTRACT_FRAME_ADDR.
> (uw_init_context_1): Honor MD_POST_EXTRACT_ROOT_ADDR.
> (uw_frob_return_addr): New function.
> (_Unwind_DebugHook): Use uw_frob_return_addr.
> 

OK.

R.


Re: [PATCH] PR67085 move comparison functions in heap operations

2017-01-19 Thread Jonathan Wakely

On 19/01/17 18:50 +, Jonathan Wakely wrote:

On 19/01/17 18:28 +, Jonathan Wakely wrote:

@@ -397,7 +401,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 while (__last - __first > 1)
{
  --__last;
- std::__pop_heap(__first, __last, __last, __comp);
+ std::__pop_heap(__first, __last, __last, _GLIBCXX_MOVE(__comp));


Oops, we can't move from the functor in a loop, it might be invalid
after the first move. Fix coming asap.


Unfortunately this adds some more copies to my testcase. I have
another idea how to avoid that though.

This also adds an extra (safe) move in __is_heap.

Tested powerpc64le-linux, committed to trunk.

commit 41125d036d08d1fdc47c8c0d2ef280a0f8db4f89
Author: Jonathan Wakely 
Date:   Thu Jan 19 19:03:35 2017 +

Fix unsafe moves inside loops

	PR libstdc++/67085
	* include/bits/stl_heap.h (__is_heap): Use _GLIBCXX_MOVE.
	(__make_heap, __sort_heap): Don't use _GLIBCXX_MOVE inside loops.
	* testsuite/23_containers/priority_queue/67085.cc: Adjust expected
	number of copies.
	* testsuite/25_algorithms/make_heap/movable.cc: New test.

diff --git a/libstdc++-v3/include/bits/stl_heap.h b/libstdc++-v3/include/bits/stl_heap.h
index c82ce77..b19c9f4 100644
--- a/libstdc++-v3/include/bits/stl_heap.h
+++ b/libstdc++-v3/include/bits/stl_heap.h
@@ -113,7 +113,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline bool
 __is_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
 	  _Compare __comp)
-{ return std::__is_heap(__first, __comp, std::distance(__first, __last)); }
+{
+  return std::__is_heap(__first, _GLIBCXX_MOVE(__comp),
+			std::distance(__first, __last));
+}
 
   // Heap-manipulation functions: push_heap, pop_heap, make_heap, sort_heap,
   // + is_heap and is_heap_until in C++0x.
@@ -336,7 +339,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	{
 	  _ValueType __value = _GLIBCXX_MOVE(*(__first + __parent));
 	  std::__adjust_heap(__first, __parent, __len, _GLIBCXX_MOVE(__value),
-			 _GLIBCXX_MOVE(__comp));
+			 __comp);
 	  if (__parent == 0)
 	return;
 	  __parent--;
@@ -401,7 +404,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   while (__last - __first > 1)
 	{
 	  --__last;
-	  std::__pop_heap(__first, __last, __last, _GLIBCXX_MOVE(__comp));
+	  std::__pop_heap(__first, __last, __last, __comp);
 	}
 }
 
diff --git a/libstdc++-v3/testsuite/23_containers/priority_queue/67085.cc b/libstdc++-v3/testsuite/23_containers/priority_queue/67085.cc
index 5a3ca32..4ccea30 100644
--- a/libstdc++-v3/testsuite/23_containers/priority_queue/67085.cc
+++ b/libstdc++-v3/testsuite/23_containers/priority_queue/67085.cc
@@ -34,9 +34,9 @@ test01()
 {
   int v[] = {1, 2, 3, 4};
  std::priority_queue<int, std::vector<int>, CopyCounter> q{v, v+4};
-  VERIFY(count == 2);
+  VERIFY(count == 4);
   q.push(1);
-  VERIFY(count == 3);
+  VERIFY(count == 5);
 }
 
 int
diff --git a/libstdc++-v3/testsuite/25_algorithms/make_heap/movable.cc b/libstdc++-v3/testsuite/25_algorithms/make_heap/movable.cc
new file mode 100644
index 000..f6738f1
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/make_heap/movable.cc
@@ -0,0 +1,38 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+
+#include <algorithm>
+#include <functional>
+#include <iterator>
+
+void
+test01()
+{
+  int i[] = { 1, 2, 3, 4 };
+  std::function<bool(int, int)> f = std::less<>{};
+  // If this uses a moved-from std::function we'll get an exception:
+  std::make_heap(std::begin(i), std::end(i), f);
+  std::sort_heap(std::begin(i), std::end(i), f);
+}
+
+int
+main()
+{
+  test01();
+}


[PATCH, configure, libgfortran]: PR78478: Use fpu-generic for x86 when _SOFT_FLOAT is defined

2017-01-19 Thread Uros Bizjak
Hello!

The attached patch avoids a bootstrap error when building libgfortran for
soft-float x86 targets.  Configure detects when _SOFT_FLOAT is defined
and uses the fpu-generic.h header instead of fpu-387.h.

The patch also imports ax_check_define.m4 from the Autoconf Archive [1];
the macro is really handy, and I guess it can be used in many other
places, avoiding reinventing the wheel.

2017-01-19  Uros Bizjak  

PR target/78478
* config/ax_check_define.m4: New file.

libgfortran/ChangeLog:

2017-01-19  Uros Bizjak  

PR target/78478
* acinclude.m4: Include ../config/ax_check_define.m4
* configure.ac: Check if _SOFT_FLOAT is defined.
* configure.host (i?86 | x86_64): Use fpu-generic when
have_soft_float is set.
* configure: Regenerate.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline and gcc-6 branch?

[1] https://www.gnu.org/software/autoconf-archive/ax_check_define.html

Uros.
Index: config/ax_check_define.m4
===
--- config/ax_check_define.m4   (nonexistent)
+++ config/ax_check_define.m4   (working copy)
@@ -0,0 +1,92 @@
+# ===
+#  http://www.gnu.org/software/autoconf-archive/ax_check_define.html
+# ===
+#
+# SYNOPSIS
+#
+#   AC_CHECK_DEFINE([symbol], [ACTION-IF-FOUND], [ACTION-IF-NOT])
+#   AX_CHECK_DEFINE([includes],[symbol], [ACTION-IF-FOUND], [ACTION-IF-NOT])
+#
+# DESCRIPTION
+#
+#   Complements AC_CHECK_FUNC but it does not check for a function but for a
+#   define to exist. Consider a usage like:
+#
+#AC_CHECK_DEFINE(__STRICT_ANSI__, CFLAGS="$CFLAGS -D_XOPEN_SOURCE=500")
+#
+# LICENSE
+#
+#   Copyright (c) 2008 Guido U. Draheim 
+#
+#   This program is free software; you can redistribute it and/or modify it
+#   under the terms of the GNU General Public License as published by the
+#   Free Software Foundation; either version 3 of the License, or (at your
+#   option) any later version.
+#
+#   This program is distributed in the hope that it will be useful, but
+#   WITHOUT ANY WARRANTY; without even the implied warranty of
+#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
+#   Public License for more details.
+#
+#   You should have received a copy of the GNU General Public License along
+#   with this program. If not, see <https://www.gnu.org/licenses/>.
+#
+#   As a special exception, the respective Autoconf Macro's copyright owner
+#   gives unlimited permission to copy, distribute and modify the configure
+#   scripts that are the output of Autoconf when processing the Macro. You
+#   need not follow the terms of the GNU General Public License when using
+#   or distributing such scripts, even though portions of the text of the
+#   Macro appear in them. The GNU General Public License (GPL) does govern
+#   all other use of the material that constitutes the Autoconf Macro.
+#
+#   This special exception to the GPL applies to versions of the Autoconf
+#   Macro released by the Autoconf Archive. When you make and distribute a
+#   modified version of the Autoconf Macro, you may extend this special
+#   exception to the GPL to apply to your modified version as well.
+
+#serial 8
+
+AU_ALIAS([AC_CHECK_DEFINED], [AC_CHECK_DEFINE])
+AC_DEFUN([AC_CHECK_DEFINE],[
+AS_VAR_PUSHDEF([ac_var],[ac_cv_defined_$1])dnl
+AC_CACHE_CHECK([for $1 defined], ac_var,
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[]], [[
+  #ifdef $1
+  int ok;
+  #else
+  choke me
+  #endif
+]])],[AS_VAR_SET(ac_var, yes)],[AS_VAR_SET(ac_var, no)]))
+AS_IF([test AS_VAR_GET(ac_var) != "no"], [$2], [$3])dnl
+AS_VAR_POPDEF([ac_var])dnl
+])
+
+AU_ALIAS([AX_CHECK_DEFINED], [AX_CHECK_DEFINE])
+AC_DEFUN([AX_CHECK_DEFINE],[
+AS_VAR_PUSHDEF([ac_var],[ac_cv_defined_$2_$1])dnl
+AC_CACHE_CHECK([for $2 defined in $1], ac_var,
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[#include <$1>]], [[
+  #ifdef $2
+  int ok;
+  #else
+  choke me
+  #endif
+]])],[AS_VAR_SET(ac_var, yes)],[AS_VAR_SET(ac_var, no)]))
+AS_IF([test AS_VAR_GET(ac_var) != "no"], [$3], [$4])dnl
+AS_VAR_POPDEF([ac_var])dnl
+])
+
+AC_DEFUN([AX_CHECK_FUNC],
+[AS_VAR_PUSHDEF([ac_var], [ac_cv_func_$2])dnl
+AC_CACHE_CHECK([for $2], ac_var,
+dnl AC_LANG_FUNC_LINK_TRY
+[AC_LINK_IFELSE([AC_LANG_PROGRAM([$1
+#undef $2
+char $2 ();],[
+char (*f) () = $2;
+return f != $2; ])],
+[AS_VAR_SET(ac_var, yes)],
+[AS_VAR_SET(ac_var, no)])])
+AS_IF([test AS_VAR_GET(ac_var) = yes], [$3], [$4])dnl
+AS_VAR_POPDEF([ac_var])dnl
+])# AC_CHECK_FUNC
Index: libgfortran/acinclude.m4
===
--- libgfortran/acinclude.m4(revision 244636)
+++ libgfortran/acinclude.m4(working copy)
@@ -1,6 +1,7 @@
 m4_include(../config/acx.m4)
 m4_include(../config/no-executables.m4)
 m4
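
The AC_CHECK_DEFINE/AX_CHECK_DEFINE macros above work by compiling a small probe program whose `#else` branch ("choke me") is a deliberate syntax error, so compilation succeeds only when the macro is predefined. A hypothetical Python sketch of the conftest source the macro generates (function name and exact layout are illustrative, not what Autoconf emits verbatim):

```python
# Hypothetical sketch: build the probe program AC_CHECK_DEFINE feeds the
# compiler.  "choke me" makes the #else branch fail to compile, so the
# compile test passes only if MACRO is predefined (optionally by HEADER).
def conftest_source(macro, header=None):
    include = '#include <%s>\n' % header if header else ''
    body = ('int main(void)\n'
            '{\n'
            '#ifdef ' + macro + '\n'
            '  int ok;\n'
            '#else\n'
            '  choke me\n'
            '#endif\n'
            '  return 0;\n'
            '}\n')
    return include + body

print(conftest_source('_SOFT_FLOAT'))
```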

Re: [PATCH] C++: fix fix-it hints for misspellings within explicit namespaces (v2)

2017-01-19 Thread David Malcolm
On Thu, 2017-01-19 at 13:15 -0500, Jason Merrill wrote:
> On Wed, Jan 18, 2017 at 5:29 PM, David Malcolm 
> wrote:
> > Here's a version of the patch which simply tweaks
> > cp_parser_primary_expression's call to finish_id_expression so that
> > it passes the location of the id_expression, rather than that of
> > id_expr_token.
> > 
> > The id_expression in question came from cp_parser_id_expression,
> > whereas the id_expr_token is the first token within the id
> > -expression.
> > 
> > The location passed here to finish_id_expression only affects:
> > the location used for name-lookup errors, and for the resulting
> > decl cp_expr.  Given that the following code immediately does this:
> > decl.set_location (id_expr_token->location);
> 
> What happens if we use id_expression.get_location() here, too?
> 
> OK.

With that other change it bootstraps but introduces some regressions:

 PASS -> FAIL : g++.dg/cpp0x/pr51420.C  -std=c++11  (test for errors, line 6)
 PASS -> FAIL : g++.dg/cpp0x/pr51420.C  -std=c++11 (test for excess errors)
 PASS -> FAIL : g++.dg/cpp0x/pr51420.C  -std=c++14  (test for errors, line 6)
 PASS -> FAIL : g++.dg/cpp0x/pr51420.C  -std=c++14 (test for excess errors)
 PASS -> FAIL : g++.dg/cpp0x/udlit-declare-neg.C  -std=c++11  (test for errors, line 11)
 PASS -> FAIL : g++.dg/cpp0x/udlit-declare-neg.C  -std=c++11  (test for errors, line 14)
 PASS -> FAIL : g++.dg/cpp0x/udlit-declare-neg.C  -std=c++11  (test for errors, line 5)
 PASS -> FAIL : g++.dg/cpp0x/udlit-declare-neg.C  -std=c++11  (test for errors, line 8)
 PASS -> FAIL : g++.dg/cpp0x/udlit-declare-neg.C  -std=c++11 (test for excess errors)
 PASS -> FAIL : g++.dg/cpp0x/udlit-declare-neg.C  -std=c++14  (test for errors, line 11)
 PASS -> FAIL : g++.dg/cpp0x/udlit-declare-neg.C  -std=c++14  (test for errors, line 14)
 PASS -> FAIL : g++.dg/cpp0x/udlit-declare-neg.C  -std=c++14  (test for errors, line 5)
 PASS -> FAIL : g++.dg/cpp0x/udlit-declare-neg.C  -std=c++14  (test for errors, line 8)
 PASS -> FAIL : g++.dg/cpp0x/udlit-declare-neg.C  -std=c++14 (test for excess errors)

It would change:

g++.dg/cpp0x/pr51420.C: In function ‘void foo()’:
g++.dg/cpp0x/pr51420.C:6:13: error: ‘operator""_F’ was not declared in this scope
   float x = operator"" _F();  //  { dg-error  "13:'operator\"\"_F' was not declared in this scope" }
             ^~~~
g++.dg/cpp0x/pr51420.C:6:13: note: suggested alternative: ‘operator new’
   float x = operator"" _F();  //  { dg-error  "13:'operator\"\"_F' was not declared in this scope" }
             ^~~~
             operator new

to:

g++.dg/cpp0x/pr51420.C: In function ‘void foo()’:
g++.dg/cpp0x/pr51420.C:6:27: error: ‘operator""_F’ was not declared in this scope
   float x = operator"" _F();  //  { dg-error  "13:'operator\"\"_F' was not declared in this scope" }
                          ^

and would change:

g++.dg/cpp0x/udlit-declare-neg.C:5:9: error: ‘operator""_Bar’ was not declared in this scope
 int i = operator"" _Bar('x');  // { dg-error "9:'operator\"\"_Bar' was not declared in this scope" }
         ^~~~
g++.dg/cpp0x/udlit-declare-neg.C:5:9: note: suggested alternative: ‘operator new’
 int i = operator"" _Bar('x');  // { dg-error "9:'operator\"\"_Bar' was not declared in this scope" }
         ^~~~
         operator new

to:

g++.dg/cpp0x/udlit-declare-neg.C:5:28: error: ‘operator""_Bar’ was not declared in this scope
 int i = operator"" _Bar('x');  // { dg-error "9:'operator\"\"_Bar' was not declared in this scope" }
                           ^

(DejaGnu picked up on this via the changing column numbers, but it
didn't detect the missing "suggested alternative").


With the patch I posted as-is, we get:

g++.dg/cpp0x/pr51420.C:6:13: error: ‘operator""_F’ was not declared in this scope
   float x = operator"" _F();  //  { dg-error  "13:'operator\"\"_F' was not declared in this scope" }
             ^~~~

and:

g++.dg/cpp0x/udlit-declare-neg.C:5:9: error: ‘operator""_Bar’ was not declared in this scope
 int i = operator"" _Bar('x');  // { dg-error "9:'operator\"\"_Bar' was not declared in this scope" }
         ^~~~

i.e. the same locations as the status quo, but dropping the suggested
"operator new" hint.


Is the patch still OK as-is?
Dave


Re: [PATCH, configure, libgfortran]: PR78478: Use fpu-generic for x86 when _SOFT_FLOAT is defined

2017-01-19 Thread Janne Blomqvist
On Thu, Jan 19, 2017 at 10:43 PM, Uros Bizjak  wrote:
> Hello!
>
> Attached patch avoids bootstrap error when building libgfortran for
> soft-float x86 targets.  Configure detects when _SOFT_FLOAT is defined
> and uses fpu-generic.h instead of fpu-387.h header.
>
> The patch also imports ax_check_define.m4 from autoconf archive [1] -
> the macro is really handy, and I guess it can be used in many other
> places, avoiding reinventing the wheel.
>
> 2017-01-19  Uros Bizjak  
>
> PR target/78478
> * config/ax_check_define.m4: New file.
>
> libgfortran/ChangeLog:
>
> 2017-01-19  Uros Bizjak  
>
> PR target/78478
> * acinclude.m4: Include ../config/ax_check_define.m4
> * configure.ac: Check if _SOFT_FLOAT is defined.
> * configure.host (i?86 | x86_64): Use fpu-generic when
> have_soft_float is set.
> * configure: Regenerate.
>
> Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> OK for mainline and gcc-6 branch?

Ok, thanks.

>
> [1] https://www.gnu.org/software/autoconf-archive/ax_check_define.html
>
> Uros.
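
The configure.host change Uros describes boils down to a one-line header choice per target; a minimal sketch of the selection logic (the function name and argument spellings are hypothetical — only the header names and the soft-float condition come from the ChangeLog):

```python
# Sketch of the configure.host decision described above: x86 targets
# fall back to the generic FPU header when _SOFT_FLOAT is defined,
# everything else keeps using fpu-generic.h anyway.
def fpu_host_file(host_cpu, have_soft_float):
    is_x86 = (host_cpu.startswith('i') and host_cpu.endswith('86')) \
             or host_cpu == 'x86_64'
    if is_x86:
        return 'fpu-generic.h' if have_soft_float else 'fpu-387.h'
    return 'fpu-generic.h'

print(fpu_host_file('x86_64', True))   # soft-float -> fpu-generic.h
print(fpu_host_file('i686', False))    # hard-float -> fpu-387.h
```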



-- 
Janne Blomqvist


Re: Pretty printers for versioned namespace

2017-01-19 Thread François Dumont

On 10/01/2017 13:39, Jonathan Wakely wrote:

I've committed the attached patch, which passes the tests for the
default configuration and the versioned namespace configuration.

I added another helper function, strip_versioned_namespace, which is
more expressive than doing typename.replace(vers_nsp, '') everywhere.
I've also renamed vers_nsp to _versioned_namespace (using the naming
convention for global variables private to the module). I've added
checks so that if that variable is None then the extra printers and
special cases for the versioned namespace are skipped. That's not
currently used, but it would allow us to optimise things later if
needed.


Very nice feature indeed, see below.



I also needed to update the new SharedPtrMethodsMatcher to add
"(__\d+)?" to the regular expression.
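
The "(__\d+)?" addition can be illustrated with a small regex sketch (the exact pattern in printers.py may differ; this only shows the optional-group idea applied after `std::`):

```python
import re

# An optional inline-namespace group: the same pattern matches both the
# default and the versioned-namespace spelling of a type name.
pattern = re.compile(r'^std::(__\d+::)?shared_ptr<.*>$')

print(bool(pattern.match('std::shared_ptr<int>')))       # default config
print(bool(pattern.match('std::__7::shared_ptr<int>')))  # versioned namespace
print(bool(pattern.match('boost::shared_ptr<int>')))     # no match
```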



@@ -1392,47 +1406,54 @@ def register_type_printers(obj):
add_one_type_printer(obj, 'discard_block_engine', 'ranlux48')
add_one_type_printer(obj, 'shuffle_order_engine', 'knuth_b')

+# Consider optional versioned namespace
+opt_nsp = '(' + vers_nsp + ')?'
+
# Do not show defaulted template arguments in class templates
add_one_template_type_printer(obj, 'unique_ptr',
-'unique_ptr<(.*), std::default_delete<\\1 ?> >',
-'unique_ptr<{1}>')
+'{0}unique_ptr<(.*), std::{0}default_delete<\\2 ?> >'.format(opt_nsp),
+'unique_ptr<{2}>')


This is ugly. Mixing python string formatting with regular expressions
makes it harder to read, and is inconsistent with how the versioned
namespace is handled elsewhere. In Printer.add_version and
add_one_type_printer we just register two names, one using std:: and
one using std::__7::. We can do the same for the template type
printers.


    Yes, your approach is much nicer even if it results in more type
printers being registered.


    My plan was to submit the attached patch, but this doesn't work as
the python module seems to be loaded before libstdc++.so. If you know a
way to test for the versioned namespace before registering printers,
this patch might still be useful. Otherwise I will just forget it.


François
diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 36dd81d..5e42988 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -79,6 +79,15 @@ try:
 except ImportError:
 pass
 
+_versioned_namespace = None
+
+# Use std::string to find out if versioned namespace has been activated.
+try:
+gdb.lookup_type('std::string')
+except RuntimeError:
+_versioned_namespace = '__7::'
+pass
+
 # Starting with the type ORIG, search for the member type NAME.  This
 # handles searching upward through superclasses.  This is needed to
 # work around http://sourceware.org/bugzilla/show_bug.cgi?id=13615.
@@ -100,8 +109,6 @@ def find_type(orig, name):
 raise ValueError("Cannot find type %s::%s" % (str(orig), name))
 typ = field.type
 
-_versioned_namespace = '__7::'
-
 # Test if a type is a given template instantiation.
 def is_specialization_of(type, template_name):
 global _versioned_namespace
@@ -1222,11 +1229,12 @@ class Printer(object):
 self.subprinters.append(printer)
 self.lookup[name] = printer
 
-# Add a name using _GLIBCXX_BEGIN_NAMESPACE_VERSION.
+# Add a name using _GLIBCXX_BEGIN_NAMESPACE_VERSION if needed.
 def add_version(self, base, name, function):
-self.add(base + name, function)
 if _versioned_namespace:
 self.add(base + _versioned_namespace + name, function)
+else:
+self.add(base + name, function)
 
 # Add a name using _GLIBCXX_BEGIN_NAMESPACE_CONTAINER.
 def add_container(self, base, name, function):
@@ -1319,13 +1327,11 @@ class TemplateTypePrinter(object):
 
 def add_one_template_type_printer(obj, name, match, subst):
 match = '^std::' + match + '$'
-printer = TemplateTypePrinter(name, match, 'std::' + subst)
-gdb.types.register_type_printer(obj, printer)
 if _versioned_namespace:
-# Add second type printer for same type in versioned namespace:
+# Add type printer in versioned namespace:
 match = match.replace('std::', 'std::' + _versioned_namespace)
-printer = TemplateTypePrinter(name, match, 'std::' + subst)
-gdb.types.register_type_printer(obj, printer)
+printer = TemplateTypePrinter(name, match, 'std::' + subst)
+gdb.types.register_type_printer(obj, printer)
 
 class FilteringTypePrinter(object):
 def __init__(self, match, name):
@@ -1359,11 +1365,11 @@ class FilteringTypePrinter(object):
 return self._recognizer(self.match, self.name)
 
 def add_one_type_printer(obj, match, name):
-printer = FilteringTypePrinter(match, 'std::' + name)
-gdb.types.register_type_printer(obj, printer)
+namespace = 'std::'
 if _versioned_namespace:
-printer = FilteringTypePrinter(match, 'std::' + 

Re: Pretty printers for versioned namespace

2017-01-19 Thread Jonathan Wakely

On 19/01/17 22:01 +0100, François Dumont wrote:

On 10/01/2017 13:39, Jonathan Wakely wrote:

I've committed the attached patch, which passes the tests for the
default configuration and the versioned namespace configuration.

I added another helper function, strip_versioned_namespace, which is
more expressive than doing typename.replace(vers_nsp, '') everywhere.
I've also renamed vers_nsp to _versioned_namespace (using the naming
convention for global variables private to the module). I've added
checks so that if that variable is None then the extra printers and
special cases for the versioned namespace are skipped. That's not
currently used, but it would allow us to optimise things later if
needed.


Very nice feature indeed, see below.



I also needed to update the new SharedPtrMethodsMatcher to add
"(__\d+)?" to the regular expression.



@@ -1392,47 +1406,54 @@ def register_type_printers(obj):
   add_one_type_printer(obj, 'discard_block_engine', 'ranlux48')
   add_one_type_printer(obj, 'shuffle_order_engine', 'knuth_b')

+# Consider optional versioned namespace
+opt_nsp = '(' + vers_nsp + ')?'
+
   # Do not show defaulted template arguments in class templates
   add_one_template_type_printer(obj, 'unique_ptr',
-'unique_ptr<(.*), std::default_delete<\\1 ?> >',
-'unique_ptr<{1}>')
+'{0}unique_ptr<(.*), std::{0}default_delete<\\2 ?> >'.format(opt_nsp),
+'unique_ptr<{2}>')


This is ugly. Mixing python string formatting with regular expressions
makes it harder to read, and is inconsistent with how the versioned
namespace is handled elsewhere. In Printer.add_version and
add_one_type_printer we just register two names, one using std:: and
one using std::__7::. We can do the same for the template type
printers.


   Yes, your approach is much nicer even if it results in more type 
printer registered.


   My plan was to submit the attached patch but this doesn't work as 
the python module seems to be loaded before libstdc++.so. If you know 
a way to test for versioned namespace before starting registering 
printers this patch might still be useful. Otherwise I will just 
forget it.


See the attached patch, which decides at configure-time whether to
enable the versioned namespace printers or not. This is what I had in
mind.
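
Reduced to its essentials, the configure-time gating works like this (a sketch with hypothetical names; the real patch threads the flag from Makefile.am through hook.in into register_libstdcxx_printers, which clears the module-level `_versioned_namespace` when the flag is False):

```python
_versioned_namespace = '__7::'  # assumed spelling of the inline namespace

def printer_prefixes(use_versioned_namespace=False):
    """Namespace prefixes that printers get registered under (sketch).

    The real patch instead sets the module-level _versioned_namespace
    to None when the flag is off, and the registration helpers skip the
    extra std::__7:: names whenever it is None.
    """
    ns = _versioned_namespace if use_versioned_namespace else None
    prefixes = ['std::']
    if ns:
        prefixes.append('std::' + ns)
    return prefixes

print(printer_prefixes(True))   # ['std::', 'std::__7::']
print(printer_prefixes(False))  # ['std::']
```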



diff --git a/libstdc++-v3/python/Makefile.am b/libstdc++-v3/python/Makefile.am
index 80790e2..5d19d3d 100644
--- a/libstdc++-v3/python/Makefile.am
+++ b/libstdc++-v3/python/Makefile.am
@@ -29,6 +29,12 @@ else
 pythondir = $(datadir)/gcc-$(gcc_version)/python
 endif
 
+if ENABLE_SYMVERS_GNU_NAMESPACE
+use_versioned_namespace = True
+else
+use_versioned_namespace = False
+endif
+
 all-local: gdb.py
 
 nobase_python_DATA = \
@@ -39,7 +45,9 @@ nobase_python_DATA = \
 
 gdb.py: hook.in Makefile
sed -e 's,@pythondir@,$(pythondir),' \
-   -e 's,@toolexeclibdir@,$(toolexeclibdir),' < $(srcdir)/hook.in > $@
+   -e 's,@toolexeclibdir@,$(toolexeclibdir),' \
+   -e 's,@use_versioned_namespace@,$(use_versioned_namespace),' \
+   < $(srcdir)/hook.in > $@
 
 install-data-local: gdb.py
@$(mkdir_p) $(DESTDIR)$(toolexeclibdir)
diff --git a/libstdc++-v3/python/hook.in b/libstdc++-v3/python/hook.in
index b82604a6c..1b3a577 100644
--- a/libstdc++-v3/python/hook.in
+++ b/libstdc++-v3/python/hook.in
@@ -58,4 +58,4 @@ if gdb.current_objfile () is not None:
 # Call a function as a plain import would not execute body of the included file
 # on repeated reloads of this object file.
 from libstdcxx.v6 import register_libstdcxx_printers
-register_libstdcxx_printers(gdb.current_objfile())
+register_libstdcxx_printers(gdb.current_objfile(), @use_versioned_namespace@)
diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 36dd81d..4a7d117 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -1482,7 +1482,7 @@ def register_type_printers(obj):
'experimental::fundamentals_v\d::basic_string_view<(.*), std::char_traits<\\1> >',
'experimental::basic_string_view<\\1>')
 
-def register_libstdcxx_printers (obj):
+def register_libstdcxx_printers (obj, use_versioned_namespace = False):
 "Register libstdc++ pretty-printers with objfile Obj."
 
 global _use_gdb_pp
@@ -1495,6 +1495,10 @@ def register_libstdcxx_printers (obj):
 obj = gdb
 obj.pretty_printers.append(libstdcxx_printer)
 
+global _versioned_namespace
+if not use_versioned_namespace:
+_versioned_namespace = None
+
 register_type_printers(obj)
 
 def build_libstdcxx_dictionary ():


[PATCH, i386] Fix PR78478, Compile Error for i386-rtems

2017-01-19 Thread Uros Bizjak
Hello!

This patch (partially) reverts my change from 2013-11-05. Apparently,
LONG_DOUBLE_TYPE_SIZE interferes with soft-float handling.

2017-01-19  Uros Bizjak  

PR target/78478
Revert:
2013-11-05  Uros Bizjak  

* config/i386/rtemself.h (LONG_DOUBLE_TYPE_SIZE): New define.

Tested by Joel on RTEMS soft-float target.

Committed to mainline SVN.

Uros.
Index: config/i386/rtemself.h
===
--- config/i386/rtemself.h  (revision 244636)
+++ config/i386/rtemself.h  (working copy)
@@ -28,6 +28,3 @@
builtin_assert ("system=rtems");\
 }  \
   while (0)
-
-#undef LONG_DOUBLE_TYPE_SIZE
-#define LONG_DOUBLE_TYPE_SIZE (TARGET_80387 ? 80 : 64)


[PATCH, i386]: Remove config/i386/rtems-64.h

2017-01-19 Thread Uros Bizjak
Hello!

This file is now the same as config/i386/rtemself.h. Remove one copy.

2017-01-19  Uros Bizjak  

* config.gcc (x86_64-*-rtems*): Use i386/rtemself.h
instead of i386/rtems-64.h.
* config/i386/rtems-64.h: Remove.

Committed as obvious.

Uros.
Index: config/i386/rtems-64.h
===
--- config/i386/rtems-64.h  (revision 244636)
+++ config/i386/rtems-64.h  (nonexistent)
@@ -1,30 +0,0 @@
-/* Definitions for rtems targeting an x86_64
-   Copyright (C) 2016-2017 Free Software Foundation, Inc.
-   Contributed by Joel Sherrill (j...@oarcorp.com).
-
-This file is part of GCC.
-
-GCC is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 3, or (at your option)
-any later version.
-
-GCC is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with GCC; see the file COPYING3.  If not see
-.  */
-
-/* Specify predefined symbols in preprocessor.  */
-
-#define TARGET_OS_CPP_BUILTINS()   \
-  do   \
-{  \
-   builtin_define ("__rtems__");   \
-   builtin_define ("__USE_INIT_FINI__");   \
-   builtin_assert ("system=rtems");\
-}  \
-  while (0)
Index: config.gcc
===
--- config.gcc  (revision 244636)
+++ config.gcc  (working copy)
@@ -1447,7 +1447,7 @@
	tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h newlib-stdint.h i386/i386elf.h i386/x86-64.h"
	;;
 x86_64-*-rtems*)
-	tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h newlib-stdint.h i386/i386elf.h i386/x86-64.h i386/rtems-64.h"
+	tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h newlib-stdint.h i386/i386elf.h i386/x86-64.h i386/rtemself.h"
	;;
 i[34567]86-*-rdos*)
	tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h newlib-stdint.h i386/i386elf.h i386/rdos.h"

