Re: [PATCH][testsuite] Fix TORTURE_OPTIONS overriding

2015-06-24 Thread Richard Biener
On Tue, 23 Jun 2015, James Greenhalgh wrote:

> 
> On Thu, Jun 18, 2015 at 11:10:01AM +0100, Richard Biener wrote:
> >
> > Currently when doing
> >
> > make check-gcc RUNTESTFLAGS="TORTURE_OPTIONS=\\\"{ -O3 } { -O2 }\\\"
> > dg-torture.exp"
> >
> > you get -O3 and -O2 but also the two LTO torture option combinations.
> > That's undesired (those are the most expensive anyway).  The following
> > patch avoids this by setting LTO_TORTURE_OPTIONS only when
> > TORTURE_OPTIONS isn't specified.
> >
> > Tested with and without TORTURE_OPTIONS for C and fortran tortures.
> >
> Seems the instruction in c-torture.exp on how to override TORTURE_OPTIONS
> is off, RUNTESTFLAGS="TORTURE_OPTIONS=\\\"{ { -O3 } { -O2 } }\\\""
> certainly doesn't do what it should.
> 
> This patch causes issues for ARM and AArch64 cross multilib
> testing. There are two issues, one is that we now clobber
> gcc_force_conventional_output after setting it in the conditional this patch
> moved (hits all targets, see the new x86-64 failures like pr61848.c).
> 
> The other is that we no longer protect environment settings before calling
> check_effective_target_lto, which results in our cross --specs files no
> longer being on the path.
> 
> I've fixed these issues by rearranging the file again, but I'm not
> sure if what I've done is sensible and does not cause other issues. This
> seems to bring back the tests I'd lost overnight, and doesn't cause
> issues elsewhere.
> 
> I've run some cross-tests to ensure this brings back the missing tests,
> and a full x86-64 testrun to make sure I haven't dropped any from there.
> 
> OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> James
> 
> ---
> 2015-06-23  James Greenhalgh  
> 
> * lib/c-torture.exp: Don't call check_effective_target_lto
>   before setting up environment correctly.
> * lib/gcc-dg.exp: Likewise, and protect
>   gcc_force_conventional_output.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


Re: [PATCH 4.8] PR66306 Fix reload problem with commutative operands

2015-06-24 Thread Andreas Krebbel
On 06/16/2015 07:40 PM, Ulrich Weigand wrote:
> Andreas Krebbel wrote:
> 
>> this fixes a reload problem with match_dup's on commutative operands.
>>
>> Bootstrapped and regtested on x86-64, ppc64, and s390x.
>>
>> Ok?
>>
>> Bye,
>>
>> -Andreas-
>>
>> 2015-06-11  Andreas Krebbel  
>>
>>  PR rtl-optimization/66306
>>  * reload.c (find_reloads): Swap the match_dup info for
>>  commutative operands.
> 
> This does indeed appear to be broken, and the fix looks good to me.
> 
> However, I'm not clear why this should be a 4.8 only patch ... the
> same problem seems to be still there on mainline, right?
> 
> Patch is OK for mainline if it passes regression tests there.

I've committed the patch after successful testing on PPC64 and s390x.  I
couldn't get reload working on x86_64 quickly.

Bye,

-Andreas-



Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-06-24 Thread Ramana Radhakrishnan



On 24/06/15 02:00, Sandra Loosemore wrote:

On 06/18/2015 11:32 AM, Eric Botcazou wrote:

The attached patch teaches regrename to validate insns affected by each
register renaming before making the change.  I can see at least two
other ways to handle this -- earlier, by rejecting renamings that result
in invalid instructions when it's searching for the best renaming; or
later, by validating the entire set of renamings as a group instead of
incrementally for each one -- but doing it all in regname_do_replace
seems least disruptive and risky in terms of the existing code.


OK, but the patch looks incomplete, rename_chains should be adjusted
as well,
i.e. regrename_do_replace should now return a boolean.


Like this?  I tested this on nios2 and x86_64-linux-gnu, as before, plus
built for aarch64-linux-gnu and ran the gcc testsuite.


Hopefully that was built with --with-cpu=cortex-a57 to enable the
renaming pass?


Ramana



The c6x back end also calls regrename_do_replace.  I am not set up to
build or test on that target, and Bernd told me off-list that it would
never fail on that target anyway so I have left that code alone.

-Sandra


Re: [PATCH][RFC] Add FRE in pass_vectorize

2015-06-24 Thread Richard Biener
On Tue, 23 Jun 2015, Jeff Law wrote:

> On 06/10/2015 08:02 AM, Richard Biener wrote:
> > 
> > The following patch adds FRE after vectorization which is needed
> > for IVOPTs to remove redundant PHI nodes (well, I'm testing a
> > patch for FRE that will do it already there).
> Redundant or degenerates which should be propagated?

Redundant, basically two IVs with the same initial value and same step.
IVOPTs can deal with this if the initial values and the step are already
same "enough" - the vectorizer can end up generating redundant huge
expressions for both.

> I believe Alan Lawrence has run into similar issues (unpropagated degenerates)
> with his changes to make loop header copying more aggressive.  Threading will
> also create them.  The phi-only propagator may be the solution.  It ought to
> be cheaper than FRE.

Yes, but that's unrelated (see above).

> > The patch also makes FRE preserve loop-closed SSA form and thus
> > make it suitable for use in the loop pipeline.
> Loop optimizations will tend to create opportunities for redundancy
> elimination, so the ability to use FRE in the loop pipeline seems like a good
> thing.  We ran into this in RTL land, so I'm not surprised to see it occurring
> in the gimple optimizers and thus I'm not opposed to running FRE in the loop
> pipeline.
> 
> 
> 
> > 
> > With the placement in the vectorizer sub-pass FRE will effectively
> > be enabled by -O3 only (well, or if one requests loop vectorization).
> > I've considered placing it after complete_unroll instead but that
> > would enable it at -O1 already.  I have no strong opinion on the
> > exact placement, but it should help all passes between vectorizing
> > and ivopts for vectorized loops.
> For -O3/vectorization it seems like a no-brainer.  -O1 less so.  IIRC we
> conditionalize -frerun-cse-after-loop on -O2 which seems more appropriate than
> doing it with -O1.
> 
> > 
> > Any other suggestions on pass placement?  I can of course key
> > that FRE run on -O3 explicitly.  Not sure if we at this point
> > want to start playing fancy games like setting a property
> > when a pass (likely) generated redundancies that are worth
> > fixing up and then key FRE on that one (it gets harder and
> > less predictable what transforms are run on code).
> RTL CSE is bloody expensive and so many times I wanted the ability to know a
> bit about what the loop optimizer had done (or not done) so that I could
> conditionally skip the second CSE pass.   We never built that, but it's
> something I've wanted for decades.

Hmm, ok.  We can abuse pass properties for this but I don't think
they are a scalable fit.  Not sure if we'd like to go full way
adding sth like PROP_want_ccp PROP_want_copyprop PROP_want_cse, etc.
(any others?).  And whether FRE would then catch a PROP_want_copyprop
because it also can do copy propagation.

Eventually we'll just end up setting PROP_want_* from every pass...
(like we schedule a CFG cleanup from nearly every pass that did
anything).

Going a bit further here, esp. in the loop context, would be to
have the basic cleanups be region-based.  Because given a big
function with many loops and just one vectorized it would be
enough to cleanup the vectorized loop (yes, and in theory
all downstream effects, but that's probably secondary and not
so important).  It's not too difficult to make FRE run on
a MEME region, the interesting part, engineering-wise, is to
really make it O(size of MEME region) - that is, eliminate
things like O(num_ssa_names) or O(n_basic_blocks) setup cost.

And then there is the possibility of making passes generate less
needs to perform cleanups after them - like in the present case
with the redundant IVs make them more apparently redundant by
CSEing the initial value and step during vectorizer code generation.
I'm playing with the idea of adding a simple CSE machinery to
the gimple_build () interface (aka match-and-simplify).  It
eventually invokes (well, not currently, but that can be fixed)
maybe_push_res_to_seq which is a good place to maintain a
table of already generated expressions.  That of course only
works if you either always append to the same sequence or at least
insert at the same place.

I'm now back to match-and-simplify and will pursue that last idea
a bit (also wanting it for SCCVN itself).

Richard.

> Jeff
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


Re: [02/13] Replace handle_cache_entry with new interface

2015-06-24 Thread Richard Sandiford
Jeff Law  writes:
> On 06/16/2015 02:45 AM, Richard Sandiford wrote:
>> As described in the covering note, this patch replaces handle_cache_entry
>> with a new function keep_cache_entry.  It also ensures that elements are
>> deleted using the proper function, so that m_n_deleted is updated.
>>
>> I couldn't tell whether the unusual name of the function
>> ("gt_cleare_cache") is deliberate or not, but I left it be.
> Short-hand for clear_entry or something similar?

Yeah, could be.

>> gcc/ada/
>>  * gcc-interface/decl.c (value_annotation_hasher::handle_cache_entry):
>>  Delete.
>>  (value_annotation_hasher::keep_cache_entry): New function.
>>  * gcc-interface/utils.c (pad_type_hasher::handle_cache_entry):
>>  Delete.
>>  (pad_type_hasher::keep_cache_entry): New function.
>>
>> gcc/
>>  * hash-table.h (hash_table): Add gt_cleare_cache as a friend.
>>  (gt_cleare_cache): Check here for deleted and empty entries.
>>  Replace handle_cache_entry with a call to keep_cache_entry.
>>  * hash-traits.h (ggc_cache_hasher::handle_cache_entry): Delete.
>>  (ggc_cache_hasher::keep_cache_entry): New function.
>>  * trans-mem.c (tm_wrapper_hasher::handle_cache_entry): Delete.
>>  (tm_wrapper_hasher::keep_cache_entry): New function.
>>  * tree.h (tree_decl_map_cache_hasher::handle_cache_entry): Delete.
>>  (tree_vec_map_cache_hasher::keep_cache_entry): New function.
>>  * tree.c (type_cache_hasher::handle_cache_entry): Delete.
>>  (type_cache_hasher::keep_cache_entry): New function.
>>  (tree_vec_map_cache_hasher::handle_cache_entry): Delete.
>>  (tree_vec_map_cache_hasher::keep_cache_entry): New function.
>>  * ubsan.c (tree_type_map_cache_hasher::handle_cache_entry): Delete.
>>  (tree_type_map_cache_hasher::keep_cache_entry): New function.
>>  * varasm.c (tm_clone_hasher::handle_cache_entry): Delete.
>>  (tm_clone_hasher::keep_cache_entry): New function.
>>  * config/i386/i386.c (dllimport_hasher::handle_cache_entry): Delete.
>>  (dllimport_hasher::keep_cache_entry): New function.
> So for all the keep_cache_entry functions, I guess they're trivial 
> enough that a function comment probably isn't needed.

Yeah.  For cases like this where the function is implementing a defined
interface (described in hash-table.h), I think it's better to only have
comments for implementations that are doing something non-obvious.

> Presumably no good way to share the trivial implementation?

Probably not without sharing the other parts of the traits in some way.
That might be another possible cleanup :-)

Thanks,
Richard



Re: [05/13] Add nofree_ptr_hash

2015-06-24 Thread Richard Sandiford
Jeff Law  writes:
> On 06/16/2015 02:55 AM, Richard Sandiford wrote:
>> This patch stops pointer_hash from inheriting typed_noop_remove and
>> instead creates a new class nofree_ptr_hash that inherits from both.
>> It then updates all uses of typed_noop_remove (which are all pointers)
>> and pointer_hash so that they use this new class instead.
>>
>> gcc/
>>  * hash-table.h: Update comments.
>>  * hash-traits.h (pointer_hash): Don't inherit from typed_noop_remove.
>>  (nofree_ptr_hash): New class.
>>  * asan.c (asan_mem_ref_hasher): Inherit from nofree_ptr_hash rather
>>  than typed_noop_remove.  Remove redundant typedefs.
>>  * attribs.c (attribute_hasher): Likewise.
>>  * cfg.c (bb_copy_hasher): Likewise.
>>  * cselib.c (cselib_hasher): Likewise.
>>  * dse.c (invariant_group_base_hasher): Likewise.
>>  * dwarf2cfi.c (trace_info_hasher): Likewise.
>>  * dwarf2out.c (macinfo_entry_hasher): Likewise.
>>  (comdat_type_hasher, loc_list_hasher): Likewise.
>>  * gcse.c (pre_ldst_expr_hasher): Likewise.
>>  * genmatch.c (id_base): Likewise.
>>  * genrecog.c (test_pattern_hasher): Likewise.
>>  * gimple-ssa-strength-reduction.c (cand_chain_hasher): Likewise.
>>  * haifa-sched.c (delay_i1_hasher): Likewise.
>>  * hard-reg-set.h (simplifiable_subregs_hasher): Likewise.
>>  * ipa-icf.h (congruence_class_group_hash): Likewise.
>>  * ipa-profile.c (histogram_hash): Likewise.
>>  * ira-color.c (allocno_hard_regs_hasher): Likewise.
>>  * lto-streamer.h (string_slot_hasher): Likewise.
>>  * lto-streamer.c (tree_entry_hasher): Likewise.
>>  * plugin.c (event_hasher): Likewise.
>>  * postreload-gcse.c (expr_hasher): Likewise.
>>  * store-motion.c (st_expr_hasher): Likewise.
>>  * tree-sra.c (uid_decl_hasher): Likewise.
>>  * tree-ssa-coalesce.c (coalesce_pair_hasher): Likewise.
>>  (ssa_name_var_hash): Likewise.
>>  * tree-ssa-live.c (tree_int_map_hasher): Likewise.
>>  * tree-ssa-loop-im.c (mem_ref_hasher): Likewise.
>>  * tree-ssa-pre.c (pre_expr_d): Likewise.
>>  * tree-ssa-sccvn.c (vn_nary_op_hasher): Likewise.
>>  * vtable-verify.h (registration_hasher): Likewise.
>>  * vtable-verify.c (vtbl_map_hasher): Likewise.
>>  * config/arm/arm.c (libcall_hasher): Likewise.
>>  * config/i386/winnt.c (wrapped_symbol_hasher): Likewise.
>>  * config/ia64/ia64.c (bundle_state_hasher): Likewise.
>>  * config/sol2.c (comdat_entry_hasher): Likewise.
>>  * fold-const.c (fold): Use nofree_ptr_hash instead of pointer_hash.
>>  (print_fold_checksum, fold_checksum_tree): Likewise.
>>  (debug_fold_checksum, fold_build1_stat_loc): Likewise.
>>  (fold_build2_stat_loc, fold_build3_stat_loc): Likewise.
>>  (fold_build_call_array_loc): Likewise.
>>  * tree-ssa-ccp.c (gimple_htab): Likewise.
>>  * tree-browser.c (tree_upper_hasher): Inherit from nofree_ptr_hash
>>  rather than pointer_type.
>>
>> gcc/c/
>>  * c-decl.c (detect_field_duplicates_hash): Use nofree_ptr_hash
>>  instead of pointer_hash.
>>  (detect_field_duplicates): Likewise.
>>
>> gcc/cp/
>>  * class.c (fixed_type_or_null_ref_ht): Inherit from nofree_ptr_hash
>>  rather than pointer_hash.
>>  (fixed_type_or_null): Use nofree_ptr_hash instead of pointer_hash.
>>  * semantics.c (nrv_data): Likewise.
>>  * tree.c (verify_stmt_tree_r, verify_stmt_tree): Likewise.
>>
>> gcc/java/
>>  * jcf-io.c (charstar_hash): Inherit from nofree_ptr_hash rather
>>  than typed_noop_remove.  Remove redundant typedefs.
>>
>> gcc/lto/
>>  * lto.c (tree_scc_hasher): Inherit from nofree_ptr_hash rather
>>  than typed_noop_remove.  Remove redundant typedefs.
>>
>> gcc/objc/
>>  * objc-act.c (decl_name_hash): Inherit from nofree_ptr_hash rather
>>  than typed_noop_remove.  Remove redundant typedefs.
>>
>> libcc1/
>>  * plugin.cc (string_hasher): Inherit from nofree_ptr_hash rather
>>  than typed_noop_remove.  Remove redundant typedefs.
>>  (plugin_context): Use nofree_ptr_hash rather than pointer_hash.
>>  (plugin_context::mark): Likewise.
> So are we allowing multiple inheritance in GCC?  It seems like that's 
> what we've got for nofree_ptr_hash.  Is there a better way to achieve 
> what you're trying to do, or do you think this use ought to fall under 
> some kind of exception?
>
>
>> Index: gcc/haifa-sched.c
>> ===
>> --- gcc/haifa-sched.c2015-06-16 09:53:47.338092692 +0100
>> +++ gcc/haifa-sched.c2015-06-16 09:53:47.322092878 +0100
>> @@ -614,9 +614,8 @@ struct delay_pair
>>
>>   /* Helpers for delay hashing.  */
>>
>> -struct delay_i1_hasher : typed_noop_remove <delay_pair>
>> +struct delay_i1_hasher : nofree_ptr_hash <delay_pair>
>>   {
>> -  typedef delay_pair *value_type;
>> typedef void *compare_type;
>> static inline hashval_t hash (const delay_pair *);
>> static inline bool equal (c

Re: PING: Re: [patch] PR debug/66482: Do not ICE in gen_formal_parameter_die

2015-06-24 Thread Richard Biener
On Tue, Jun 23, 2015 at 6:08 PM, Aldy Hernandez  wrote:
> On 06/12/2015 10:07 AM, Aldy Hernandez wrote:
>
> Hi.
>
> This is now a P2, as it is causing a secondary target bootstrap to fail
> (s390).

Ok.

Thanks,
Richard.

> Aldy
>
>> Sigh.  I must say my head is spinning with this testcase and what we do
>> with it (-O3), even prior to the debug-early work:
>>
>> void f(int p) {}
>> int g() {
>>void f(int p);
>>g();
>>return 0;
>> }
>>
>> The inliner recursively inlines this function up to a certain depth, but
>> the useless inlining gets cleaned up shortly afterwards.  However, the
>> BLOCK_SOURCE_LOCATION are still set throughout which is technically
>> correct.
>>
>> Eventually late dwarf gets a hold of all this and we end up calling
>> dwarf2out_abstract_function to build debug info for the abstract
>> instance of a function for which we have already generated a DIE for.
>> Basically, a similar issue to what we encountered for template parameter
>> packs.  Or at least, that's my understanding, because as I've said, I
>> admit to being slightly confused here.
>>
>> Since technically this is all going away when we remove
>> dwarf2out_abstract_function, I suggest we remove the assert and avoid
>> sudden death.  It's not like we generated useful debugging for this
>> testcase anyhow.
>>
>> Aldy
>
>


[Patch ARM] Fix PR target/63408

2015-06-24 Thread Ramana Radhakrishnan

Hi,

The attached patch fixes PR target/63408 and adds a regression test
for the same. The problem is essentially that
vfp3_const_double_for_fract_bits() needs to be aware that negative
values cannot be used in this context.

Tested with a bootstrap and regression test run on armhf. Applied to trunk.

Will apply to 5 after regression testing there and 4.9 after it unfreezes.

Ramana




2015-06-24  Ramana Radhakrishnan  

PR target/63408
* config/arm/arm.c (vfp3_const_double_for_fract_bits): Disable
for negative numbers.

2015-06-24  Ramana Radhakrishnan  

PR target/63408
* gcc.target/arm/pr63408.c: New test.
commit 9e6fb32c5ba143912c1114a59af0114500c5bc31
Author: Ramana Radhakrishnan 
Date:   Tue Jun 23 17:04:40 2015 +0100

Fix PR target/63408

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4fea1a6..4a284ec 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29812,7 +29812,8 @@ vfp3_const_double_for_fract_bits (rtx operand)
 return 0;
   
   REAL_VALUE_FROM_CONST_DOUBLE (r0, operand);
-  if (exact_real_inverse (DFmode, &r0))
+  if (exact_real_inverse (DFmode, &r0)
+  && !REAL_VALUE_NEGATIVE (r0))
 {
   if (exact_real_truncate (DFmode, &r0))
{
diff --git a/gcc/testsuite/gcc.target/arm/pr63408.c 
b/gcc/testsuite/gcc.target/arm/pr63408.c
new file mode 100644
index 000..a23b2a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr63408.c
@@ -0,0 +1,25 @@
+
+/* { dg-do run }  */
+/* { dg-options "-O2" } */
+void abort (void) __attribute__ ((noreturn));
+float __attribute__((noinline))
+f(float a, int b)
+{
+  return a - (((float)b / 0x7fff) * 100);
+}
+
+
+int
+main (void)
+{
+  float a[] = { 100.0, 0.0, 0.0};
+  int b[] = { 0x7fff, 0x7fff/100.0f, -0x7fff / 100.0f};
+  float c[] = { 0.0, -1.0, 1.0 };
+  int i;
+
+  for (i = 0; i < (sizeof(a) / sizeof (float)); i++)
+if (f (a[i], b[i]) != c[i])
+   abort ();
+
+  return 0;
+}


[gomp4, committed] Move rewrite_virtuals_into_loop_closed_ssa to tree-ssa-loop-manip.c

2015-06-24 Thread Tom de Vries

Hi,

this patch moves rewrite_virtuals_into_loop_closed_ssa to 
tree-ssa-loop-manip.c, as requested here: 
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01264.html .


Thanks,
- Tom
Move rewrite_virtuals_into_loop_closed_ssa to tree-ssa-loop-manip.c

2015-06-18  Tom de Vries  

	* tree-parloops.c (replace_uses_in_bbs_by)
	(rewrite_virtuals_into_loop_closed_ssa): Move to ...
	* tree-ssa-loop-manip.c: here.
	* tree-ssa-loop-manip.h (rewrite_virtuals_into_loop_closed_ssa):
	Declare.
---
 gcc/tree-parloops.c   | 87 ---
 gcc/tree-ssa-loop-manip.c | 87 +++
 gcc/tree-ssa-loop-manip.h |  1 +
 3 files changed, 88 insertions(+), 87 deletions(-)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 0661b78..a9d8c2a 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1507,93 +1507,6 @@ replace_uses_in_bb_by (tree name, tree val, basic_block bb)
 }
 }
 
-/* Replace uses of NAME by VAL in blocks BBS.  */
-
-static void
-replace_uses_in_bbs_by (tree name, tree val, bitmap bbs)
-{
-  gimple use_stmt;
-  imm_use_iterator imm_iter;
-
-  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, name)
-{
-  if (!bitmap_bit_p (bbs, gimple_bb (use_stmt)->index))
-	continue;
-
-  use_operand_p use_p;
-  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-	SET_USE (use_p, val);
-}
-}
-
-/* Ensure a virtual phi is present in the exit block, if LOOP contains a vdef.
-   In other words, ensure loop-closed ssa normal form for virtuals.  */
-
-static void
-rewrite_virtuals_into_loop_closed_ssa (struct loop *loop)
-{
-  gphi *phi;
-  edge exit = single_dom_exit (loop);
-
-  phi = NULL;
-  for (gphi_iterator gsi = gsi_start_phis (loop->header);
-   !gsi_end_p (gsi);
-   gsi_next (&gsi))
-{
-  if (virtual_operand_p (PHI_RESULT (gsi.phi ())))
-	{
-	  phi = gsi.phi ();
-	  break;
-	}
-}
-
-  if (phi == NULL)
-return;
-
-  tree final_loop = PHI_ARG_DEF_FROM_EDGE (phi, single_succ_edge (loop->latch));
-
-  phi = NULL;
-  for (gphi_iterator gsi = gsi_start_phis (exit->dest);
-   !gsi_end_p (gsi);
-   gsi_next (&gsi))
-{
-  if (virtual_operand_p (PHI_RESULT (gsi.phi ())))
-	{
-	  phi = gsi.phi ();
-	  break;
-	}
-}
-
-  if (phi != NULL)
-{
-  tree final_exit = PHI_ARG_DEF_FROM_EDGE (phi, exit);
-  gcc_assert (operand_equal_p (final_loop, final_exit, 0));
-  return;
-}
-
-  tree res_new = copy_ssa_name (final_loop, NULL);
-  gphi *nphi = create_phi_node (res_new, exit->dest);
-
-  /* Gather the bbs dominated by the exit block.  */
-  bitmap exit_dominated = BITMAP_ALLOC (NULL);
-  bitmap_set_bit (exit_dominated, exit->dest->index);
-  vec<basic_block> exit_dominated_vec
-= get_dominated_by (CDI_DOMINATORS, exit->dest);
-
-  int i;
-  basic_block dom_bb;
-  FOR_EACH_VEC_ELT (exit_dominated_vec, i, dom_bb)
-bitmap_set_bit (exit_dominated, dom_bb->index);
-
-  exit_dominated_vec.release ();
-
-  replace_uses_in_bbs_by (final_loop, res_new, exit_dominated);
-
-  add_phi_arg (nphi, final_loop, exit, UNKNOWN_LOCATION);
-
-  BITMAP_FREE (exit_dominated);
-}
-
 /* Do transformation from:
 
  :
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 228fac6..1150e6c 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -569,6 +569,93 @@ rewrite_into_loop_closed_ssa (bitmap changed_bbs, unsigned update_flag)
   free (use_blocks);
 }
 
+/* Replace uses of NAME by VAL in blocks BBS.  */
+
+static void
+replace_uses_in_bbs_by (tree name, tree val, bitmap bbs)
+{
+  gimple use_stmt;
+  imm_use_iterator imm_iter;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, name)
+{
+  if (!bitmap_bit_p (bbs, gimple_bb (use_stmt)->index))
+	continue;
+
+  use_operand_p use_p;
+  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+	SET_USE (use_p, val);
+}
+}
+
+/* Ensure a virtual phi is present in the exit block, if LOOP contains a vdef.
+   In other words, ensure loop-closed ssa normal form for virtuals.  */
+
+void
+rewrite_virtuals_into_loop_closed_ssa (struct loop *loop)
+{
+  gphi *phi;
+  edge exit = single_dom_exit (loop);
+
+  phi = NULL;
+  for (gphi_iterator gsi = gsi_start_phis (loop->header);
+   !gsi_end_p (gsi);
+   gsi_next (&gsi))
+{
+  if (virtual_operand_p (PHI_RESULT (gsi.phi ())))
+	{
+	  phi = gsi.phi ();
+	  break;
+	}
+}
+
+  if (phi == NULL)
+return;
+
+  tree final_loop = PHI_ARG_DEF_FROM_EDGE (phi, single_succ_edge (loop->latch));
+
+  phi = NULL;
+  for (gphi_iterator gsi = gsi_start_phis (exit->dest);
+   !gsi_end_p (gsi);
+   gsi_next (&gsi))
+{
+  if (virtual_operand_p (PHI_RESULT (gsi.phi ())))
+	{
+	  phi = gsi.phi ();
+	  break;
+	}
+}
+
+  if (phi != NULL)
+{
+  tree final_exit = PHI_ARG_DEF_FROM_EDGE (phi, exit);
+  gcc_assert (operand_equal_p (final_loop, final_exit, 0));
+  return;
+}
+
+  tree res_new = copy_ssa_name (final_loop, NULL);
+  gphi *nphi 

[gomp4, committed] Add bitmap_get_dominated_by

2015-06-24 Thread Tom de Vries

Hi,

this patch adds bitmap_get_dominated_by, a version of get_dominated_by 
that returns a bitmap rather than a vector.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Add bitmap_get_dominated_by

2015-06-18  Tom de Vries  

	* dominance.c (bitmap_get_dominated_by): New function.
	* dominance.h (bitmap_get_dominated_by): Declare.
	* tree-ssa-loop-manip.c (rewrite_virtuals_into_loop_closed_ssa): Use
	bitmap_get_dominated_by.
---
 gcc/dominance.c   | 21 +
 gcc/dominance.h   |  1 +
 gcc/tree-ssa-loop-manip.c | 10 +-
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/gcc/dominance.c b/gcc/dominance.c
index 09c8c90..4b35ec4 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -757,6 +757,27 @@ set_immediate_dominator (enum cdi_direction dir, basic_block bb,
 dom_computed[dir_index] = DOM_NO_FAST_QUERY;
 }
 
+/* Returns in BBS the list of basic blocks immediately dominated by BB, in the
+   direction DIR.  As get_dominated_by, but returns result as a bitmap.  */
+
+void
+bitmap_get_dominated_by (enum cdi_direction dir, basic_block bb, bitmap bbs)
+{
+  unsigned int dir_index = dom_convert_dir_to_idx (dir);
+  struct et_node *node = bb->dom[dir_index], *son = node->son, *ason;
+
+  bitmap_clear (bbs);
+
+  gcc_checking_assert (dom_computed[dir_index]);
+
+  if (!son)
+return;
+
+  bitmap_set_bit (bbs, ((basic_block) son->data)->index);
+  for (ason = son->right; ason != son; ason = ason->right)
+    bitmap_set_bit (bbs, ((basic_block) ason->data)->index);
+}
+
 /* Returns the list of basic blocks immediately dominated by BB, in the
direction DIR.  */
 vec<basic_block> 
diff --git a/gcc/dominance.h b/gcc/dominance.h
index 37e138b..0a1a13e 100644
--- a/gcc/dominance.h
+++ b/gcc/dominance.h
@@ -41,6 +41,7 @@ extern void free_dominance_info (enum cdi_direction);
 extern basic_block get_immediate_dominator (enum cdi_direction, basic_block);
 extern void set_immediate_dominator (enum cdi_direction, basic_block,
  basic_block);
+extern void bitmap_get_dominated_by (enum cdi_direction, basic_block, bitmap);
 extern vec<basic_block> get_dominated_by (enum cdi_direction, basic_block);
 extern vec<basic_block> get_dominated_by_region (enum cdi_direction,
 			 basic_block *,
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 1150e6c..9c558ca 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -638,16 +638,8 @@ rewrite_virtuals_into_loop_closed_ssa (struct loop *loop)
 
   /* Gather the bbs dominated by the exit block.  */
   bitmap exit_dominated = BITMAP_ALLOC (NULL);
+  bitmap_get_dominated_by (CDI_DOMINATORS, exit->dest, exit_dominated);
   bitmap_set_bit (exit_dominated, exit->dest->index);
-  vec<basic_block> exit_dominated_vec
-= get_dominated_by (CDI_DOMINATORS, exit->dest);
-
-  int i;
-  basic_block dom_bb;
-  FOR_EACH_VEC_ELT (exit_dominated_vec, i, dom_bb)
-bitmap_set_bit (exit_dominated, dom_bb->index);
-
-  exit_dominated_vec.release ();
 
   replace_uses_in_bbs_by (final_loop, res_new, exit_dominated);
 
-- 
1.9.1



[gomp4, committed] Add get_virtual_phi

2015-06-24 Thread Tom de Vries

Hi,

this patch factors new function get_virtual_phi out of 
rewrite_virtuals_into_loop_closed_ssa.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Add get_virtual_phi

2015-06-18  Tom de Vries  

	* tree-ssa-loop-manip.c (get_virtual_phi): Factor out of ...
	(rewrite_virtuals_into_loop_closed_ssa): ... here.
---
 gcc/tree-ssa-loop-manip.c | 44 
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 0d2c972..b7c3676 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -603,6 +603,24 @@ replace_uses_in_dominated_bbs (tree old_val, tree new_val, basic_block bb)
   BITMAP_FREE (dominated);
 }
 
+/* Return the virtual phi in BB.  */
+
+static gphi *
+get_virtual_phi (basic_block bb)
+{
+  for (gphi_iterator gsi = gsi_start_phis (bb);
+   !gsi_end_p (gsi);
+   gsi_next (&gsi))
+{
+  gphi *phi = gsi.phi ();
+
+  if (virtual_operand_p (PHI_RESULT (phi)))
+	return phi;
+}
+
+  return NULL;
+}
+
 /* Ensure a virtual phi is present in the exit block, if LOOP contains a vdef.
In other words, ensure loop-closed ssa normal form for virtuals.  */
 
@@ -612,35 +630,13 @@ rewrite_virtuals_into_loop_closed_ssa (struct loop *loop)
   gphi *phi;
   edge exit = single_dom_exit (loop);
 
-  phi = NULL;
-  for (gphi_iterator gsi = gsi_start_phis (loop->header);
-   !gsi_end_p (gsi);
-   gsi_next (&gsi))
-{
-  if (virtual_operand_p (PHI_RESULT (gsi.phi ())))
-	{
-	  phi = gsi.phi ();
-	  break;
-	}
-}
-
+  phi = get_virtual_phi (loop->header);
   if (phi == NULL)
 return;
 
   tree final_loop = PHI_ARG_DEF_FROM_EDGE (phi, single_succ_edge (loop->latch));
 
-  phi = NULL;
-  for (gphi_iterator gsi = gsi_start_phis (exit->dest);
-   !gsi_end_p (gsi);
-   gsi_next (&gsi))
-{
-  if (virtual_operand_p (PHI_RESULT (gsi.phi ())))
-	{
-	  phi = gsi.phi ();
-	  break;
-	}
-}
-
+  phi = get_virtual_phi (exit->dest);
   if (phi != NULL)
 {
   tree final_exit = PHI_ARG_DEF_FROM_EDGE (phi, exit);
-- 
1.9.1



[gomp4, committed] Add replace_uses_in_dominated_bbs

2015-06-24 Thread Tom de Vries

Hi,

this patch factors out a new function replace_uses_in_dominated_bbs
out of rewrite_virtuals_into_loop_closed_ssa.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Add replace_uses_in_dominated_bbs

2015-06-18  Tom de Vries  

	* tree-ssa-loop-manip.c (replace_uses_in_dominated_bbs): Factor out
	of ...
	(rewrite_virtuals_into_loop_closed_ssa): ... here.
---
 gcc/tree-ssa-loop-manip.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 9c558ca..0d2c972 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -588,6 +588,21 @@ replace_uses_in_bbs_by (tree name, tree val, bitmap bbs)
 }
 }
 
+/* Replace uses of OLD_VAL with NEW_VAL in bbs dominated by BB.  */
+
+static void
+replace_uses_in_dominated_bbs (tree old_val, tree new_val, basic_block bb)
+{
+  bitmap dominated = BITMAP_ALLOC (NULL);
+
+  bitmap_get_dominated_by (CDI_DOMINATORS, bb, dominated);
+  bitmap_set_bit (dominated, bb->index);
+
+  replace_uses_in_bbs_by (old_val, new_val, dominated);
+
+  BITMAP_FREE (dominated);
+}
+
 /* Ensure a virtual phi is present in the exit block, if LOOP contains a vdef.
In other words, ensure loop-closed ssa normal form for virtuals.  */
 
@@ -635,17 +650,8 @@ rewrite_virtuals_into_loop_closed_ssa (struct loop *loop)
 
   tree res_new = copy_ssa_name (final_loop, NULL);
   gphi *nphi = create_phi_node (res_new, exit->dest);
-
-  /* Gather the bbs dominated by the exit block.  */
-  bitmap exit_dominated = BITMAP_ALLOC (NULL);
-  bitmap_get_dominated_by (CDI_DOMINATORS, exit->dest, exit_dominated);
-  bitmap_set_bit (exit_dominated, exit->dest->index);
-
-  replace_uses_in_bbs_by (final_loop, res_new, exit_dominated);
-
+  replace_uses_in_dominated_bbs (final_loop, res_new, exit->dest);
   add_phi_arg (nphi, final_loop, exit, UNKNOWN_LOCATION);
-
-  BITMAP_FREE (exit_dominated);
 }
 
 /* Check invariants of the loop closed ssa form for the USE in BB.  */
-- 
1.9.1



Re: [PATCH/AARCH64] Update ThunderX schedule model

2015-06-24 Thread James Greenhalgh
On Tue, Jun 23, 2015 at 10:00:21PM +0100, Andrew Pinski wrote:
> Hi,
>   This patch updates the schedule model to be more accurate and model
> SIMD and fp instructions that I had missed out when I had the last
> patch.
> 
> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.

These TBL descriptions look a little large...

> +;; 64bit TBL is emulated and takes 160 cycles
> +(define_insn_reservation "thunderx_tbl" 160
> +  (and (eq_attr "tune" "thunderx")
> +   (eq_attr "type" "neon_tbl1"))
> +  "(thunderx_pipe1+thunderx_pipe0)*160")
> +
> +;; 128bit TBL is emulated and takes 320 cycles
> +(define_insn_reservation "thunderx_tblq" 320
> +  (and (eq_attr "tune" "thunderx")
> +   (eq_attr "type" "neon_tbl1_q"))
> +  "(thunderx_pipe1+thunderx_pipe0)*320")
>  

Is there really value in modelling this as taking up 320 cycles and
blocking both execution units for the whole time? Surely you can achieve
much the same effect with smaller numbers... The difference between
modelling a complete machine reservation for 320 cycles and for 10 cycles
should have next to no effect on performance - if these instructions block
for so long, it shouldn't matter where you place them in the instruction
stream.

On the other hand, the thunderx scheduler is generally well behaved,
as it splits execution units into their own automata - certainly when
compared to the 60,000 states of the combined Cortex-A53 scheduler -
and this patch as it stands does not bloat it by much:

Automaton `thunderx_main'
  323 NDFA states, 657 NDFA arcs
  323 DFA states, 657 DFA arcs
  323 minimal DFA states, 657 minimal DFA arcs
  213 all insns  9 insn equivalence classes
0 locked states
  661 transition comb vector els,  2907 trans table els: use comb vect
 2907 min delay table els, compression factor 1

Compared with setting both of these reservations to

> +  "(thunderx_pipe1+thunderx_pipe0)*10")

Which gives:

Automaton `thunderx_main'
   13 NDFA states, 36 NDFA arcs
   13 DFA states,  36 DFA arcs
   13 minimal DFA states,  36 minimal DFA arcs
  213 all insns  8 insn equivalence classes
0 locked states
   39 transition comb vector els,   104 trans table els: use comb vect
  104 min delay table els, compression factor 2

So I think that, while you could consider revisiting the TBL reservations,
given the above statistics this patch is OK for trunk.

Thanks,
James

> ChangeLog:
> 
>  * config/aarch64/thunderx.md (thunderx_shift): Add rbit and rev.
> (thunderx_crc32): New reservation.
> (thunderx_fmov): Add fcsel, ffarithd and ffariths.
> (thunderx_fabs): New reservation.
> (thunderx_fcsel): New reservation.
> (thunderx_fcmp): New reservation.
> (thunderx_fsqrtd): Correct latency.
> (thunderx_frint): Add f_cvt.
> (thunderx_f_cvt): Remove f_cvt.
> (thunderx_simd_fp_store): Add neon_store1_one_lane
> and neon_store1_one_lane_q.
> (thunderx_neon_ld1): New reservation.
> (thunderx_neon_move): Add neon_dup.
> neon_ins, neon_from_gp, neon_to_gp,
> neon_abs, neon_neg,
> neon_fp_neg_s, and neon_fp_abs_s.
> (thunderx_neon_move_q): Add neon_dup_q,
> neon_ins_q, neon_from_gp_q, neon_to_gp_q,
> neon_abs_q, neon_neg_q,
> neon_fp_neg_s_q, neon_fp_neg_d_q,
> neon_fp_abs_s_q, and neon_fp_abs_d_q.
> (thunderx_neon_add): Add neon_arith_acc, neon_rev, neon_fp_abd_s,
> neon_fp_abd_d, and neon_fp_reduc_minmax_s.
> (thunderx_neon_add_q): Add neon_fp_abd_s_q, neon_fp_abd_d_q,
> neon_arith_acc_q, neon_rev_q,
> neon_fp_reduc_minmax_s_q, and neon_fp_reduc_minmax_d_q.
> (thunderx_neon_mult): New reservation.
> (thunderx_neon_mult_q): New reservation.
> (thunderx_crypto_aese): New reservation.
> (thunderx_crypto_aesmc): New reservation.
> (bypasses): Add bypass to thunderx_neon_mult_q.
> (thunderx_tbl): New reservation.
> (thunderx_tblq): New reservation.

> Index: config/aarch64/thunderx.md
> ===
> --- config/aarch64/thunderx.md(revision 224856)
> +++ config/aarch64/thunderx.md(working copy)
> @@ -39,7 +39,7 @@ (define_insn_reservation "thunderx_add"
>  
>  (define_insn_reservation "thunderx_shift" 1
>(and (eq_attr "tune" "thunderx")
> -   (eq_attr "type" "bfm,extend,shift_imm,shift_reg"))
> +   (eq_attr "type" "bfm,extend,shift_imm,shift_reg,rbit,rev"))
>"thunderx_pipe0 | thunderx_pipe1")
>  
>  
> @@ -66,12 +66,18 @@ (define_insn_reservation "thunderx_mul"
> (eq_attr "type" "mul,muls,mla,mlas,clz,smull,umull,smlal,umlal"))
>"thunderx_pipe1 + thunderx_mult")
>  
> -;; Multiply high instructions take an extra cycle and cause the muliply unit to
> -;; be busy for an extra cycle.
> +;; crcb,crch,crcw is 4 cycles and can only happen on pipe 1
>  
> -;(define_insn_reservation "thunderx_mul_high" 5
> +(define_insn_reservation "thunderx_crc32" 4
> +  (and (eq_attr "tune" "thunderx")
> +   (eq_attr "type" "crc"))
> +  "thunderx_pipe1 + thunderx_mult")
> +
> +;; crcx is 

Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Biener
On Tue, Jun 23, 2015 at 11:27 PM, Marc Glisse  wrote:
> On Tue, 23 Jun 2015, Richard Sandiford wrote:
>
>> +/* Vector comparisons are defined to produce all-one or all-zero results.
>> */
>> +(simplify
>> + (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
>> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>> +   (convert @0)))
>
>
> I am trying to understand why the test tree_nop_conversion_p is the right
> one (at least for the transformations not using VIEW_CONVERT_EXPR). By
> definition of VEC_COND_EXPR, type and TREE_TYPE (@0) are both integer vector
> types of the same size and number of elements. It thus seems like a
> conversion is always fine. For vectors, tree_nop_conversion_p apparently
> only checks that they have the same mode (quite often VOIDmode I guess).

The only conversion we seem to allow is changing the signed vector from
the comparison result to an unsigned vector (same number of elements
and same mode of the elements).  That is, a check using
TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (@0)) would probably
be better (well, technically a TYPE_VECTOR_SUBPARTS && element
mode compare should be better as generic vectors might not have a vector mode).

I'm fine with using tree_nop_conversion_p for now.

>> +/* We could instead convert all instances of the vec_cond to negate,
>> +   but that isn't necessarily a win on its own.  */

so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks more compact
at least ;)  It would also simplify the patterns below.

I'm missing a comment on the transform done by the following patterns.

>> +(simplify
>> + (plus:c @3 (vec_cond @0 integer_each_onep@1 integer_zerop@2))
>> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>> +  (minus @3 (convert @0
>> +
>> +(simplify
>> + (plus:c @3 (view_convert_expr
>
>
> Aren't we supposed to drop _expr in match.pd?

Yes.  I probably should adjust genmatch.c to reject the _expr variants ;)

>
>> +(vec_cond @0 integer_each_onep@1 integer_zerop@2)))
>> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>> +  (minus @3 (convert @0
>> +
>> +(simplify
>> + (minus @3 (vec_cond @0 integer_each_onep@1 integer_zerop@2))
>> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>> +  (plus @3 (convert @0
>> +
>> +(simplify
>> + (minus @3 (view_convert_expr
>> +   (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
>> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>> +  (plus @3 (convert @0
>> +

Generally for sign-conversions of vectors you should use view_convert.

The above also hints at missing conditional view_convert support
and a way to iterate over commutative vs. non-commutative ops so
we could write

(for op (plus:c minus)
 rop (minus plus)
  (op @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (rop @3 (view_convert @0)

I'll see implementing that.

Richard.


>> /* Simplifications of comparisons.  */
>>
>> /* We can simplify a logical negation of a comparison to the
>> Index: gcc/testsuite/gcc.target/aarch64/vect-add-sub-cond.c
>> ===
>> --- /dev/null   2015-06-02 17:27:28.541944012 +0100
>> +++ gcc/testsuite/gcc.target/aarch64/vect-add-sub-cond.c2015-06-23
>> 12:06:27.120203685 +0100
>> @@ -0,0 +1,94 @@
>> +/* Make sure that vector comparison results are not unnecessarily ANDed
>> +   with vectors of 1.  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ftree-vectorize" } */
>> +
>> +#define COUNT1(X) if (X) count += 1
>> +#define COUNT2(X) if (X) count -= 1
>> +#define COUNT3(X) count += (X)
>> +#define COUNT4(X) count -= (X)
>> +
>> +#define COND1(X) (X)
>> +#define COND2(X) ((X) ? 1 : 0)
>> +#define COND3(X) ((X) ? -1 : 0)
>> +#define COND4(X) ((X) ? 0 : 1)
>> +#define COND5(X) ((X) ? 0 : -1)
>> +
>> +#define TEST_LT(X, Y) ((X) < (Y))
>> +#define TEST_LE(X, Y) ((X) <= (Y))
>> +#define TEST_GT(X, Y) ((X) > (Y))
>> +#define TEST_GE(X, Y) ((X) >= (Y))
>> +#define TEST_EQ(X, Y) ((X) == (Y))
>> +#define TEST_NE(X, Y) ((X) != (Y))
>> +
>> +#define COUNT_LOOP(ID, TYPE, CMP_ARRAY, TEST, COUNT) \
>> +  TYPE \
>> +  reduc_##ID (__typeof__ (CMP_ARRAY[0]) x) \
>> +  { \
>> +TYPE count = 0; \
>> +for (unsigned int i = 0; i < 1024; ++i) \
>> +  COUNT (TEST (CMP_ARRAY[i], x)); \
>> +return count; \
>> +  }
>> +
>> +#define COND_LOOP(ID, ARRAY, CMP_ARRAY, TEST, COND) \
>> +  void \
>> +  plus_##ID (__typeof__ (CMP_ARRAY[0]) x) \
>> +  { \
>> +for (unsigned int i = 0; i < 1024; ++i) \
>> +  ARRAY[i] += COND (TEST (CMP_ARRAY[i], x)); \
>> +  } \
>> +  void \
>> +  plusc_##ID (void) \
>> +  { \
>> +for (unsigned int i = 0; i < 1024; ++i) \
>> +  ARRAY[i] += COND (TEST (CMP_ARRAY[i], 10)); \
>> +  } \
>> +  void \
>> +  minus_##ID (__typeof__ (CMP_ARRAY[0]) x) \
>> +  { \
>> +for (unsigned int i = 0; i < 1024; ++i) \
>> +  ARRAY[i] -= COND (TEST (CMP_ARRAY[i], x)); \
>> +  } \
>>

[ARM] Correct spelling of references to ARMv6KZ

2015-06-24 Thread Matthew Wahab

Hello,

GCC supports ARM architecture ARMv6KZ but refers to it as ARMv6ZK. This is made
visible by the command line option -march=armv6zk and by the predefined macro
__ARM_ARCH_6ZK__.

This patch corrects the spelling internally and adds -march=armv6kz. To preserve
existing behaviour, -march=armv6zk is kept as an alias of -march=armv6kz and
both __ARM_ARCH_6KZ__ and __ARM_ARCH_6ZK__ macros are defined for the
architecture.

Use of -march=armv6kz will need to wait for binutils to be updated; a patch has
been submitted (https://sourceware.org/ml/binutils/2015-06/msg00236.html). Use
of the existing spelling, -march=armv6zk, still works with current binutils.

Tested arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-24  Matthew Wahab  

* config/arm/arm-arches.def: Add "armv6kz". Replace 6ZK with 6KZ
and FL_FOR_ARCH6ZK with FL_FOR_ARCH6KZ.
* config/arm/arm-c.c (arm_cpu_builtins): Emit "__ARM_ARCH_6ZK__"
for armv6kz targets.
* config/arm/arm-cores.def: Replace 6ZK with 6KZ.
* config/arm/arm-protos.h (FL_ARCH6KZ): New.
(FL_FOR_ARCH6ZK): Remove.
(FL_FOR_ARCH6KZ): New.
(arm_arch6kz): New declaration.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch6kz): New.
(arm_option_override): Set arm_arch6kz.
* config/arm/arm.h (BASE_ARCH_6ZK): Rename to BASE_ARCH_6KZ.
* config/arm/driver-arm.c: Add "armv6kz".
* doc/invoke.texi: Replace "armv6zk" with "armv6kz" and
"armv6zkt2" with "armv6kzt2".
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 840c1ff..3dafaa5 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -44,7 +44,8 @@ ARM_ARCH("armv6",   arm1136js,  6,   FL_CO_PROC | FL_FOR_ARCH6)
 ARM_ARCH("armv6j",  arm1136js,  6J,  FL_CO_PROC | FL_FOR_ARCH6J)
 ARM_ARCH("armv6k",  mpcore,	6K,  FL_CO_PROC | FL_FOR_ARCH6K)
 ARM_ARCH("armv6z",  arm1176jzs, 6Z,  FL_CO_PROC | FL_FOR_ARCH6Z)
-ARM_ARCH("armv6zk", arm1176jzs, 6ZK, FL_CO_PROC | FL_FOR_ARCH6ZK)
+ARM_ARCH("armv6kz", arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
+ARM_ARCH("armv6zk", arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
 ARM_ARCH("armv6t2", arm1156t2s, 6T2, FL_CO_PROC | FL_FOR_ARCH6T2)
 ARM_ARCH("armv6-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
 ARM_ARCH("armv6s-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 6aa59ad..e2d458c 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -169,6 +169,11 @@ arm_cpu_builtins (struct cpp_reader* pfile, int flags)
 }
   if (arm_arch_iwmmxt2)
 builtin_define ("__IWMMXT2__");
+  /* ARMv6KZ was originally identified as the misspelled __ARM_ARCH_6ZK__.  To
+ preserve the existing behaviour, the misspelled feature macro must still be
+ defined.  */
+  if (arm_arch6kz)
+builtin_define ("__ARM_ARCH_6ZK__");
   if (TARGET_AAPCS_BASED)
 {
   if (arm_pcs_default == ARM_PCS_AAPCS_VFP)
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 103c314..9d47fcf 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -125,8 +125,8 @@ ARM_CORE("arm1026ej-s",	arm1026ejs, arm1026ejs,	5TEJ, FL_LDSCHED, 9e)
 /* V6 Architecture Processors */
 ARM_CORE("arm1136j-s",		arm1136js, arm1136js,		6J,  FL_LDSCHED, 9e)
 ARM_CORE("arm1136jf-s",		arm1136jfs, arm1136jfs,		6J,  FL_LDSCHED | FL_VFPV2, 9e)
-ARM_CORE("arm1176jz-s",		arm1176jzs, arm1176jzs,		6ZK, FL_LDSCHED, 9e)
-ARM_CORE("arm1176jzf-s",	arm1176jzfs, arm1176jzfs,	6ZK, FL_LDSCHED | FL_VFPV2, 9e)
+ARM_CORE("arm1176jz-s",		arm1176jzs, arm1176jzs,		6KZ, FL_LDSCHED, 9e)
+ARM_CORE("arm1176jzf-s",	arm1176jzfs, arm1176jzfs,	6KZ, FL_LDSCHED | FL_VFPV2, 9e)
 ARM_CORE("mpcorenovfp",		mpcorenovfp, mpcorenovfp,	6K,  FL_LDSCHED, 9e)
 ARM_CORE("mpcore",		mpcore, mpcore,			6K,  FL_LDSCHED | FL_VFPV2, 9e)
 ARM_CORE("arm1156t2-s",		arm1156t2s, arm1156t2s,		6T2, FL_LDSCHED, v6t2)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 62f91ef..7aae934 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -382,6 +382,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 
 #define FL_IWMMXT (1 << 29)	  /* XScale v2 or "Intel Wireless MMX technology".  */
 #define FL_IWMMXT2(1 << 30)   /* "Intel Wireless MMX2 technology".  */
+#define FL_ARCH6KZ(1 << 31)   /* ARMv6KZ architecture.  */
 
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
@@ -401,7 +402,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH6J	FL_FOR_ARCH6
 #define FL_FOR_ARCH6K	(FL_FOR_ARCH6 | FL_ARCH6K)
 #define FL_FOR_ARCH6Z	FL_FOR_ARCH6
-#define FL_FOR_ARCH6ZK	FL_FOR_ARCH6K
+#define FL_FOR_ARCH6KZ	(FL_FOR_ARCH6K | FL_ARCH6KZ)
 #defi

[PATCH IRA] save a bitmap check

2015-06-24 Thread Zhouyi Zhou

In function assign_hard_reg, checking the bit of conflict_a in
consideration_allocno_bitmap is unnecessary, because when retry_p is
false, conflicting objects are always inside the same loop_node
(this is ensured by function process_bb_node_lives, which marks the
live objects as dead near the end of that function).

   

Bootstrap and regtest scheduled on x86_64 GNU/Linux
Signed-off-by: Zhouyi Zhou 
---
 gcc/ChangeLog   | 4 
 gcc/ira-color.c | 6 ++
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d1f82b2..07605ae 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2015-06-24  Zhouyi Zhou  
+
+   * ira-color.c (assign_hard_reg): Save a bitmap check.
+   
 2015-06-24  Andreas Krebbel  
 
PR rtl-optimization/66306
diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 6c53507..d7776d6 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -1733,14 +1733,12 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
  /* Reload can give another class so we need to check all
 allocnos.  */
  if (!retry_p
- && (!bitmap_bit_p (consideration_allocno_bitmap,
-ALLOCNO_NUM (conflict_a))
- || ((!ALLOCNO_ASSIGNED_P (conflict_a)
+ && ((!ALLOCNO_ASSIGNED_P (conflict_a)
   || ALLOCNO_HARD_REGNO (conflict_a) < 0)
  && !(hard_reg_set_intersect_p
   (profitable_hard_regs,
ALLOCNO_COLOR_DATA
-   (conflict_a)->profitable_hard_regs)
+   (conflict_a)->profitable_hard_regs
continue;
  conflict_aclass = ALLOCNO_CLASS (conflict_a);
  ira_assert (ira_reg_classes_intersect_p
-- 
1.9.1











Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Jun 23, 2015 at 11:27 PM, Marc Glisse  wrote:
>> On Tue, 23 Jun 2015, Richard Sandiford wrote:
>>
>>> +/* Vector comparisons are defined to produce all-one or all-zero results.
>>> */
>>> +(simplify
>>> + (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
>>> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>>> +   (convert @0)))
>>
>>
>> I am trying to understand why the test tree_nop_conversion_p is the right
>> one (at least for the transformations not using VIEW_CONVERT_EXPR). By
>> definition of VEC_COND_EXPR, type and TREE_TYPE (@0) are both integer vector
>> types of the same size and number of elements. It thus seems like a
>> conversion is always fine. For vectors, tree_nop_conversion_p apparently
>> only checks that they have the same mode (quite often VOIDmode I guess).
>
> The only conversion we seem to allow is changing the signed vector from
> the comparison result to an unsigned vector (same number of elements
> and same mode of the elements).  That is, a check using
> TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (@0)) would probably
> be better (well, technically a TYPE_VECTOR_SUBPARTS && element
> mode compare should be better as generic vectors might not have a vector 
> mode).

OK.  The reason I was being paranoid was that I couldn't see anywhere
where we enforced that the vector condition in a VEC_COND had to have
the same element width as the values being selected.  tree-cfg.c
only checks that rhs2 and rhs3 are compatible with the result.
There doesn't seem to be any checking of rhs1 vs. the other types.
So I wasn't sure whether anything stopped us from, e.g., comparing two
V4HIs and using the result to select between two V4SIs.

> I'm fine with using tree_nop_conversion_p for now.

I like the suggestion about checking TYPE_VECTOR_SUBPARTS and the element
mode.  How about:

 (if (VECTOR_INTEGER_TYPE_P (type)
  && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
  && (TYPE_MODE (TREE_TYPE (type))
  == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)

(But is it really OK to be adding more mode-based compatibility checks?
I thought you were hoping to move away from modes in the middle end.)

>>> +/* We could instead convert all instances of the vec_cond to negate,
>>> +   but that isn't necessarily a win on its own.  */
>
> so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks more compact
> at least ;)  It would also simplify the patterns below.

In the past I've dealt with processors where arithmetic wasn't handled
as efficiently as logical ops.  Seems like an especial risk for 64-bit
elements, from a quick scan of the i386 scheduling models.

> I'm missing a comment on the transform done by the following patterns.

Heh.  The comment was supposed to be describing all four at once.
I originally had them bunched together without whitespace, but it
looked bad.

>>> +(simplify
>>> + (plus:c @3 (vec_cond @0 integer_each_onep@1 integer_zerop@2))
>>> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>>> +  (minus @3 (convert @0
>>> +
>>> +(simplify
>>> + (plus:c @3 (view_convert_expr
>>
>>
>> Aren't we supposed to drop _expr in match.pd?
>
> Yes.  I probably should adjust genmatch.c to reject the _expr variants ;)

OK.

>>> +(vec_cond @0 integer_each_onep@1 integer_zerop@2)))
>>> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>>> +  (minus @3 (convert @0
>>> +
>>> +(simplify
>>> + (minus @3 (vec_cond @0 integer_each_onep@1 integer_zerop@2))
>>> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>>> +  (plus @3 (convert @0
>>> +
>>> +(simplify
>>> + (minus @3 (view_convert_expr
>>> +   (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
>>> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>>> +  (plus @3 (convert @0
>>> +
>
> Generally for sign-conversions of vectors you should use view_convert.

OK.

> The above also hints at missing conditional view_convert support
> and a way to iterate over commutative vs. non-commutative ops so
> we could write
>
> (for op (plus:c minus)
>  rop (minus plus)
>   (op @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
>   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>(rop @3 (view_convert @0)
>
> I'll see implementing that.

Looks good. :-)

I also realised later that:

/* Vector comparisons are defined to produce all-one or all-zero results.  */
(simplify
 (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
 (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (convert @0)))

is redundant with some fold-const.c code.

Thanks,
Richard



[PATCH] Support conditional view_convert in match.pd

2015-06-24 Thread Richard Biener

Tested on match.pd (same code generation) and a toy pattern.  Will apply
after bootstrap & regtest on x86_64-unknown-linux-gnu.

Richard.

2015-06-24  Richard Biener  

* genmatch.c (enum tree_code): Add VIEW_CONVERT[012].
(main): Likewise.
(lower_opt_convert): Support lowering of conditional view_convert.
(parser::parse_operation): Likewise.
(parser::parse_for): Likewise.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 224888)
+++ gcc/genmatch.c  (working copy)
@@ -161,6 +161,9 @@ enum tree_code {
 CONVERT0,
 CONVERT1,
 CONVERT2,
+VIEW_CONVERT0,
+VIEW_CONVERT1,
+VIEW_CONVERT2,
 MAX_TREE_CODES
 };
 #undef DEFTREECODE
@@ -749,12 +752,14 @@ lower_commutative (simplify *s, vec<simplify *>& simplifiers)
   if (capture *c = dyn_cast <capture *> (o))
 {
   if (c->what)
-   return new capture (c->where, lower_opt_convert (c->what, oper, strip));
+   return new capture (c->where,
+   lower_opt_convert (c->what, oper, to_oper, strip));
   else
return c;
 }
@@ -766,16 +771,18 @@ lower_opt_convert (operand *o, enum tree
   if (*e->operation == oper)
 {
   if (strip)
-   return lower_opt_convert (e->ops[0], oper, strip);
+   return lower_opt_convert (e->ops[0], oper, to_oper, strip);
 
-  expr *ne = new expr (get_operator ("CONVERT_EXPR"));
-  ne->append_op (lower_opt_convert (e->ops[0], oper, strip));
+  expr *ne = new expr (to_oper == CONVERT_EXPR
+  ? get_operator ("CONVERT_EXPR")
+  : get_operator ("VIEW_CONVERT_EXPR"));
+  ne->append_op (lower_opt_convert (e->ops[0], oper, to_oper, strip));
   return ne;
 }
 
   expr *ne = new expr (e->operation, e->is_commutative);
   for (unsigned i = 0; i < e->ops.length (); ++i)
-ne->append_op (lower_opt_convert (e->ops[i], oper, strip));
+ne->append_op (lower_opt_convert (e->ops[i], oper, to_oper, strip));
 
   return ne;
 }
@@ -818,20 +825,28 @@ lower_opt_convert (operand *o)
 
   v1.safe_push (o);
 
-  enum tree_code opers[] = { CONVERT0, CONVERT1, CONVERT2 };
+  enum tree_code opers[]
+= { CONVERT0, CONVERT_EXPR,
+   CONVERT1, CONVERT_EXPR,
+   CONVERT2, CONVERT_EXPR,
+   VIEW_CONVERT0, VIEW_CONVERT_EXPR,
+   VIEW_CONVERT1, VIEW_CONVERT_EXPR,
+   VIEW_CONVERT2, VIEW_CONVERT_EXPR };
 
   /* Conditional converts are lowered to a pattern with the
  conversion and one without.  The three different conditional
  convert codes are lowered separately.  */
 
-  for (unsigned i = 0; i < 3; ++i)
+  for (unsigned i = 0; i < sizeof (opers) / sizeof (enum tree_code); i += 2)
 {
   v2 = vNULL;
   for (unsigned j = 0; j < v1.length (); ++j)
if (has_opt_convert (v1[j], opers[i]))
  {
-   v2.safe_push (lower_opt_convert (v1[j], opers[i], false));
-   v2.safe_push (lower_opt_convert (v1[j], opers[i], true));
+   v2.safe_push (lower_opt_convert (v1[j],
+opers[i], opers[i+1], false));
+   v2.safe_push (lower_opt_convert (v1[j],
+opers[i], opers[i+1], true));
  }
 
   if (v2 != vNULL)
@@ -2890,14 +2905,22 @@ parser::parse_operation ()
   const cpp_token *token = peek ();
   if (strcmp (id, "convert0") == 0)
 fatal_at (id_tok, "use 'convert?' here");
+  else if (strcmp (id, "view_convert0") == 0)
+fatal_at (id_tok, "use 'view_convert?' here");
   if (token->type == CPP_QUERY
   && !(token->flags & PREV_WHITE))
 {
   if (strcmp (id, "convert") == 0)
id = "convert0";
-  else if (strcmp  (id, "convert1") == 0)
+  else if (strcmp (id, "convert1") == 0)
;
-  else if (strcmp  (id, "convert2") == 0)
+  else if (strcmp (id, "convert2") == 0)
+   ;
+  else if (strcmp (id, "view_convert") == 0)
+   id = "view_convert0";
+  else if (strcmp (id, "view_convert1") == 0)
+   ;
+  else if (strcmp (id, "view_convert2") == 0)
;
   else
fatal_at (id_tok, "non-convert operator conditionalized");
@@ -2907,8 +2930,10 @@ parser::parse_operation ()
  "match expression");
   eat_token (CPP_QUERY);
 }
-  else if (strcmp  (id, "convert1") == 0
-  || strcmp  (id, "convert2") == 0)
+  else if (strcmp (id, "convert1") == 0
+  || strcmp (id, "convert2") == 0
+  || strcmp (id, "view_convert1") == 0
+  || strcmp (id, "view_convert2") == 0)
 fatal_at (id_tok, "expected '?' after conditional operator");
   id_base *op = get_operator (id);
   if (!op)
@@ -3325,7 +3350,9 @@ parser::parse_for (source_location)
  id_base *idb = get_operator (oper);
  if (idb == NULL)
fatal_at (token, "no such operator '%s'", oper);
- if (*idb == CONVERT0 || *idb == CONVERT1 || *idb == CONVERT2)
+ if (*idb == CONVERT0 || *idb == CONVERT1 || *idb == CONVERT2
+   

[gomp4.1] Add affinity query routines

2015-06-24 Thread Jakub Jelinek
Hi!

This got enacted earlier this week, a couple of routines to query
the affinity.

2015-06-24  Jakub Jelinek  

* omp.h.in (omp_get_num_places, omp_get_place_num_procs,
omp_get_place_proc_ids, omp_get_place_num,
omp_get_partition_num_places, omp_get_partition_place_nums): New
prototypes.
* omp_lib.f90.in (omp_get_num_places, omp_get_place_num_procs,
omp_get_place_proc_ids, omp_get_place_num,
omp_get_partition_num_places, omp_get_partition_place_nums): New
interfaces.
* omp_lib.h.in (omp_get_num_places, omp_get_place_num_procs,
omp_get_place_proc_ids, omp_get_place_num,
omp_get_partition_num_places, omp_get_partition_place_nums): New
externs.
* libgomp.map (OMP_4.1): Export omp_get_num_places,
omp_get_place_num_procs, omp_get_place_proc_ids, omp_get_place_num,
omp_get_partition_num_places, omp_get_partition_place_nums,
omp_get_num_places_, omp_get_place_num_procs_,
omp_get_place_num_procs_8_, omp_get_place_proc_ids_,
omp_get_place_proc_ids_8_, omp_get_place_num_,
omp_get_partition_num_places_, omp_get_partition_place_nums_
and omp_get_partition_place_nums_8_.
* libgomp.h (gomp_get_place_proc_ids_8): New prototype.
* env.c (omp_get_num_places, omp_get_place_num,
omp_get_partition_num_places, omp_get_partition_place_nums): New
functions, add ialias for them.
* fortran.c (omp_get_num_places, omp_get_place_num, 
omp_get_partition_num_places, omp_get_partition_place_nums,
omp_get_place_num_procs, omp_get_place_proc_ids): New
ialias_redirects.
(omp_get_num_places_, omp_get_place_num_procs_,
omp_get_place_num_procs_8_, omp_get_place_proc_ids_,
omp_get_place_proc_ids_8_, omp_get_place_num_,
omp_get_partition_num_places_, omp_get_partition_place_nums_,
omp_get_partition_place_nums_8_): New functions.
* config/linux/affinity.c (omp_get_place_num_procs,
omp_get_place_proc_ids, gomp_get_place_proc_ids_8): New functions.
* config/posix/affinity.c (omp_get_place_num_procs,
omp_get_place_proc_ids, gomp_get_place_proc_ids_8): New functions.
* testsuite/libgomp.c/affinity-2.c: New test.
* testsuite/libgomp.fortran/affinity1.f90: New test.
* testsuite/libgomp.fortran/affinity2.f90: New test.

--- libgomp/omp.h.in.jj 2015-06-12 16:45:16.0 +0200
+++ libgomp/omp.h.in2015-06-23 19:24:15.056053879 +0200
@@ -125,6 +125,12 @@ extern int omp_in_final (void) __GOMP_NO
 
 extern int omp_get_cancellation (void) __GOMP_NOTHROW;
 extern omp_proc_bind_t omp_get_proc_bind (void) __GOMP_NOTHROW;
+extern int omp_get_num_places (void) __GOMP_NOTHROW;
+extern int omp_get_place_num_procs (int) __GOMP_NOTHROW;
+extern void omp_get_place_proc_ids (int, int *) __GOMP_NOTHROW;
+extern int omp_get_place_num (void) __GOMP_NOTHROW;
+extern int omp_get_partition_num_places (void) __GOMP_NOTHROW;
+extern void omp_get_partition_place_nums (int *) __GOMP_NOTHROW;
 
 extern void omp_set_default_device (int) __GOMP_NOTHROW;
 extern int omp_get_default_device (void) __GOMP_NOTHROW;
--- libgomp/omp_lib.f90.in.jj   2015-06-12 17:34:56.0 +0200
+++ libgomp/omp_lib.f90.in  2015-06-24 11:49:40.159360460 +0200
@@ -330,6 +330,58 @@
   end function omp_get_proc_bind
 end interface
 
+interface
+  function omp_get_num_places ()
+integer (4) :: omp_get_num_places
+  end function omp_get_num_places
+end interface
+
+interface omp_get_place_num_procs
+  function omp_get_place_num_procs (place_num)
+integer (4), intent(in) :: place_num
+integer (4) :: omp_get_place_num_procs
+  end function omp_get_place_num_procs
+
+  function omp_get_place_num_procs_8 (place_num)
+integer (8), intent(in) :: place_num
+integer (4) :: omp_get_place_num_procs_8
+  end function omp_get_place_num_procs_8
+end interface
+
+interface omp_get_place_proc_ids
+  subroutine omp_get_place_proc_ids (place_num, ids)
+integer (4), intent(in) :: place_num
+integer (4), intent(out) :: ids(*)
+  end subroutine omp_get_place_proc_ids
+
+  subroutine omp_get_place_proc_ids_8 (place_num, ids)
+integer (8), intent(in) :: place_num
+integer (8), intent(out) :: ids(*)
+  end subroutine omp_get_place_proc_ids_8
+end interface
+
+interface
+  function omp_get_place_num ()
+integer (4) :: omp_get_place_num
+  end function omp_get_place_num
+end interface
+
+interface
+  function omp_get_partition_num_places ()
+integer (4) :: omp_get_partition_num_places
+  end function omp_get_partition_num_places
+end interface
+
+interface omp_g

Re: [06/12] Consolidate string hashers

2015-06-24 Thread Mikhail Maltsev
On 23.06.2015 17:49, Richard Sandiford wrote:
> This patch replaces various string hashers with a single copy
> in hash-traits.h.

(snip)

> Index: gcc/config/alpha/alpha.c
> ===
> --- gcc/config/alpha/alpha.c  2015-06-23 15:48:30.751788389 +0100
> +++ gcc/config/alpha/alpha.c  2015-06-23 15:48:30.747788453 +0100
> @@ -4808,13 +4808,7 @@ alpha_multipass_dfa_lookahead (void)
>  
>  struct GTY(()) alpha_links;
>  
> -struct string_traits : default_hashmap_traits
> -{
> -  static bool equal_keys (const char *const &a, const char *const &b)
> -  {
> -return strcmp (a, b) == 0;
> -  }
> -};
> +typedef simple_hashmap_traits  string_traits;
>  

I remember that when we briefly discussed unification of string traits,
I looked through GCC code and this one seemed weird to me: it does not
reimplement the hash function. I.e. the pointer value is used as hash. I
wonder, is it intentional or not? This could actually work if strings
are interned (but in that case there is no need to compare them, because
comparing pointers would be enough).

-- 
Regards,
Mikhail Maltsev


Re: [PATCH 1/8] S/390 Vector ABI GNU Attribute.

2015-06-24 Thread Richard Biener
On Wed, Jun 24, 2015 at 8:57 AM, Andreas Krebbel
 wrote:
> With this patch .gnu_attribute is used to mark binaries with a vector
> ABI tag.  This is required since the z13 vector support breaks the ABI
> of existing vector_size attribute generated vector types:
>
> 1. vector_size(16) and bigger vectors are aligned to 8 byte
> boundaries (formerly vectors were always naturally aligned)
>
> 2. vector_size(16) or smaller vectors are passed via VR if available
> or by value on the stack (formerly vectors were passed on the stack by
> reference).
>
> The .gnu_attribute will be used by ld to emit a warning if binaries
> with incompatible ABIs are being linked together:
> https://sourceware.org/ml/binutils/2015-04/msg00316.html
>
> And it will be used by GDB to perform inferior function calls using a
> vector ABI which fits to the binary being debugged:
> https://sourceware.org/ml/gdb-patches/2015-04/msg00833.html
>
> The current implementation tries to only set the attribute if the
> vector types are really used in ABI relevant contexts in order to
> avoid false positives during linking.
>
> However, this unfortunately has some limitations like in the following
> case where an ABI relevant context cannot be detected properly:
>
> typedef int __attribute__((vector_size(16))) v4si;
> struct A
> {
>   char x;
>   v4si y;
> };
> char a[sizeof(struct A)];
>
> The number of elements in a depends on the ABI (24 with -mvx and 32
> with -mno-vx).  However, the implementation is not able to detect this
> since the struct type is not used anywhere else and consequently does
> not survive until the checking code is able to see it.
>
> Ideas about how to improve the implementation without creating too
> many false positives are welcome.

I'd be more conservative and instead hook into
targetm.vector_mode_supported_p (and thus vector_type_mode).

Yes, it will trip on "local" vector types.  But I can't see how you
can avoid this in general without seeing the whole program.

If I'd do it retroactively I'd reverse the flag and instead mark units
which use generic non-z13 vectors...

Note that other targets simply emit -Wpsabi warnings here:

> gcc t.c -S -m32
t.c: In function ‘foo’:
t.c:4:1: warning: SSE vector return without SSE enabled changes the
ABI [-Wpsabi]
 {
 ^
t.c:3:6: note: The ABI for passing parameters with 16-byte alignment
has changed in GCC 4.6
 v4si foo (v4si x)
  ^
t.c:3:6: warning: SSE vector argument without SSE enabled changes the
ABI [-Wpsabi]

for

typedef int v4si __attribute__((vector_size(16)));

v4si foo (v4si x)
{
  return x;
}

on i?86 without -msse2.  So you could as well do that - warn for vector
type uses on non-z13 and be done with that.

Richard.

> In particular we do not want to set the attribute for local uses of
> vector types as they would be natural for ifunc optimizations.
>
> gcc/
> * config/s390/s390.c (s390_vector_abi): New variable definition.
> (s390_check_type_for_vector_abi): New function.
> (TARGET_ASM_FILE_END): New macro definition.
> (s390_asm_file_end): New function.
> (s390_function_arg): Call s390_check_type_for_vector_abi.
> (s390_gimplify_va_arg): Likewise.
> * configure: Regenerate.
> * configure.ac: Check for .gnu_attribute Binutils feature.
>
> gcc/testsuite/
> * gcc.target/s390/vector/vec-abi-1.c: Add gnu attribute check.
> * gcc.target/s390/vector/vec-abi-attr-1.c: New test.
> * gcc.target/s390/vector/vec-abi-attr-2.c: New test.
> * gcc.target/s390/vector/vec-abi-attr-3.c: New test.
> * gcc.target/s390/vector/vec-abi-attr-4.c: New test.
> * gcc.target/s390/vector/vec-abi-attr-5.c: New test.
> * gcc.target/s390/vector/vec-abi-attr-6.c: New test.
> ---
>  gcc/config/s390/s390.c |  121 
> 
>  gcc/configure  |   36 ++
>  gcc/configure.ac   |7 ++
>  gcc/testsuite/gcc.target/s390/vector/vec-abi-1.c   |1 +
>  .../gcc.target/s390/vector/vec-abi-attr-1.c|   18 +++
>  .../gcc.target/s390/vector/vec-abi-attr-2.c|   53 +
>  .../gcc.target/s390/vector/vec-abi-attr-3.c|   18 +++
>  .../gcc.target/s390/vector/vec-abi-attr-4.c|   17 +++
>  .../gcc.target/s390/vector/vec-abi-attr-5.c|   19 +++
>  .../gcc.target/s390/vector/vec-abi-attr-6.c|   24 
>  10 files changed, 314 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-1.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-2.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-3.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-4.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-5.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-6.c
>
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c

Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Biener
On Wed, Jun 24, 2015 at 11:57 AM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Tue, Jun 23, 2015 at 11:27 PM, Marc Glisse  wrote:
>>> On Tue, 23 Jun 2015, Richard Sandiford wrote:
>>>
 +/* Vector comparisons are defined to produce all-one or all-zero results.
 */
 +(simplify
 + (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +   (convert @0)))
>>>
>>>
>>> I am trying to understand why the test tree_nop_conversion_p is the right
>>> one (at least for the transformations not using VIEW_CONVERT_EXPR). By
>>> definition of VEC_COND_EXPR, type and TREE_TYPE (@0) are both integer vector
>>> types of the same size and number of elements. It thus seems like a
>>> conversion is always fine. For vectors, tree_nop_conversion_p apparently
>>> only checks that they have the same mode (quite often VOIDmode I guess).
>>
>> The only conversion we seem to allow is changing the signed vector from
>> the comparison result to an unsigned vector (same number of elements
>> and same mode of the elements).  That is, a check using
>> TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (@0)) would probably
>> be better (well, technically a TYPE_VECTOR_SUBPARTS && element
>> mode compare should be better as generic vectors might not have a vector 
>> mode).
>
> OK.  The reason I was being paranoid was that I couldn't see anywhere
> where we enforced that the vector condition in a VEC_COND had to have
> the same element width as the values being selected.

We don't require that indeed.

>  tree-cfg.c
> only checks that rhs2 and rhs3 are compatible with the result.
> There doesn't seem to be any checking of rhs1 vs. the other types.
> So I wasn't sure whether anything stopped us from, e.g., comparing two
> V4HIs and using the result to select between two V4SIs.

Nothing does (or should).

>> I'm fine with using tree_nop_conversion_p for now.
>
> I like the suggestion about checking TYPE_VECTOR_SUBPARTS and the element
> mode.  How about:
>
>  (if (VECTOR_INTEGER_TYPE_P (type)
>   && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
>   && (TYPE_MODE (TREE_TYPE (type))
>>   == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)))))
>
> (But is it really OK to be adding more mode-based compatibility checks?
> I thought you were hoping to move away from modes in the middle end.)

The TYPE_MODE check makes the VECTOR_INTEGER_TYPE_P check redundant
(the type of a comparison is always a signed vector integer type).
Yes, mode-based
checks are ok.  I don't see us moving away from them.

 +/* We could instead convert all instances of the vec_cond to negate,
 +   but that isn't necessarily a win on its own.  */
>>
>> so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks more compact
>> at least ;)  It would also simplify the patterns below.
>
> In the past I've dealt with processors where arithmetic wasn't handled
> as efficiently as logical ops.  Seems like an especial risk for 64-bit
> elements, from a quick scan of the i386 scheduling models.

But then expansion could undo this ...

>> I'm missing a comment on the transform done by the following patterns.
>
> Heh.  The comment was supposed to be describing all four at once.
> I originally had then bunched together without whitespace, but it
> looked bad.
>
 +(simplify
 + (plus:c @3 (vec_cond @0 integer_each_onep@1 integer_zerop@2))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (minus @3 (convert @0))))
 +
 +(simplify
 + (plus:c @3 (view_convert_expr
>>>
>>>
>>> Aren't we supposed to drop _expr in match.pd?
>>
>> Yes.  I probably should adjust genmatch.c to reject the _expr variants ;)
>
> OK.
>
 +(vec_cond @0 integer_each_onep@1 integer_zerop@2)))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (minus @3 (convert @0))))
 +
 +(simplify
 + (minus @3 (vec_cond @0 integer_each_onep@1 integer_zerop@2))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (plus @3 (convert @0))))
 +
 +(simplify
 + (minus @3 (view_convert_expr
 +   (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (plus @3 (convert @0))))
 +
>>
>> Generally for sign-conversions of vectors you should use view_convert.
>
> OK.
>
>> The above also hints at missing conditional view_convert support
>> and a way to iterate over commutative vs. non-commutative ops so
>> we could write
>>
>> (for op (plus:c minus)
>>  rop (minus plus)
>>   (op @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
>>   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>>(rop @3 (view_convert @0)
>>
>> I'll see about implementing that.
>
> Looks good. :-)
>
> I also realised later that:
>
> /* Vector comparisons are defined to produce all-one or all-zero results.  */
> (simplify
>  (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
>  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>(convert @0)))
>
> is redundant with some fold-const.c code.

C PATCH to use is_global_var

2015-06-24 Thread Marek Polacek
This patch makes the C FE use the predicate is_global_var in place of the direct test

  TREE_STATIC (t) || DECL_EXTERNAL (t)

It should improve readability a bit and make predicates easier to follow.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-24  Marek Polacek  

* c-common.c (handle_no_reorder_attribute): Use is_global_var.
* cilk.c (extract_free_variables): Likewise.

* c-decl.c: Use is_global_var throughout.
* c-parser.c: Likewise.
* c-typeck.c: Likewise.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index dee6550..d315854 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -7446,8 +7446,7 @@ handle_no_reorder_attribute (tree *pnode,
 {
   tree node = *pnode;
 
-  if (!VAR_OR_FUNCTION_DECL_P (node)
-   && !(TREE_STATIC (node) || DECL_EXTERNAL (node)))
+  if (!VAR_OR_FUNCTION_DECL_P (node) && !is_global_var (node))
 {
   warning (OPT_Wattributes,
"%qE attribute only affects top level objects",
diff --git gcc/c-family/cilk.c gcc/c-family/cilk.c
index c38e05f..347e4b9 100644
--- gcc/c-family/cilk.c
+++ gcc/c-family/cilk.c
@@ -1063,7 +1063,7 @@ extract_free_variables (tree t, struct wrapper_data *wd,
TREE_ADDRESSABLE (t) = 1;
 case VAR_DECL:
 case PARM_DECL:
-  if (!TREE_STATIC (t) && !DECL_EXTERNAL (t))
+  if (!is_global_var (t))
add_variable (wd, t, how);
   return;
 
diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index fc1fdf9..ab54db9 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -2650,9 +2650,8 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, 
tree oldtype)
  tree_code_size (TREE_CODE (olddecl)) - sizeof (struct 
tree_decl_common));
  olddecl->decl_with_vis.symtab_node = snode;
 
- if ((DECL_EXTERNAL (olddecl)
-  || TREE_PUBLIC (olddecl)
-  || TREE_STATIC (olddecl))
+ if ((is_global_var (olddecl)
+  || TREE_PUBLIC (olddecl))
  && DECL_SECTION_NAME (newdecl) != NULL)
set_decl_section_name (olddecl, DECL_SECTION_NAME (newdecl));
 
@@ -4395,7 +4394,7 @@ c_decl_attributes (tree *node, tree attributes, int flags)
   /* Add implicit "omp declare target" attribute if requested.  */
   if (current_omp_declare_target_attribute
   && ((TREE_CODE (*node) == VAR_DECL
-  && (TREE_STATIC (*node) || DECL_EXTERNAL (*node)))
+  && is_global_var (*node))
  || TREE_CODE (*node) == FUNCTION_DECL))
 {
   if (TREE_CODE (*node) == VAR_DECL
@@ -4794,8 +4793,7 @@ finish_decl (tree decl, location_t init_loc, tree init,
   TREE_TYPE (decl) = error_mark_node;
 }
 
-  if ((DECL_EXTERNAL (decl) || TREE_STATIC (decl))
- && DECL_SIZE (decl) != 0)
+  if (is_global_var (decl) && DECL_SIZE (decl) != 0)
{
  if (TREE_CODE (DECL_SIZE (decl)) == INTEGER_CST)
constant_expression_warning (DECL_SIZE (decl));
@@ -4911,8 +4909,7 @@ finish_decl (tree decl, location_t init_loc, tree init,
{
  /* Recompute the RTL of a local array now
 if it used to be an incomplete type.  */
- if (was_incomplete
- && !TREE_STATIC (decl) && !DECL_EXTERNAL (decl))
+ if (was_incomplete && !is_global_var (decl))
{
  /* If we used it already as memory, it must stay in memory.  */
  TREE_ADDRESSABLE (decl) = TREE_USED (decl);
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index e0ab0a1..f4d18bd 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -14769,7 +14769,7 @@ c_parser_omp_threadprivate (c_parser *parser)
error_at (loc, "%qD is not a variable", v);
   else if (TREE_USED (v) && !C_DECL_THREADPRIVATE_P (v))
error_at (loc, "%qE declared % after first use", v);
-  else if (! TREE_STATIC (v) && ! DECL_EXTERNAL (v))
+  else if (! is_global_var (v))
error_at (loc, "automatic variable %qE cannot be %", v);
   else if (TREE_TYPE (v) == error_mark_node)
;
diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index aeb1043..3dc1f07 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -4380,7 +4380,7 @@ c_mark_addressable (tree exp)
if (C_DECL_REGISTER (x)
&& DECL_NONLOCAL (x))
  {
-   if (TREE_PUBLIC (x) || TREE_STATIC (x) || DECL_EXTERNAL (x))
+   if (TREE_PUBLIC (x) || is_global_var (x))
  {
error
  ("global register variable %qD used in nested function", x);
@@ -4390,7 +4390,7 @@ c_mark_addressable (tree exp)
  }
else if (C_DECL_REGISTER (x))
  {
-   if (TREE_PUBLIC (x) || TREE_STATIC (x) || DECL_EXTERNAL (x))
+   if (TREE_PUBLIC (x) || is_global_var (x))
  error ("address of global register variable %qD requested", x);
else
  error ("address of register variable %qD requested", x);
@@ -9470,8 +9470,7 @@ c_finish_return (location_t loc, tree retv

Re: [PATCH 3/3][AArch64 nofp] Fix another ICE with +nofp/-mgeneral-regs-only

2015-06-24 Thread James Greenhalgh
On Tue, Jun 23, 2015 at 05:03:28PM +0100, Alan Lawrence wrote:
> This fixes another ICE, obtained with the attached testcase - yes, there was
> a way to get hold of a float, without passing an argument or going through
> movsf/movdf!
> 
> Bootstrapped + check-gcc on aarch64-none-linux-gnu.
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64.md (2):
>   Condition on TARGET_FLOAT.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/mgeneral-regs_3.c: New.

OK.

Thanks,
James

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 99cefece8093791ccf17cb071a4e9997bda8fd89..bcaafda5ea46f136dc90f34aa8f2dfaddabd09f5
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4106,7 +4106,7 @@
>  (define_insn "2"
>[(set (match_operand:GPF 0 "register_operand" "=w,w")
>  (FLOATUORS:GPF (match_operand: 1 "register_operand" 
> "w,r")))]
> -  ""
> +  "TARGET_FLOAT"
>"@
> cvtf\t%0, %1
> cvtf\t%0, %1"
> diff --git a/gcc/testsuite/gcc.target/aarch64/mgeneral-regs_3.c 
> b/gcc/testsuite/gcc.target/aarch64/mgeneral-regs_3.c
> new file mode 100644
> index 
> ..225d9eaa45530d88315a146f3fae72d86fe66373
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/mgeneral-regs_3.c
> @@ -0,0 +1,11 @@
> +/* { dg-options "-mgeneral-regs-only -O2" } */
> +
> +extern void abort (void);
> +
> +int
> +test (int i, ...)
> +{
> +  float f = (float) i; /* { dg-error "'-mgeneral-regs-only' is incompatible 
> with floating point code" } */
> +  if (f != f) abort ();
> +  return 2;
> +}



Re: [PATCH 2/3][AArch64 nofp] Clarify docs for +nofp/-mgeneral-regs-only

2015-06-24 Thread James Greenhalgh
On Tue, Jun 23, 2015 at 05:03:13PM +0100, Alan Lawrence wrote:
> James Greenhalgh wrote:
<>

> To my eye, beginning a sentence in lowercase looks very odd in pdf, and still
> a bit odd in html. Have changed to "That is"...?
> 
> Tested with make pdf & make html.
> 
> gcc/ChangeLog (unchanged):
> 
>   * doc/invoke.texi: Clarify AArch64 feature modifiers (no)fp, (no)simd
>   and (no)crypto.

OK.

Thanks,
James

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 
> d8e982c3aa338819df3785696c493a66c1f5b674..0579bf2ecf993bb56987e0bb9686925537ab61e3
>  100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -12359,7 +12359,10 @@ Generate big-endian code.  This is the default when 
> GCC is configured for an
>  
>  @item -mgeneral-regs-only
>  @opindex mgeneral-regs-only
> -Generate code which uses only the general registers.
> +Generate code which uses only the general-purpose registers.  This is 
> equivalent
> +to feature modifier @option{nofp} of @option{-march} or @option{-mcpu}, 
> except
> +that @option{-mgeneral-regs-only} takes precedence over any conflicting 
> feature
> +modifier regardless of sequence.
>  
>  @item -mlittle-endian
>  @opindex mlittle-endian
> @@ -12498,20 +12501,22 @@ over the appropriate part of this option.
>  @subsubsection @option{-march} and @option{-mcpu} Feature Modifiers
>  @cindex @option{-march} feature modifiers
>  @cindex @option{-mcpu} feature modifiers
> -Feature modifiers used with @option{-march} and @option{-mcpu} can be one
> -the following:
> +Feature modifiers used with @option{-march} and @option{-mcpu} can be any of
> +the following and their inverses @option{no@var{feature}}:
>  
>  @table @samp
>  @item crc
>  Enable CRC extension.
>  @item crypto
> -Enable Crypto extension.  This implies Advanced SIMD is enabled.
> +Enable Crypto extension.  This also enables Advanced SIMD and floating-point
> +instructions.
>  @item fp
> -Enable floating-point instructions.
> +Enable floating-point instructions.  This is on by default for all possible
> +values for options @option{-march} and @option{-mcpu}.
>  @item simd
> -Enable Advanced SIMD instructions.  This implies floating-point instructions
> -are enabled.  This is the default for all current possible values for options
> -@option{-march} and @option{-mcpu=}.
> +Enable Advanced SIMD instructions.  This also enables floating-point
> +instructions.  This is on by default for all possible values for options
> +@option{-march} and @option{-mcpu}.
>  @item lse
>  Enable Large System Extension instructions.
>  @item pan
> @@ -12522,6 +12527,10 @@ Enable Limited Ordering Regions support.
>  Enable ARMv8.1 Advanced SIMD instructions.
>  @end table
>  
> +That is, @option{crypto} implies @option{simd} implies @option{fp}.
> +Conversely, @option{nofp} (or equivalently, @option{-mgeneral-regs-only})
> +implies @option{nosimd} implies @option{nocrypto}.
> +
>  @node Adapteva Epiphany Options
>  @subsection Adapteva Epiphany Options
>  



Re: [PATCH] PR c++/65750

2015-06-24 Thread Paolo Carlini

Hi,

On 04/14/2015 11:34 PM, Jason Merrill wrote:

On 04/14/2015 05:27 PM, Adam Butcher wrote:

On 2015-04-10 15:57, Adam Butcher wrote:

+  cp_lexer_consume_token (parser->lexer);


Actually there should be two of these as the 'auto' isn't consumed yet.

OK.


I'm finishing retesting the amended patch and, if everything goes well, 
I will apply it to trunk, as approved by Jason (only additional minor 
tweak: testcase in cpp0x instead of cpp1y).


What about gcc-5-branch? It's a regression.

Thanks,
Paolo.


Re: New type-based pool allocator code miscompiled due to aliasing issue?

2015-06-24 Thread Martin Liška

On 06/23/2015 09:44 PM, Pat Haugen wrote:

On 06/18/2015 06:10 AM, Richard Biener wrote:

You are right that we should call ::new just for classes that have 
m_ignore_type_size == false.
>I've come up with the following patch, which I tested lightly:
>
>diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h
>index 1785df5..7da5f7a 100644
>--- a/gcc/alloc-pool.h
>+++ b/gcc/alloc-pool.h
>@@ -412,8 +412,16 @@ pool_allocator::allocate ()
>  #endif
>VALGRIND_DISCARD (VALGRIND_MAKE_MEM_UNDEFINED (header, size));
>
>+  T *ptr = (T *)header;
>+
>/* Call default constructor.  */
>-  return (T *)(header);
>+  if (!m_ignore_type_size)
>+{
>+  memset (header + sizeof (T), 0, m_extra_size);
>+  return ::new (ptr) T;
>+}
>+  else
>+return ptr;
>  }
>
>  /* Puts PTR back on POOL's free list.  */
>
>Would it be suitable?

Suitable with the memset removed, yes.

What's the status of this patch? I have a couple spec regression testers that 
have been unable to build GCC due to this issue, specifically the sched-deps.c 
change. The above patch (with memset removed) does result in a successful build.

Thanks,
Pat



Hello.

I'm finishing a new patch that will do the job in a more suitable way.

Martin


Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Sandiford
Richard Biener  writes:
>>> I'm fine with using tree_nop_conversion_p for now.
>>
>> I like the suggestion about checking TYPE_VECTOR_SUBPARTS and the element
>> mode.  How about:
>>
>>  (if (VECTOR_INTEGER_TYPE_P (type)
>>   && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
>>   && (TYPE_MODE (TREE_TYPE (type))
>>   == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)))))
>>
>> (But is it really OK to be adding more mode-based compatibility checks?
>> I thought you were hoping to move away from modes in the middle end.)
>
> The TYPE_MODE check makes the VECTOR_INTEGER_TYPE_P check redundant
> (the type of a comparison is always a signed vector integer type).

OK, will just use VECTOR_TYPE_P then.

> +/* We could instead convert all instances of the vec_cond to negate,
> +   but that isn't necessarily a win on its own.  */
>>>
>>> so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks more compact
>>> at least ;)  It would also simplify the patterns below.
>>
>> In the past I've dealt with processors where arithmetic wasn't handled
>> as efficiently as logical ops.  Seems like an especial risk for 64-bit
>> elements, from a quick scan of the i386 scheduling models.
>
> But then expansion could undo this ...

So do the inverse fold and convert (neg (cond)) to (vec_cond cond 1 0)?
Is there precedent for doing that kind of thing?

>> I also realised later that:
>>
>> /* Vector comparisons are defined to produce all-one or all-zero results.  */
>> (simplify
>>  (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
>>  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>>(convert @0)))
>>
>> is redundant with some fold-const.c code.
>
> If so then you should remove the fold-const.c at the time you add the pattern.

Can I just drop that part of the patch instead?  The fold-const.c
code handles COND_EXPR and VEC_COND_EXPR analogously, so I'd have
to move COND_EXPR at the same time.  And then the natural follow-on
would be: why not move the other COND_EXPR and VEC_COND_EXPR folds too? :-)

> Note that ISTR code performing exactly the opposite transform in
> fold-const.c ...

That's another reason why I'm worried about just doing the (negate ...)
thing without knowing whether the negate can be folded into anything else.

Thanks,
Richard



Re: [06/12] Consolidate string hashers

2015-06-24 Thread Richard Sandiford
Mikhail Maltsev  writes:
> On 23.06.2015 17:49, Richard Sandiford wrote:
>> Index: gcc/config/alpha/alpha.c
>> ===
>> --- gcc/config/alpha/alpha.c 2015-06-23 15:48:30.751788389 +0100
>> +++ gcc/config/alpha/alpha.c 2015-06-23 15:48:30.747788453 +0100
>> @@ -4808,13 +4808,7 @@ alpha_multipass_dfa_lookahead (void)
>>  
>>  struct GTY(()) alpha_links;
>>  
>> -struct string_traits : default_hashmap_traits
>> -{
>> -  static bool equal_keys (const char *const &a, const char *const &b)
>> -  {
>> -return strcmp (a, b) == 0;
>> -  }
>> -};
>> +typedef simple_hashmap_traits  string_traits;
>>  
>
> I remember that when we briefly discussed unification of string traits,
> I looked through the GCC code and this one seemed weird to me: it does not
> reimplement the hash function, i.e., the pointer value is used as the hash.  I
> wonder, is it intentional or not? This could actually work if strings
> are interned (but in that case there is no need to compare them, because
> comparing pointers would be enough).

I think it was accidental.  The code originally used splay trees and
so didn't need to provide a hash.

SYMBOL_REF names are unique, like you say, so pointer equality should be
enough.  Even then though, htab_hash_string ought to give a better hash
than the pointer value (as well as giving a stable order, although that
isn't important here).  So IMO the patch as it stands is still an
improvement: we're keeping the existing comparison function but adding
a better hasher.

If the series goes anywhere I might look at adding a dedicated "interned
string hasher" that sits in between pointer_hash and string_hash.

Thanks,
Richard



Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data

2015-06-24 Thread Jakub Jelinek
On Tue, Jun 23, 2015 at 02:40:43PM +0300, Ilya Verbin wrote:
> On Sat, Jun 20, 2015 at 00:35:14 +0300, Ilya Verbin wrote:
> > Given that a mapped variable in 4.1 can have different kinds across nested
> > data regions, we need to store map-type not only for each var, but also for
> > each structured mapping.  Here is my WIP patch, is it sane? :)
> > structured mapping.  Here is my WIP patch, is it sane? :)
> > Attached testcase works OK on the device with non-shared memory.
> 
> A bit updated version with a fix for GOMP_MAP_TO_PSET.
> make check-target-libgomp passed.

Thinking about this more, for the always modifier this isn't really sufficient.
Consider:
void
foo (int *p)
{
  #pragma omp target data (alloc:p[0:32])
  {
#pragma omp target data (always, from:p[7:9])
{
  ...
}
  }
}
If all we record is the corresponding splay_tree and the flags
(from/always_from), then this would try to copy from the device
the whole array section, rather than just the small portion of it.
So, supposedly in addition to the splay_tree for the always-from case we also
need to remember e.g. [relative offset, length] within the splay tree
object.

Jakub


Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Biener
On Wed, Jun 24, 2015 at 1:10 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
 I'm fine with using tree_nop_conversion_p for now.
>>>
>>> I like the suggestion about checking TYPE_VECTOR_SUBPARTS and the element
>>> mode.  How about:
>>>
>>>  (if (VECTOR_INTEGER_TYPE_P (type)
>>>   && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE 
>>> (@0))
>>>   && (TYPE_MODE (TREE_TYPE (type))
>>>   == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)))))
>>>
>>> (But is it really OK to be adding more mode-based compatibility checks?
>>> I thought you were hoping to move away from modes in the middle end.)
>>
>> The TYPE_MODE check makes the VECTOR_INTEGER_TYPE_P check redundant
>> (the type of a comparison is always a signed vector integer type).
>
> OK, will just use VECTOR_TYPE_P then.

Given we're in a VEC_COND_EXPR, that's redundant as well.

>> +/* We could instead convert all instances of the vec_cond to negate,
>> +   but that isn't necessarily a win on its own.  */

 so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks more 
 compact
 at least ;)  It would also simplify the patterns below.
>>>
>>> In the past I've dealt with processors where arithmetic wasn't handled
>>> as efficiently as logical ops.  Seems like an especial risk for 64-bit
>>> elements, from a quick scan of the i386 scheduling models.
>>
>> But then expansion could undo this ...
>
> So do the inverse fold and convert (neg (cond)) to (vec_cond cond 1 0)?
> Is there precedent for doing that kind of thing?

Expanding it as this, yes.  Whether there is precedence no idea, but
surely the expand_unop path could, if there is no optab for neg:vector_mode,
try expanding as vec_cond .. 1 0.  There is precedence for different
expansion paths dependent on optabs (or even rtx cost?).  Of course
expand_unop doesn't get the original tree ops (expand_expr.c does,
where some special-casing using get_gimple_for_expr is done).  Not sure
if expand_unop would get 'cond' in a form where it can recognize
the result is either -1 or 0.

>>> I also realised later that:
>>>
>>> /* Vector comparisons are defined to produce all-one or all-zero results.  
>>> */
>>> (simplify
>>>  (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
>>>  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>>>(convert @0)))
>>>
>>> is redundant with some fold-const.c code.
>>
>> If so then you should remove the fold-const.c at the time you add the 
>> pattern.
>
> Can I just drop that part of the patch instead?  The fold-const.c
> code handles COND_EXPR and VEC_COND_EXPR analogously, so I'd have
> to move COND_EXPR at the same time.  And then the natural follow-on
> would be: why not move the other COND_EXPR and VEC_COND_EXPR folds too? :-)

Yes, why not? ;)  But sure, you can also drop the case for now.

>> Note that ISTR code performing exactly the opposite transform in
>> fold-const.c ...
>
> That's another reason why I'm worried about just doing the (negate ...)
> thing without knowing whether the negate can be folded into anything else.

I'm not aware of anything here.

Richard.

> Thanks,
> Richard
>


Re: [PATCH] Fix PR c++/30044

2015-06-24 Thread Patrick Palka
On Wed, Jun 24, 2015 at 5:08 AM, Markus Trippelsdorf
 wrote:
> On 2015.06.23 at 19:40 -0400, Patrick Palka wrote:
>> On Tue, Jun 23, 2015 at 12:38 AM, Jason Merrill  wrote:
>> > On 06/15/2015 02:32 PM, Patrick Palka wrote:
>> >>
>> >> On Mon, Jun 15, 2015 at 2:05 PM, Jason Merrill  wrote:
>> >>>
>> >>> Any reason not to use grow_tree_vec?
>> >>
>> >>
>> >> Doing so causes a lot of ICEs in the testsuite.  I think it's because
>> >> grow_tree_vec invalidates the older parameter_vec which some trees may
>> >> still be holding a reference to in their DECL_TEMPLATE_PARMS field.
>> >
>> >
>> > Hmm, that's unfortunate, as doing it this way means we get a bunch of
>> > garbage TREE_VECs in the process.  But I guess the patch is OK as is.
>>
>> Yeah, though I can't think of a simple way to work around this -- any
>> solution I think of seems to require a change in the representation of
>> current_template_parms, something that would be quite invasive.
>> Will commit the patch shortly.
>
> Your patch causes LLVM build to hang on the attached testcase. (I killed
> gcc after ~10 minutes compile time.)
>
> perf shows:
>   23.03%  cc1plus  cc1plus  [.] comp_template_parms
>   19.41%  cc1plus  cc1plus  [.] structural_comptypes
>   16.28%  cc1plus  cc1plus  [.] cp_type_quals
>   15.89%  cc1plus  cc1plus  [.] comp_template_parms_position
>   14.01%  cc1plus  cc1plus  [.] comp_type_attributes
>6.58%  cc1plus  cc1plus  [.] comptypes
> ...
>
> To reproduce just run:
>  g++ -c -O3 -std=c++11 gtest-all.ii

Thanks.  I don't think infinite recursion is going on.  Rather, it
seems that this patch causes a quadratic slowdown (in the number of
template template parameters in a parameter list and in the number of
partial specializations of a template) in the structural_comptypes ->
comp_template_parms -> comptypes loop when comparing two
TEMPLATE_TEMPLATE_PARMs to find the canonical template template
parameter of a partial specialization.  The test case has a good
amount of mechanical partial specializations of templates with big
parameter lists containing lots of template template parameters so
it's very sensitive to this quadratic slowdown.

To compare two template template parameters for structural equality,
structural_comptypes must compare their DECL_TEMPLATE_PARMS for
structural equality.  Since the patch gives the DECL_TEMPLATE_PARMS
field a level containing all previously declared template parameters
in the parameter list it's defined in, this comparison becomes
recursive and quadratic if all the parameters of the template are
template template parameters which is what the test has starting at
line 48518.

In the meantime I will revert this patch since I won't be able to find
a solution in time.

What should be done about the PR?  I suppose I should reopen it...

>
> --
> Markus


C PATCH to use VAR_P

2015-06-24 Thread Marek Polacek
Similarly to what Gaby did in 2013 for C++, this patch
makes the c/ and c-family/ code use VAR_P rather than

  TREE_CODE (t) == VAR_DECL

(This is on top of the previous patch with is_global_var.)

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-24  Marek Polacek  

* array-notation-common.c: Use VAR_P throughout.
* c-ada-spec.c: Likewise.
* c-common.c: Likewise.
* c-format.c: Likewise.
* c-gimplify.c: Likewise.
* c-omp.c: Likewise.
* c-pragma.c: Likewise.
* c-pretty-print.c: Likewise.
* cilk.c: Likewise.

* c-array-notation.c: Use VAR_P throughout.
* c-decl.c: Likewise.
* c-objc-common.c: Likewise.
* c-parser.c: Likewise.
* c-typeck.c: Likewise.

diff --git gcc/c-family/array-notation-common.c 
gcc/c-family/array-notation-common.c
index d60ec3f..f517424 100644
--- gcc/c-family/array-notation-common.c
+++ gcc/c-family/array-notation-common.c
@@ -231,7 +231,7 @@ find_rank (location_t loc, tree orig_expr, tree expr, bool 
ignore_builtin_fn,
   || TREE_CODE (ii_tree) == INDIRECT_REF)
ii_tree = TREE_OPERAND (ii_tree, 0);
  else if (TREE_CODE (ii_tree) == PARM_DECL
-  || TREE_CODE (ii_tree) == VAR_DECL)
+  || VAR_P (ii_tree))
break;
  else
gcc_unreachable ();
diff --git gcc/c-family/c-ada-spec.c gcc/c-family/c-ada-spec.c
index ab29f86..ef3c5e3 100644
--- gcc/c-family/c-ada-spec.c
+++ gcc/c-family/c-ada-spec.c
@@ -2826,7 +2826,7 @@ print_ada_declaration (pretty_printer *buffer, tree t, tree type, int spc)
 }
   else
 {
-  if (TREE_CODE (t) == VAR_DECL
+  if (VAR_P (t)
  && decl_name
  && *IDENTIFIER_POINTER (decl_name) == '_')
return 0;
diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index d315854..d7ccf0e 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -1620,7 +1620,7 @@ decl_constant_value_for_optimization (tree exp)
 gcc_unreachable ();
 
   if (!optimize
-  || TREE_CODE (exp) != VAR_DECL
+  || !VAR_P (exp)
   || TREE_CODE (TREE_TYPE (exp)) == ARRAY_TYPE
   || DECL_MODE (exp) == BLKmode)
 return exp;
@@ -6952,7 +6952,7 @@ handle_nocommon_attribute (tree *node, tree name,
   tree ARG_UNUSED (args),
   int ARG_UNUSED (flags), bool *no_add_attrs)
 {
-  if (TREE_CODE (*node) == VAR_DECL)
+  if (VAR_P (*node))
 DECL_COMMON (*node) = 0;
   else
 {
@@ -6970,7 +6970,7 @@ static tree
 handle_common_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 int ARG_UNUSED (flags), bool *no_add_attrs)
 {
-  if (TREE_CODE (*node) == VAR_DECL)
+  if (VAR_P (*node))
 DECL_COMMON (*node) = 1;
   else
 {
@@ -7349,12 +7349,12 @@ handle_used_attribute (tree *pnode, tree name, tree ARG_UNUSED (args),
   tree node = *pnode;
 
   if (TREE_CODE (node) == FUNCTION_DECL
-  || (TREE_CODE (node) == VAR_DECL && TREE_STATIC (node))
+  || (VAR_P (node) && TREE_STATIC (node))
   || (TREE_CODE (node) == TYPE_DECL))
 {
   TREE_USED (node) = 1;
   DECL_PRESERVE_P (node) = 1;
-  if (TREE_CODE (node) == VAR_DECL)
+  if (VAR_P (node))
DECL_READ_P (node) = 1;
 }
   else
@@ -7378,14 +7378,13 @@ handle_unused_attribute (tree *node, tree name, tree ARG_UNUSED (args),
   tree decl = *node;
 
   if (TREE_CODE (decl) == PARM_DECL
- || TREE_CODE (decl) == VAR_DECL
+ || VAR_P (decl)
  || TREE_CODE (decl) == FUNCTION_DECL
  || TREE_CODE (decl) == LABEL_DECL
  || TREE_CODE (decl) == TYPE_DECL)
{
  TREE_USED (decl) = 1;
- if (TREE_CODE (decl) == VAR_DECL
- || TREE_CODE (decl) == PARM_DECL)
+ if (VAR_P (decl) || TREE_CODE (decl) == PARM_DECL)
DECL_READ_P (decl) = 1;
}
   else
@@ -7913,7 +7912,7 @@ handle_section_attribute (tree *node, tree ARG_UNUSED (name), tree args,
   goto fail;
 }
 
-  if (TREE_CODE (decl) == VAR_DECL
+  if (VAR_P (decl)
   && current_function_decl != NULL_TREE
   && !TREE_STATIC (decl))
 {
@@ -7932,7 +7931,7 @@ handle_section_attribute (tree *node, tree ARG_UNUSED (name), tree args,
   goto fail;
 }
 
-  if (TREE_CODE (decl) == VAR_DECL
+  if (VAR_P (decl)
   && !targetm.have_tls && targetm.emutls.tmpl_section
   && DECL_THREAD_LOCAL_P (decl))
 {
@@ -8223,7 +8222,7 @@ handle_alias_ifunc_attribute (bool is_alias, tree *node, tree name, tree args,
   tree decl = *node;
 
   if (TREE_CODE (decl) != FUNCTION_DECL
-  && (!is_alias || TREE_CODE (decl) != VAR_DECL))
+  && (!is_alias || !VAR_P (decl)))
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
   *no_add_attrs = true;
@@ -8518,7 +8517,7 @@ c_determine_visibility (tree decl)
  DECL

Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Jun 24, 2015 at 1:10 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
> I'm fine with using tree_nop_conversion_p for now.

 I like the suggestion about checking TYPE_VECTOR_SUBPARTS and the element
 mode.  How about:

  (if (VECTOR_INTEGER_TYPE_P (type)
   && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
   && (TYPE_MODE (TREE_TYPE (type))
   == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0))))

 (But is it really OK to be adding more mode-based compatibility checks?
 I thought you were hoping to move away from modes in the middle end.)
>>>
>>> The TYPE_MODE check makes the VECTOR_INTEGER_TYPE_P check redundant
>>> (the type of a comparison is always a signed vector integer type).
>>
>> OK, will just use VECTOR_TYPE_P then.
>
> Given we're in a VEC_COND_EXPR that's redundant as well.

Hmm, but is it really guaranteed in:

 (plus:c @3 (view_convert (vec_cond @0 integer_each_onep@1 integer_zerop@2)))

that the @3 and the view_convert are also vectors?  I thought we allowed
view_converts from vector to non-vector types.

>>> +/* We could instead convert all instances of the vec_cond to negate,
>>> +   but that isn't necessarily a win on its own.  */
>
> so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks
> more compact
> at least ;)  It would also simplify the patterns below.

 In the past I've dealt with processors where arithmetic wasn't handled
 as efficiently as logical ops.  Seems like an especial risk for 64-bit
 elements, from a quick scan of the i386 scheduling models.
>>>
>>> But then expansion could undo this ...
>>
>> So do the inverse fold and convert (neg (cond)) to (vec_cond cond 1 0)?
>> Is there precedent for doing that kind of thing?
>
> Expanding it as this, yes.  Whether there is precedence no idea, but
> surely the expand_unop path could, if there is no optab for neg:vector_mode,
> try expanding as vec_cond .. 1 0.

Yeah, that part isn't the problem.  It's when there is an implementation
of (neg ...) (which I'd hope all real integer vector architectures would
support) but it's not as efficient as the (and ...) that most targets
would use for a (vec_cond ... 0).

> There is precedence for different
> expansion paths dependent on optabs (or even rtx cost?).  Of course
> expand_unop doesn't get the original tree ops (expand_expr.c does,
> where some special-casing using get_gimple_for_expr is).  Not sure
> if expand_unop would get 'cond' in a form where it can recognize
> the result is either -1 or 0.

It just seems inconsistent to have the optabs machinery try to detect
this ad-hoc combination opportunity while still leaving the vcond optab
to handle more arbitrary cases, like (vec_cond (eq x y) 0xbeef 0).
The vcond optabs would still have the logic needed to produce the
right code, but we'd be circumventing it and trying to reimplement
one particular case in a different way.

Thanks,
Richard



Re: C PATCH to use VAR_P

2015-06-24 Thread Uros Bizjak
Hello!

> Similarly to what Gaby did in 2013 for C++
> (), this patch
> makes the c/ and c-family/ code use VAR_P rather than
>
>   TREE_CODE (t) == VAR_DECL
>
> (This is on top of the previous patch with is_global_var.)

You could also use VAR_OR_FUNCTION_DECL, e.g. in the part below.

Uros.

@@ -7378,14 +7378,13 @@ handle_unused_attribute (tree *node, tree name, tree ARG_UNUSED (args),
   tree decl = *node;

   if (TREE_CODE (decl) == PARM_DECL
-  || TREE_CODE (decl) == VAR_DECL
+  || VAR_P (decl)
   || TREE_CODE (decl) == FUNCTION_DECL
   || TREE_CODE (decl) == LABEL_DECL
   || TREE_CODE (decl) == TYPE_DECL)


Re: C PATCH to use VAR_P

2015-06-24 Thread Marek Polacek
On Wed, Jun 24, 2015 at 02:37:30PM +0200, Uros Bizjak wrote:
> Hello!
> 
> > Similarly to what Gaby did in 2013 for C++
> > (), this patch
> > makes the c/ and c-family/ code use VAR_P rather than
> >
> >   TREE_CODE (t) == VAR_DECL
> >
> > (This is on top of the previous patch with is_global_var.)
> 
> You could also use VAR_OR_FUNCTION_DECL, e.g. in the part below.
 
Sure, I thought I had dealt with VAR_OR_FUNCTION_DECL_P in
, but
I must have missed this.  Thanks,


Marek


Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Biener
On Wed, Jun 24, 2015 at 2:28 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Wed, Jun 24, 2015 at 1:10 PM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
>> I'm fine with using tree_nop_conversion_p for now.
>
> I like the suggestion about checking TYPE_VECTOR_SUBPARTS and the element
> mode.  How about:
>
>  (if (VECTOR_INTEGER_TYPE_P (type)
>   && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
>   && (TYPE_MODE (TREE_TYPE (type))
>   == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0))))
>
> (But is it really OK to be adding more mode-based compatibility checks?
> I thought you were hoping to move away from modes in the middle end.)

 The TYPE_MODE check makes the VECTOR_INTEGER_TYPE_P check redundant
 (the type of a comparison is always a signed vector integer type).
>>>
>>> OK, will just use VECTOR_TYPE_P then.
>>
>> Given we're in a VEC_COND_EXPR that's redundant as well.
>
> Hmm, but is it really guaranteed in:
>
>  (plus:c @3 (view_convert (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
>
> that the @3 and the view_convert are also vectors?  I thought we allowed
> view_converts from vector to non-vector types.

Hmm, true.

 +/* We could instead convert all instances of the vec_cond to negate,
 +   but that isn't necessarily a win on its own.  */
>>
>> so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks
>> more compact
>> at least ;)  It would also simplify the patterns below.
>
> In the past I've dealt with processors where arithmetic wasn't handled
> as efficiently as logical ops.  Seems like an especial risk for 64-bit
> elements, from a quick scan of the i386 scheduling models.

 But then expansion could undo this ...
>>>
>>> So do the inverse fold and convert (neg (cond)) to (vec_cond cond 1 0)?
>>> Is there precedent for doing that kind of thing?
>>
>> Expanding it as this, yes.  Whether there is precedence no idea, but
>> surely the expand_unop path could, if there is no optab for neg:vector_mode,
>> try expanding as vec_cond .. 1 0.
>
> Yeah, that part isn't the problem.  It's when there is an implementation
> of (neg ...) (which I'd hope all real integer vector architectures would
> support) but it's not as efficient as the (and ...) that most targets
> would use for a (vec_cond ... 0).

I would suppose that a single-operand op (neg) is always better than a
two-operand (and) one.  But you of course never know...

>> There is precedence for different
>> expansion paths dependent on optabs (or even rtx cost?).  Of course
>> expand_unop doesn't get the original tree ops (expand_expr.c does,
>> where some special-casing using get_gimple_for_expr is).  Not sure
>> if expand_unop would get 'cond' in a form where it can recognize
>> the result is either -1 or 0.
>
> It just seems inconsistent to have the optabs machinery try to detect
> this ad-hoc combination opportunity while still leaving the vcond optab
> to handle more arbitrary cases, like (vec_cond (eq x y) 0xbeef 0).
> The vcond optabs would still have the logic needed to produce the
> right code, but we'd be circumventing it and trying to reimplement
> one particular case in a different way.

That's true.  One could also leave it to combine / simplify_rtx and
thus rtx_cost.  But that's true of all of the match.pd stuff you add, no?

Richard.

> Thanks,
> Richard
>


Re: [gomp4] Preserve NVPTX "reconvergence" points

2015-06-24 Thread Bernd Schmidt

On 06/19/2015 03:45 PM, Jakub Jelinek wrote:


If the loop remains in the IL (isn't optimized away as unreachable or
isn't removed, e.g. as a non-loop - say if it contains a noreturn call),
the flags on struct loop should be still there.  For the loop clauses
(reduction always, and private/lastprivate if addressable etc.) for
OpenMP simd / Cilk+ simd we use special arrays indexed by internal
functions, which then during vectorization are shrunk (but in theory could
be expanded too) to the right vectorization factor if vectorized, of course
accesses within the loop vectorized using SIMD, and if not vectorized,
shrunk to 1 element.


I'd appreciate if you could describe that mechanism in more detail. As 
far as I can tell it is very poorly commented and documented in the 
code. I mean, it doesn't even follow the minimal coding standards of 
describing function inputs:


/* Helper function of lower_rec_input_clauses, used for #pragma omp simd
   privatization.  */

static bool
lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, int &max_vf,
  tree &idx, tree &lane, tree &ivar, tree &lvar)


Bernd



Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Sandiford
>>> There is precedence for different
>>> expansion paths dependent on optabs (or even rtx cost?).  Of course
>>> expand_unop doesn't get the original tree ops (expand_expr.c does,
>>> where some special-casing using get_gimple_for_expr is).  Not sure
>>> if expand_unop would get 'cond' in a form where it can recognize
>>> the result is either -1 or 0.
>>
>> It just seems inconsistent to have the optabs machinery try to detect
>> this ad-hoc combination opportunity while still leaving the vcond optab
>> to handle more arbitrary cases, like (vec_cond (eq x y) 0xbeef 0).
>> The vcond optabs would still have the logic needed to produce the
>> right code, but we'd be circumventing it and trying to reimplement
>> one particular case in a different way.
>
> That's true.  One could also leave it to combine / simplify_rtx and
> thus rtx_cost.  But that's true of all of the match.pd stuff you add, no?

It's probably true of most match.pd stuff in general though :-)
One advantage of match.pd of course is that it works across
block boundaries.

The difference between the stuff I added and converting vec_cond_expr
to negate is that the stuff I added avoids the vec_cond_expr altogether
and so ought to be an unequivocal win.  Replacing vec_cond_expr with
negate just rewrites it into another (arguably more surprising) form.

Thanks,
Richard



Re: [gomp4] Preserve NVPTX "reconvergence" points

2015-06-24 Thread Jakub Jelinek
On Wed, Jun 24, 2015 at 03:11:04PM +0200, Bernd Schmidt wrote:
> On 06/19/2015 03:45 PM, Jakub Jelinek wrote:
> 
> >If the loop remains in the IL (isn't optimized away as unreachable or
> >isn't removed, e.g. as a non-loop - say if it contains a noreturn call),
> >the flags on struct loop should be still there.  For the loop clauses
> >(reduction always, and private/lastprivate if addressable etc.) for
> >OpenMP simd / Cilk+ simd we use special arrays indexed by internal
> >functions, which then during vectorization are shrunk (but in theory could
> >be expanded too) to the right vectorization factor if vectorized, of course
> >accesses within the loop vectorized using SIMD, and if not vectorized,
> >shrunk to 1 element.
> 
> I'd appreciate if you could describe that mechanism in more detail. As far
> as I can tell it is very poorly commented and documented in the code. I
> mean, it doesn't even follow the minimal coding standards of describing
> function inputs:
> 
> /* Helper function of lower_rec_input_clauses, used for #pragma omp simd
>privatization.  */
> 
> static bool
> lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, int &max_vf,
> tree &idx, tree &lane, tree &ivar, tree &lvar)

Here is the theory behind it:
https://gcc.gnu.org/ml/gcc-patches/2013-04/msg01661.html
In the end it is using internal functions instead of uglified builtins.
I'd suggest you look at some of the libgomp.c/simd*.c tests, say
with -O2 -mavx2 -fdump-tree-{omplower,ssa,ifcvt,vect,optimized}
to see how it is lowered and expanded.  I assume #pragma omp simd roughly
corresponds to #pragma acc loop vector, maxvf for PTX vectorization is
supposedly 32 (warp size).  For SIMD vectorization, if the vectorization
fails, the arrays are shrunk to 1 element, otherwise they are shrunk to the
vectorization factor, and later optimizations if they aren't really
addressable optimized using FRE and other memory optimizations so that they
don't touch memory unless really needed.
For the PTX style vectorization (parallelization between threads in a warp),
I'd say you would always shrink to 1 element again, but such variables would
be local to each of the threads in the warp (or another possibility is
shared arrays of size 32 indexed by %tid.x & 31), while addressable variables
without such magic type would be shared among all threads; non-addressable
variables (SSA_NAMEs) depending on where they are used.
You'd need to transform reductions (which are right now represented as
another loop, from 0 to an internal function, so easily recognizable) into
the PTX reductions.  Also, lastprivate is now an access to the array using
last lane internal function, dunno what that corresponds to in PTX
(perhaps also a reduction where all but the thread executing the last
iteration say or in 0 and the remaining thread ors in the lastprivate value).

Jakub


Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Biener
On Wed, Jun 24, 2015 at 3:37 PM, Richard Sandiford
 wrote:
 There is precedence for different
 expansion paths dependent on optabs (or even rtx cost?).  Of course
 expand_unop doesn't get the original tree ops (expand_expr.c does,
 where some special-casing using get_gimple_for_expr is).  Not sure
 if expand_unop would get 'cond' in a form where it can recognize
 the result is either -1 or 0.
>>>
>>> It just seems inconsistent to have the optabs machinery try to detect
>>> this ad-hoc combination opportunity while still leaving the vcond optab
>>> to handle more arbitrary cases, like (vec_cond (eq x y) 0xbeef 0).
>>> The vcond optabs would still have the logic needed to produce the
>>> right code, but we'd be circumventing it and trying to reimplement
>>> one particular case in a different way.
>>
>> That's true.  One could also leave it to combine / simplify_rtx and
>> thus rtx_cost.  But that's true of all of the match.pd stuff you add, no?
>
> It's probably true of most match.pd stuff in general though :-)
> One advantage of match.pd of course is that it works across
> block boundaries.
>
> The difference between the stuff I added and converting vec_cond_expr
> to negate is that the stuff I added avoids the vec_cond_expr altogether
> and so ought to be an unequivocal win.  Replacing vec_cond_expr with
> negate just rewrites it into another (arguably more surprising) form.

True.  Btw, conditional view_convert is now in trunk so you can at least
merge both plus:c patterns and both minus patterns.

Richard.

> Thanks,
> Richard
>


[Patch ARM] Fixup testsuite noise with various multilibs in arm.exp

2015-06-24 Thread Ramana Radhakrishnan
Pretty much self explanatory. This fixes up a significant amount of 
noise in a testsuite run with an -mfloat-abi=soft, -mfpu=fp-armv8 
variant in the multilib flags provided for the testsuite run.


Tested arm-none-eabi cross with a set of multilibs and applied to trunk.

regards
ramana

2015-06-24  Ramana Radhakrishnan  

* gcc.target/arm/fixed_float_conversion.c: Skip for inappropriate
  multilibs.
* gcc.target/arm/memset-inline-10.c: Likewise.
* gcc.target/arm/pr58784.c: Likewise.
* gcc.target/arm/pr59985.C: Likewise.
* gcc.target/arm/vfp-1.c: Likewise and test only for the non fma cases.


commit 6187bfc6ca02ec16f8443188f740958079d8e6ea
Author: Ramana Radhakrishnan 
Date:   Wed Jun 24 14:33:51 2015 +0100

more noise.

diff --git a/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c b/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c
index 078b103..05ccd14 100644
--- a/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c
+++ b/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c
@@ -3,6 +3,7 @@
 /* { dg-require-effective-target arm_vfp3_ok } */
 /* { dg-options "-O1" } */
 /* { dg-add-options arm_vfp3 } */
+/* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
 
 float
 fixed_to_float (int i)
diff --git a/gcc/testsuite/gcc.target/arm/memset-inline-10.c b/gcc/testsuite/gcc.target/arm/memset-inline-10.c
index d3b777c..c1087c8 100644
--- a/gcc/testsuite/gcc.target/arm/memset-inline-10.c
+++ b/gcc/testsuite/gcc.target/arm/memset-inline-10.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-march=armv7-a -mfloat-abi=hard -mfpu=neon -O2" } */
+/* { dg-skip-if "need SIMD instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+/* { dg-skip-if "need SIMD instructions" { *-*-* } { "-mfpu=vfp*" } { "" } } */
 
 #define BUF 100
 long a[BUF];
diff --git a/gcc/testsuite/gcc.target/arm/pr58784.c b/gcc/testsuite/gcc.target/arm/pr58784.c
index 4ee3ef5..29a0f73 100644
--- a/gcc/testsuite/gcc.target/arm/pr58784.c
+++ b/gcc/testsuite/gcc.target/arm/pr58784.c
@@ -1,6 +1,8 @@
 /* { dg-do compile } */
 /* { dg-skip-if "incompatible options" { arm_thumb1 } { "*" } { "" } } */
 /* { dg-options "-march=armv7-a -mfloat-abi=hard -mfpu=neon -marm -O2" } */
+/* { dg-skip-if "need hardfp ABI" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+
 
 typedef struct __attribute__ ((__packed__))
 {
diff --git a/gcc/testsuite/gcc.target/arm/pr59985.C b/gcc/testsuite/gcc.target/arm/pr59985.C
index 1351c48..97d5915 100644
--- a/gcc/testsuite/gcc.target/arm/pr59985.C
+++ b/gcc/testsuite/gcc.target/arm/pr59985.C
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "incompatible options" { arm_thumb1 } { "*" } { "" } } */
 /* { dg-options "-g -fcompare-debug -O2 -march=armv7-a -mtune=cortex-a9 -mfpu=vfpv3-d16 -mfloat-abi=hard" } */
+/* { dg-skip-if "need hardfp abi" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
 
 extern void *f1 (unsigned long, unsigned long);
 extern const struct line_map *f2 (void *, int, unsigned int, const char *, unsigned int);
diff --git a/gcc/testsuite/gcc.target/arm/vfp-1.c b/gcc/testsuite/gcc.target/arm/vfp-1.c
index b6bb7be..9aa5302 100644
--- a/gcc/testsuite/gcc.target/arm/vfp-1.c
+++ b/gcc/testsuite/gcc.target/arm/vfp-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfpu=vfp -mfloat-abi=softfp" } */
+/* { dg-options "-O2 -mfpu=vfp -mfloat-abi=softfp -ffp-contract=off" } */
 /* { dg-require-effective-target arm_vfp_ok } */
+/* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
 
 extern float fabsf (float);
 extern float sqrtf (float);


[PATCH, committed] Fix warning

2015-06-24 Thread Ilya Enkovich
Hi,

I've committed this patch to fix a warning for mpx-bootstrap.

/export/users/aguskov/MPX/git_branch/source/gcc/tree.h:2858:51: error: 'vectype' may be used uninitialized in this function [-Werror=maybe-uninitialized]
 tree_check_failed (__t, __f, __l, __g, __c, 0);
   ^
/export/users/aguskov/MPX/git_branch/source/gcc/tree-vect-slp.c:483:8: note: 'vectype' was declared here
   tree vectype, scalar_type, first_op1 = NULL_TREE;
^

Thanks,
Ilya
--
2015-06-24  Ilya Enkovich  

* tree-vect-slp.c (vect_build_slp_tree_1): Init vectype.


diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 91ddc0f..bbc7d13 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -480,7 +480,7 @@ vect_build_slp_tree_1 (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
   enum tree_code first_cond_code = ERROR_MARK;
   tree lhs;
   bool need_same_oprnds = false;
-  tree vectype, scalar_type, first_op1 = NULL_TREE;
+  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
   optab optab;
   int icode;
   machine_mode optab_op2_mode;


Re: [PATCH 1/8] S/390 Vector ABI GNU Attribute.

2015-06-24 Thread Andreas Krebbel
On 06/24/2015 12:14 PM, Richard Biener wrote:
> On Wed, Jun 24, 2015 at 8:57 AM, Andreas Krebbel
>> Ideas about how to improve the implementation without creating too
>> many false postives are welcome.
> 
> I'd be more conservative and instead hook into
> targetm.vector_mode_supported_p (and thus vector_type_mode).
> 
> Yes, it will trip on "local" vector types.  But I can't see how you
> can avoid this in general without seeing the whole program.

This would mean that the GNU ABI marker would be set for code which makes use 
of one of the builtins
locally to optimize a special case wrapped in a runtime check. This is what I 
was trying to avoid
actually.
We do have some optimizations for Glibc. However, these are all written in 
assembler, which would not
trigger the ABI flag to be set. So for Glibc the more conservative approach 
would be no problem so far.

The current implementation has the hole that an ABI vector type usage might not 
be detected if the
type is used in a sizeof construct without being used anywhere else. One 
question is how big that
problem actually is? Another is if there are more cases which might slip 
through? If it turns out to
be too risky I probably have to go with one of the more conservative approaches 
:(

> If I'd do it retro-actively I'd reverse the flag and instead mark units
> which use generic non-z13 vectors...
> 
> Note that other targets simply emit -Wpsabi warnings here:
> 
>> gcc t.c -S -m32
> t.c: In function ‘foo’:
> t.c:4:1: warning: SSE vector return without SSE enabled changes the
> ABI [-Wpsabi]
>  {
>  ^
> t.c:3:6: note: The ABI for passing parameters with 16-byte alignment
> has changed in GCC 4.6
>  v4si foo (v4si x)
>   ^
> t.c:3:6: warning: SSE vector argument without SSE enabled changes the
> ABI [-Wpsabi]
> 
> for
> 
> typedef int v4si __attribute__((vector_size(16)));
> 
> v4si foo (v4si x)
> {
>   return x;
> }
> 
> on i?86 without -msse2.  So you could as well do that - warn for vector
> type uses on non-z13 and be done with that.

Yes. I've seen this and plan to implement this ontop of the other mechanism. 
But we would still need
something like the GNU ABI marker for GDB.

Bye,

-Andreas-


> 
> Richard.
> 
>> In particular we do not want to set the attribute for local uses of
>> vector types as they would be natural for ifunc optimizations.
>>
>> gcc/
>> * config/s390/s390.c (s390_vector_abi): New variable definition.
>> (s390_check_type_for_vector_abi): New function.
>> (TARGET_ASM_FILE_END): New macro definition.
>> (s390_asm_file_end): New function.
>> (s390_function_arg): Call s390_check_type_for_vector_abi.
>> (s390_gimplify_va_arg): Likewise.
>> * configure: Regenerate.
>> * configure.ac: Check for .gnu_attribute Binutils feature.
>>
>> gcc/testsuite/
>> * gcc.target/s390/vector/vec-abi-1.c: Add gnu attribute check.
>> * gcc.target/s390/vector/vec-abi-attr-1.c: New test.
>> * gcc.target/s390/vector/vec-abi-attr-2.c: New test.
>> * gcc.target/s390/vector/vec-abi-attr-3.c: New test.
>> * gcc.target/s390/vector/vec-abi-attr-4.c: New test.
>> * gcc.target/s390/vector/vec-abi-attr-5.c: New test.
>> * gcc.target/s390/vector/vec-abi-attr-6.c: New test.
>> ---
>>  gcc/config/s390/s390.c |  121 
>> 
>>  gcc/configure  |   36 ++
>>  gcc/configure.ac   |7 ++
>>  gcc/testsuite/gcc.target/s390/vector/vec-abi-1.c   |1 +
>>  .../gcc.target/s390/vector/vec-abi-attr-1.c|   18 +++
>>  .../gcc.target/s390/vector/vec-abi-attr-2.c|   53 +
>>  .../gcc.target/s390/vector/vec-abi-attr-3.c|   18 +++
>>  .../gcc.target/s390/vector/vec-abi-attr-4.c|   17 +++
>>  .../gcc.target/s390/vector/vec-abi-attr-5.c|   19 +++
>>  .../gcc.target/s390/vector/vec-abi-attr-6.c|   24 
>>  10 files changed, 314 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-2.c
>>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-3.c
>>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-4.c
>>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-5.c
>>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-6.c
>>
>> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
>> index d6ed179..934f7c0 100644
>> --- a/gcc/config/s390/s390.c
>> +++ b/gcc/config/s390/s390.c
>> @@ -461,6 +461,97 @@ struct GTY(()) machine_function
>>  #define PREDICT_DISTANCE (TARGET_Z10 ? 384 : 2048)
>>
>>
>> +/* Indicate which ABI has been used for passing vector args.
>> +   0 - no vector type arguments have been passed where the ABI is relevant
>> +   1 - the old ABI has been used
>> +   2 - a vector type argument has been pass

[PATCH, i386] Fix `misaligned_operand' predicate.

2015-06-24 Thread Kirill Yukhin
Hello,

Patch in the bottom uses proper check of valid memory
in `misaligned_operand' predicate.

gcc/
* config/i386/predicates.md (misaligned_operand): Properly
check if operand is memory.

Bootstrapped and reg-tested.

Is it ok for trunk?

--
Thanks,  K

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 4e45246..7d6ae77 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1365,7 +1365,7 @@

 ;; Return true if OP is misaligned memory operand
 (define_predicate "misaligned_operand"
-  (and (match_code "mem")
+  (and (match_operand 0 "memory_operand")
(match_test "MEM_ALIGN (op) < GET_MODE_ALIGNMENT (mode)")))

 ;; Return true if OP is a emms operation, known to be a PARALLEL.


Re: [C/C++ PATCH] PR c++/66572. Fix Wlogical-op false positive

2015-06-24 Thread Mikhail Maltsev
On 23.06.2015 22:49, Marek Polacek wrote:
> On Sat, Jun 20, 2015 at 03:02:06AM +0300, Mikhail Maltsev wrote:
>> -  /* We do not warn for constants because they are typical of macro
>> - expansions that test for features.  */
>> -  if (CONSTANT_CLASS_P (op_left) || CONSTANT_CLASS_P (op_right))
>> +  /* We do not warn for literal constants because they are typical of macro
>> + expansions that test for features.  Likewise, we do not warn for
>> + const-qualified and constexpr variables which are initialized by 
>> constant
>> + expressions, because they can come from e.g.  or similar 
>> user
>> + code.  */
>> +  if (TREE_CONSTANT (op_left) || TREE_CONSTANT (op_right))
>>  return;
> 
> That looks wrong, because with TREE_CONSTANT we'd warn in C but not in C++
> for the following:
> 
> const int a = 4;
> void
> f (void)
> {
>   const int b = 4;
>   static const int c = 5;
>   if (a && a) {}
>   if (b && b) {}
>   if (c && c) {}
> }
> 
Actually for this case the patch silences the warning both for C and
C++. It's interesting that Clang warns like this:

test.c:7:10: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand]

It does not warn for my testcase with templates. It also does not warn
about:

void
bar (const int parm_a)
{
  const bool a = parm_a;
  if (a && a) {}
  if (a || a) {}
  if (parm_a && parm_a) {}
  if (parm_a || parm_a) {}
}

EDG does not give any warnings at all (in all 3 testcases).

> Note that const-qualified types are checked using TYPE_READONLY.
Yes, but I think we should warn about const-qualified types like in the
example above (and in your recent patch).

> 
> But I'm not even sure that the warning in the original testcase in the PR
> is bogus; you won't get any warning when using e.g.
>   foo();
> in main().

Maybe my snippet does not express clearly enough what it was supposed to
express. I actually meant something like this:

  template<class _U1, class _U2, class = typename
   enable_if<__and_<is_convertible<_U1, _T1>,
is_convertible<_U2, _T2>>::value>::type>
constexpr pair(pair<_U1, _U2>&& __p)
: first(std::forward<_U1>(__p.first)),
  second(std::forward<_U2>(__p.second)) { }

(it's the std::pair move constructor)
It is perfectly possible that the user will construct an std::pair
object from an std::pair. In this case we get an "and" of two
identical is_convertible instantiations. The difference is that here
there is a clever __and_ template which helps to avoid warnings. Well,
at least I now know a good way to suppress them in my code :).

Though I still think that this warning is bogus. Probably the correct
(and the hard) way to check templates is to compare ASTs of the operands
before any substitutions.

But for now I could try to implement an idea, which I mentioned in the
bug report: add a new flag to enum tsubst_flags, and set it when we
check ASTs which depend on parameters of a template being instantiated
(we already have similar checks for macro expansions). What do you think
about such approach?

-- 
Regards,
Mikhail Maltsev


Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-06-24 Thread Sandra Loosemore

On 06/24/2015 01:58 AM, Ramana Radhakrishnan wrote:



On 24/06/15 02:00, Sandra Loosemore wrote:

On 06/18/2015 11:32 AM, Eric Botcazou wrote:

The attached patch teaches regrename to validate insns affected by each
register renaming before making the change.  I can see at least two
other ways to handle this -- earlier, by rejecting renamings that
result
in invalid instructions when it's searching for the best renaming; or
later, by validating the entire set of renamings as a group instead of
incrementally for each one -- but doing it all in regname_do_replace
seems least disruptive and risky in terms of the existing code.


OK, but the patch looks incomplete, rename_chains should be adjusted
as well,
i.e. regrename_do_replace should now return a boolean.


Like this?  I tested this on nios2 and x86_64-linux-gnu, as before, plus
built for aarch64-linux-gnu and ran the gcc testsuite.


Hopefully that was built with --with-cpu=cortex-a57 to enable the
renaming pass ?


No, sorry.  I was assuming there were compile-only unit tests for this 
pass that automatically add the right options to enable it.  I don't 
know that I can actually run cortex-a57 code (I was struggling with a 
flaky test harness as it was).


-Sandra



Re: [PATCH, i386] Fix `misaligned_operand' predicate.

2015-06-24 Thread Uros Bizjak
On Wed, Jun 24, 2015 at 4:35 PM, Kirill Yukhin  wrote:
> Hello,
>
> Patch in the bottom uses proper check of valid memory
> in `misaligned_operand' predicate.
>
> gcc/
> * config/i386/predicates.md (misaligned_operand): Properly
> check if operand is memory.
>
> Bootstrapped and reg-tested.
>
> Is it ok for trunk?

I have reviewed the uses of the misaligned_operand predicate, and AFAICS
they always operate after a check for "memory_operand".  So, there is
no point re-checking it with the full memory_operand predicate.

Please introduce another predicate for legitimate misaligned memory
operand, perhaps named "misaligned_memory_operand".
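
Such a predicate could be sketched as follows, reusing the alignment test
from the quoted hunk (a sketch of the suggestion, not a committed change):

```lisp
;; Sketch only: a legitimate memory operand whose alignment is below
;; the natural alignment of its mode.
(define_predicate "misaligned_memory_operand"
  (and (match_operand 0 "memory_operand")
       (match_test "MEM_ALIGN (op) < GET_MODE_ALIGNMENT (mode)")))
```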

Uros.

> --
> Thanks,  K
>
> diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> index 4e45246..7d6ae77 100644
> --- a/gcc/config/i386/predicates.md
> +++ b/gcc/config/i386/predicates.md
> @@ -1365,7 +1365,7 @@
>
>  ;; Return true if OP is misaligned memory operand
>  (define_predicate "misaligned_operand"
> -  (and (match_code "mem")
> +  (and (match_operand 0 "memory_operand")
> (match_test "MEM_ALIGN (op) < GET_MODE_ALIGNMENT (mode)")))
>
>  ;; Return true if OP is a emms operation, known to be a PARALLEL.


Re: [PATCH 1/3][AArch64 nofp] Fix ICEs with +nofp/-mgeneral-regs-only and improve error messages

2015-06-24 Thread James Greenhalgh
On Tue, Jun 23, 2015 at 05:02:46PM +0100, Alan Lawrence wrote:
> James Greenhalgh wrote:
 
> Bootstrap + check-gcc on aarch64-none-linux-gnu.
> 
> (ChangeLog's identical to v1)
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-protos.h (aarch64_err_no_fpadvsimd): New.
> 
>   * config/aarch64/aarch64.md (mov/GPF, movtf): Use
>   aarch64_err_no_fpadvsimd.
> 
>   * config/aarch64/aarch64.c (aarch64_err_no_fpadvsimd): New.
>   (aarch64_layout_arg, aarch64_init_cumulative_args): Use
>   aarch64_err_no_fpadvsimd if !TARGET_FLOAT and we need FP regs.
>   (aarch64_expand_builtin_va_start, aarch64_setup_incoming_varargs):
>   Turn error into assert, test TARGET_FLOAT.
>   (aarch64_gimplify_va_arg_expr): Use aarch64_err_no_fpadvsimd, test
>   TARGET_FLOAT.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/mgeneral-regs_1.c: New file.
>   * gcc.target/aarch64/mgeneral-regs_2.c: New file.
>   * gcc.target/aarch64/nofp_1.c: New file.


OK.

Thanks,
James

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> 965a11b7bee188819796e2b17017a87dca80..ac92c5924a4cfc5941fe8eeb31281e18bd21a5a0
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -259,6 +259,7 @@ unsigned aarch64_dbx_register_number (unsigned);
>  unsigned aarch64_trampoline_size (void);
>  void aarch64_asm_output_labelref (FILE *, const char *);
>  void aarch64_elf_asm_named_section (const char *, unsigned, tree);
> +void aarch64_err_no_fpadvsimd (machine_mode, const char *);
>  void aarch64_expand_epilogue (bool);
>  void aarch64_expand_mov_immediate (rtx, rtx);
>  void aarch64_expand_prologue (void);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> a79bb6a96572799181a5bff3c3818e294f87cb7a..3193a15970e5524e0f3a8a5505baea5582e55731
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -522,6 +522,16 @@ static const char * const aarch64_condition_codes[] =
>"hi", "ls", "ge", "lt", "gt", "le", "al", "nv"
>  };
>  
> +void
> +aarch64_err_no_fpadvsimd (machine_mode mode, const char *msg)
> +{
> +  const char *mc = FLOAT_MODE_P (mode) ? "floating-point" : "vector";
> +  if (TARGET_GENERAL_REGS_ONLY)
> +error ("%qs is incompatible with %s %s", "-mgeneral-regs-only", mc, msg);
> +  else
> +error ("%qs feature modifier is incompatible with %s %s", "+nofp", mc, 
> msg);
> +}
> +
>  static unsigned int
>  aarch64_min_divisions_for_recip_mul (enum machine_mode mode)
>  {
> @@ -1772,6 +1782,9 @@ aarch64_layout_arg (cumulative_args_t pcum_v, 
> machine_mode mode,
>   and homogenous short-vector aggregates (HVA).  */
>if (allocate_nvrn)
>  {
> +  if (!TARGET_FLOAT)
> + aarch64_err_no_fpadvsimd (mode, "argument");
> +
>if (nvrn + nregs <= NUM_FP_ARG_REGS)
>   {
> pcum->aapcs_nextnvrn = nvrn + nregs;
> @@ -1898,6 +1911,17 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
>pcum->aapcs_stack_words = 0;
>pcum->aapcs_stack_size = 0;
>  
> +  if (!TARGET_FLOAT
> +  && fndecl && TREE_PUBLIC (fndecl)
> +  && fntype && fntype != error_mark_node)
> +{
> +  const_tree type = TREE_TYPE (fntype);
> +  machine_mode mode ATTRIBUTE_UNUSED; /* To pass pointer as argument.  */
> +  int nregs ATTRIBUTE_UNUSED; /* Likewise.  */
> +  if (aarch64_vfp_is_call_or_return_candidate (TYPE_MODE (type), type,
> +&mode, &nregs, NULL))
> + aarch64_err_no_fpadvsimd (TYPE_MODE (type), "return type");
> +}
>return;
>  }
>  
> @@ -7557,9 +7581,7 @@ aarch64_expand_builtin_va_start (tree valist, rtx 
> nextarg ATTRIBUTE_UNUSED)
>  
>if (!TARGET_FLOAT)
>  {
> -  if (cum->aapcs_nvrn > 0)
> - sorry ("%qs and floating point or vector arguments",
> -"-mgeneral-regs-only");
> +  gcc_assert (cum->aapcs_nvrn == 0);
>vr_save_area_size = 0;
>  }
>  
> @@ -7666,8 +7688,7 @@ aarch64_gimplify_va_arg_expr (tree valist, tree type, 
> gimple_seq *pre_p,
>  {
>/* TYPE passed in fp/simd registers.  */
>if (!TARGET_FLOAT)
> - sorry ("%qs and floating point or vector arguments",
> -"-mgeneral-regs-only");
> + aarch64_err_no_fpadvsimd (mode, "varargs");
>  
>f_top = build3 (COMPONENT_REF, TREE_TYPE (f_vrtop),
> unshare_expr (valist), f_vrtop, NULL_TREE);
> @@ -7904,9 +7925,7 @@ aarch64_setup_incoming_varargs (cumulative_args_t 
> cum_v, machine_mode mode,
>  
>if (!TARGET_FLOAT)
>  {
> -  if (local_cum.aapcs_nvrn > 0)
> - sorry ("%qs and floating point or vector arguments",
> -"-mgeneral-regs-only");
> +  gcc_assert (local_cum.aapcs_nvrn == 0);
>vr_saved = 0;
>  }
>  
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 1efe57c91b10e47ab7511d089f7b4bb53f18f06e.

[gomp4] Additional tests for declare directive and fixes.

2015-06-24 Thread James Norris

Hi!

The following patch adds additional testing of the declare directive
and fixes for issues that arose from the testing.


Committed to gomp-4_0-branch.

Jim
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index e7df751..bcbd163 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1767,12 +1767,15 @@ finish_oacc_declare (tree fnbody, tree decls)
 	break;
 }
 
-  stmt = make_node (OACC_DECLARE);
-  TREE_TYPE (stmt) = void_type_node;
-  OACC_DECLARE_CLAUSES (stmt) = ret_clauses;
-  SET_EXPR_LOCATION (stmt, loc);
+  if (ret_clauses)
+{
+  stmt = make_node (OACC_DECLARE);
+  TREE_TYPE (stmt) = void_type_node;
+  OACC_DECLARE_CLAUSES (stmt) = ret_clauses;
+  SET_EXPR_LOCATION (stmt, loc);
 
-  tsi_link_before (&i, stmt, TSI_CONTINUE_LINKING);
+  tsi_link_before (&i, stmt, TSI_CONTINUE_LINKING);
+}
 
   DECL_ATTRIBUTES (fndecl)
 	  = remove_attribute ("oacc declare", DECL_ATTRIBUTES (fndecl));
@@ -12812,6 +12815,14 @@ c_parser_oacc_declare (c_parser *parser)
 	  error = true;
 	  continue;
 	}
+	  else if (TREE_PUBLIC (decl))
+	{
+	  error_at (loc,
+			"invalid use of % variable %qD "
+			"in %<#pragma acc declare%>", decl);
+	  error = true;
+	  continue;
+	}
 	  break;
 	}
 
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 15da51e..a35f599 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -14343,7 +14343,17 @@ finish_oacc_declare (tree fndecl, tree decls)
  {
 	t = tsi_stmt (i);
 	if (TREE_CODE (t) == BIND_EXPR)
-	  list = BIND_EXPR_BODY (t);
+	  {
+	list = BIND_EXPR_BODY (t);
+	if (TREE_CODE (list) != STATEMENT_LIST)
+	  {
+		stmt = list;
+		list = alloc_stmt_list ();
+		BIND_EXPR_BODY (t) = list;
+		i = tsi_start (list);
+		tsi_link_after (&i, stmt, TSI_CONTINUE_LINKING);
+	  }
+	  }
   }
 
   if (clauses)
@@ -14371,11 +14381,11 @@ finish_oacc_declare (tree fndecl, tree decls)
 	}
 	}
 
-	if (!found)
-	  {
-	i = tsi_start (list);
-	tsi_link_before (&i, stmt, TSI_CONTINUE_LINKING);
-	  }
+  if (!found)
+	{
+	  i = tsi_start (list);
+	  tsi_link_before (&i, stmt, TSI_CONTINUE_LINKING);
+	}
 }
 
 while (oacc_returns)
@@ -14405,18 +14415,21 @@ finish_oacc_declare (tree fndecl, tree decls)
 	free (r);
  }
 
-  for (i = tsi_start (list); !tsi_end_p (i); tsi_next (&i))
+  if (ret_clauses)
 {
-  if (tsi_end_p (i))
-	break;
-}
+  for (i = tsi_start (list); !tsi_end_p (i); tsi_next (&i))
+	{
+	  if (tsi_end_p (i))
+	break;
+	}
 
-  stmt = make_node (OACC_DECLARE);
-  TREE_TYPE (stmt) = void_type_node;
-  OMP_STANDALONE_CLAUSES (stmt) = ret_clauses;
-  SET_EXPR_LOCATION (stmt, loc);
+  stmt = make_node (OACC_DECLARE);
+  TREE_TYPE (stmt) = void_type_node;
+  OMP_STANDALONE_CLAUSES (stmt) = ret_clauses;
+  SET_EXPR_LOCATION (stmt, loc);
 
-  tsi_link_before (&i, stmt, TSI_CONTINUE_LINKING);
+  tsi_link_before (&i, stmt, TSI_CONTINUE_LINKING);
+}
 
   DECL_ATTRIBUTES (fndecl)
 	  = remove_attribute ("oacc declare", DECL_ATTRIBUTES (fndecl));
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 78bcb0a1..41fb35e 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -32123,6 +32123,14 @@ cp_parser_oacc_declare (cp_parser *parser, cp_token *pragma_tok)
 	  error = true;
 	  continue;
 	}
+	  else if (TREE_PUBLIC (decl))
+	{
+	  error_at (loc,
+			"invalid use of % variable %qD "
+			"in %<#pragma acc declare%>", decl);
+	  error = true;
+	  continue;
+	}
 	  break;
 	}
 
diff --git a/gcc/testsuite/c-c++-common/goacc/declare-2.c b/gcc/testsuite/c-c++-common/goacc/declare-2.c
index ce12463..7979f0c 100644
--- a/gcc/testsuite/c-c++-common/goacc/declare-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/declare-2.c
@@ -63,4 +63,6 @@ f (void)
 
   extern int ve6;
 #pragma acc declare present_or_create(ve6) /* { dg-error "invalid use of" } */
+
+#pragma acc declare present (v9) /* { dg-error "invalid use of" } */
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-1.c
index 59cfe51..584b921 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-1.c
@@ -4,6 +4,26 @@
 #include 
 #include 
 
+#define N 8
+
+void
+subr1 (int *a)
+{
+  int f[N];
+#pragma acc declare copy (f)
+
+#pragma acc parallel copy (a[0:N])
+  {
+int i;
+
+for (i = 0; i < N; i++)
+  {
+	f[i] = a[i];
+	a[i] = f[i] + f[i];
+  }
+  }
+}
+
 int b[8];
 #pragma acc declare create (b)
 
@@ -13,7 +33,6 @@ int d[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
 int
 main (int argc, char **argv)
 {
-  const int N = 8;
   int a[N];
   int e[N];
 #pragma acc declare create (e)
@@ -61,5 +80,18 @@ main (int argc, char **argv)
 	abort ();
 }
 
+  for (i = 0; i < N; i++)
+{
+  a[i] = 1234;
+}
+
+  subr1 (&a[0]);
+
+  for (i = 0; i < N; i++)
+{
+  if (a[i] != 1234 * 2)
+	abort ();
+}
+
   return 0;
 }
diff --git a/libgo

Re: [C++17] Implement N3928 - Extending static_assert

2015-06-24 Thread Ed Smith-Rowland

On 06/17/2015 03:22 PM, Jason Merrill wrote:

On 06/17/2015 01:53 PM, Ed Smith-Rowland wrote:

I tried the obvious: an error message with %qE and got 'false'.
constexpr values are evaluated early on.

Is there a possibility that late folding could help or is that
completely different?


Late folding could help, but I think handling it in libcpp (by 
actually stringizing the argument) would work better.


Jason


OK, doing the bit with libcpp and getting a better message is taking 
longer than I thought it would.

I'm going ahead with what I had back in May.
I'm still working on something better - it'll just take a hot minute.

Meanwhile, this takes care of an annoyance for many.

Committed 224903.

Ed

cp/

2015-06-24  Edward Smith-Rowland  <3dw...@verizon.net>

Implement N3928 - Extending static_assert
* parser.c (cp_parser_static_assert): Support static_assert with
no message string.  Supply an empty string in this case.
* semantics.c (finish_static_assert): Don't try to print a message if
the message string is empty.


testsuite/

2015-06-24  Edward Smith-Rowland  <3dw...@verizon.net>

Implement N3928 - Extending static_assert
* g++.dg/cpp0x/static_assert8.C: Adjust.
* g++.dg/cpp0x/static_assert12.C: New.
* g++.dg/cpp0x/static_assert13.C: New.
* g++.dg/cpp1y/static_assert1.C: New.
* g++.dg/cpp1y/static_assert2.C: New.
* g++.dg/cpp1z/static_assert-nomsg.C: New.

Index: cp/parser.c
===
--- cp/parser.c (revision 224897)
+++ cp/parser.c (working copy)
@@ -12173,6 +12173,7 @@
 
static_assert-declaration:
  static_assert ( constant-expression , string-literal ) ; 
+ static_assert ( constant-expression ) ; (C++1Z)
 
If MEMBER_P, this static_assert is a class member.  */
 
@@ -12210,20 +12211,35 @@
/*allow_non_constant_p=*/true,
/*non_constant_p=*/&dummy);
 
-  /* Parse the separating `,'.  */
-  cp_parser_require (parser, CPP_COMMA, RT_COMMA);
+  if (cp_lexer_peek_token (parser->lexer)->type == CPP_CLOSE_PAREN)
+{
+  if (cxx_dialect < cxx1z)
+   pedwarn (input_location, OPT_Wpedantic,
+"static_assert without a message "
+"only available with -std=c++1z or -std=gnu++1z");
+  /* Eat the ')'  */
+  cp_lexer_consume_token (parser->lexer);
+  message = build_string (1, "");
+  TREE_TYPE (message) = char_array_type_node;
+  fix_string_type (message);
+}
+  else
+{
+  /* Parse the separating `,'.  */
+  cp_parser_require (parser, CPP_COMMA, RT_COMMA);
 
-  /* Parse the string-literal message.  */
-  message = cp_parser_string_literal (parser, 
-  /*translate=*/false,
-  /*wide_ok=*/true);
+  /* Parse the string-literal message.  */
+  message = cp_parser_string_literal (parser, 
+ /*translate=*/false,
+ /*wide_ok=*/true);
 
-  /* A `)' completes the static assertion.  */
-  if (!cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-cp_parser_skip_to_closing_parenthesis (parser, 
-   /*recovering=*/true, 
-   /*or_comma=*/false,
-  /*consume_paren=*/true);
+  /* A `)' completes the static assertion.  */
+  if (!cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
+   cp_parser_skip_to_closing_parenthesis (parser, 
+   /*recovering=*/true, 
+   /*or_comma=*/false,
+  /*consume_paren=*/true);
+}
 
   /* A semicolon terminates the declaration.  */
   cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 224897)
+++ cp/semantics.c  (working copy)
@@ -7174,8 +7174,17 @@
   input_location = location;
   if (TREE_CODE (condition) == INTEGER_CST 
   && integer_zerop (condition))
-/* Report the error. */
-error ("static assertion failed: %s", TREE_STRING_POINTER (message));
+   {
+ int sz = TREE_INT_CST_LOW (TYPE_SIZE_UNIT
+(TREE_TYPE (TREE_TYPE (message))));
+ int len = TREE_STRING_LENGTH (message) / sz - 1;
+  /* Report the error. */
+ if (len == 0)
+error ("static assertion failed");
+ else
+error ("static assertion failed: %s",
+  TREE_STRING_POINTER (message));
+   }
   else if (condition && condition != error_mark_node)
{
  error ("non-constant condition for static as

Re: C++ PATCH for c++/66501 (wrong code with array move assignment)

2015-06-24 Thread Jason Merrill

On 06/23/2015 10:05 AM, Jason Merrill wrote:

build_vec_init was assuming that if a class has a trivial copy
assignment, then an array assignment is trivial.  But overload
resolution might not choose the copy assignment operator.  So this patch
changes build_vec_init to check for any non-trivial assignment operator.


On further consideration, it occurred to me that is_trivially_xible 
gives the precise answer we want, so I'm changing build_vec_init to use it.


On 4.9 we don't have is_trivially_xible, so I'm doing a simpler check, 
just adding TYPE_HAS_COMPLEX_MOVE_ASSIGN to the mix.


Tested x86_64-pc-linux-gnu, applying to trunk, 5 and 4.9.


commit d4d071b1f6552bfe57a1ed9e27de028580958afd
Author: Jason Merrill 
Date:   Tue Jun 23 22:02:30 2015 -0400

	PR c++/66501
	* init.c (vec_copy_assign_is_trivial): New.
	(build_vec_init): Use it.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 957a7a4..04c09d8 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3367,6 +3367,18 @@ get_temp_regvar (tree type, tree init)
   return decl;
 }
 
+/* Subroutine of build_vec_init.  Returns true if assigning to an array of
+   INNER_ELT_TYPE from INIT is trivial.  */
+
+static bool
+vec_copy_assign_is_trivial (tree inner_elt_type, tree init)
+{
+  tree fromtype = inner_elt_type;
+  if (real_lvalue_p (init))
+fromtype = cp_build_reference_type (fromtype, /*rval*/false);
+  return is_trivially_xible (MODIFY_EXPR, inner_elt_type, fromtype);
+}
+
 /* `build_vec_init' returns tree structure that performs
initialization of a vector of aggregate types.
 
@@ -3443,8 +3455,7 @@ build_vec_init (tree base, tree maxindex, tree init,
   && TREE_CODE (atype) == ARRAY_TYPE
   && TREE_CONSTANT (maxindex)
   && (from_array == 2
-	  ? (!CLASS_TYPE_P (inner_elt_type)
-	 || !TYPE_HAS_COMPLEX_COPY_ASSIGN (inner_elt_type))
+	  ? vec_copy_assign_is_trivial (inner_elt_type, init)
 	  : !TYPE_NEEDS_CONSTRUCTING (type))
   && ((TREE_CODE (init) == CONSTRUCTOR
 	   /* Don't do this if the CONSTRUCTOR might contain something
diff --git a/gcc/testsuite/g++.dg/cpp0x/rv-array1.C b/gcc/testsuite/g++.dg/cpp0x/rv-array1.C
new file mode 100644
index 000..9075764
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/rv-array1.C
@@ -0,0 +1,55 @@
+// PR c++/66501
+// { dg-do run { target c++11 } }
+
+int total_size;
+
+struct Object
+{
+  int size = 0;
+
+  Object () = default;
+
+  ~Object () {
+total_size -= size;
+  }
+
+  Object (const Object &) = delete;
+  Object & operator= (const Object &) = delete;
+
+  Object (Object && b) {
+size = b.size;
+b.size = 0;
+  }
+
+  Object & operator= (Object && b) {
+if (this != & b) {
+  total_size -= size;
+  size = b.size;
+  b.size = 0;
+}
+return * this;
+  }
+
+  void grow () {
+size ++;
+total_size ++;
+  }
+};
+
+struct Container {
+  Object objects[2];
+};
+
+int main (void)
+{
+  Container container;
+
+  // grow some objects in the container
+  for (auto & object : container.objects)
+object.grow ();
+
+  // now empty it
+  container = Container ();
+
+  return total_size;
+}
commit 1c9edbe05d9e9437bcb7b3f621809461399aefe0
Author: Jason Merrill 
Date:   Tue Jun 23 22:02:30 2015 -0400

	PR c++/66501
	* init.c (vec_copy_assign_is_trivial): New.
	(build_vec_init): Use it.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 5cb7fc4..09a897f 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3379,6 +3379,21 @@ get_temp_regvar (tree type, tree init)
   return decl;
 }
 
+/* Subroutine of build_vec_init.  Returns true if assigning to an array of
+   INNER_ELT_TYPE from INIT is trivial.  */
+
+static bool
+vec_copy_assign_is_trivial (tree inner_elt_type, tree init)
+{
+  if (!CLASS_TYPE_P (inner_elt_type))
+return true;
+  if (cxx_dialect >= cxx11
+  && !real_lvalue_p (init)
+  && type_has_move_assign (inner_elt_type))
+return !TYPE_HAS_COMPLEX_MOVE_ASSIGN (inner_elt_type);
+  return TYPE_HAS_TRIVIAL_COPY_ASSIGN (inner_elt_type);
+}
+
 /* `build_vec_init' returns tree structure that performs
initialization of a vector of aggregate types.
 
@@ -3460,8 +3475,7 @@ build_vec_init (tree base, tree maxindex, tree init,
   && TREE_CODE (atype) == ARRAY_TYPE
   && TREE_CONSTANT (maxindex)
   && (from_array == 2
-	  ? (!CLASS_TYPE_P (inner_elt_type)
-	 || !TYPE_HAS_COMPLEX_COPY_ASSIGN (inner_elt_type))
+	  ? vec_copy_assign_is_trivial (inner_elt_type, init)
 	  : !TYPE_NEEDS_CONSTRUCTING (type))
   && ((TREE_CODE (init) == CONSTRUCTOR
 	   /* Don't do this if the CONSTRUCTOR might contain something
diff --git a/gcc/testsuite/g++.dg/cpp0x/rv-array1.C b/gcc/testsuite/g++.dg/cpp0x/rv-array1.C
new file mode 100644
index 000..9075764
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/rv-array1.C
@@ -0,0 +1,55 @@
+// PR c++/66501
+// { dg-do run { target c++11 } }
+
+int total_size;
+
+struct Object
+{
+  int size = 0;
+
+  Object () = default;
+
+  ~Object () {
+total_size -= size;
+  }
+
+  

Re: [Patch, C++, PR65882] Check tf_warning flag in build_new_op_1

2015-06-24 Thread Christophe Lyon
On 22 June 2015 at 18:59, Jason Merrill  wrote:
> On 06/19/2015 08:23 PM, Mikhail Maltsev wrote:
>>
>> I see that version 5.2 is set as target milestone for this bug. Should I
>> backport the patch?
>
>
> Please.
>
> Jason
>

Hi Mikhail,

In the gcc-5-branch, I can see that your new inhibit-warn-2.C test
fails (on ARM and AArch64 targets).

I can see this error message in g++.log:
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:
In function 'void fn1()':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:29:3:
error: 'typename A<(F
>::type>::value || B:: value)>::type D::operator=(Expr) [with Expr =
int; typename A<(F
>::type>::value || B:: value)>::type = int]' is private
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:35:7:
error: within this context

Christophe.


Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Jeff Law

On 06/24/2015 05:43 AM, Richard Biener wrote:


Note that ISTR code performing exactly the opposite transform in
fold-const.c ...


That's another reason why I'm worried about just doing the (negate ...)
thing without knowing whether the negate can be folded into anything else.


I'm not aware of anything here.
It's worth looking at -- I've certainly seen cases where we end up in 
infinite recursion because we've got a transformation in one place (say 
match.pd) and its inverse elsewhere (fold-const.c).


Jeff


RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-06-24 Thread Evandro Menezes
Benedikt,

You beat me to it! :-)  Do you have the implementation for dividing using
the Newton series as well?

I'm not sure that the series is always profitable for all data types and
on all processors.  It would be useful to allow each AArch64 processor to
enable this or not depending on the data type.  BTW, do you have some
tests showing the speed up?

Thank you,

-- 
Evandro Menezes  Austin, TX

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Benedikt Huber
> Sent: Thursday, June 18, 2015 7:04
> To: gcc-patches@gcc.gnu.org
> Cc: benedikt.hu...@theobroma-systems.com; philipp.tomsich@theobroma-
> systems.com
> Subject: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
> estimation in -ffast-math
> 
> AArch64 offers the instructions frsqrte and frsqrts, for rsqrt estimation
> and a Newton-Raphson step, respectively.
> There are ARMv8 implementations where this is faster than using fdiv and
> rsqrt.
> It runs three steps for double and two steps for float to achieve the
> needed precision.
> 
> There is one caveat and open question.
> Since -ffast-math enables flush-to-zero, intermediate values between
> approximation steps will be flushed to zero if they are denormal.
> E.g. This happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
> The test cases pass, but it is unclear to me whether this is expected
> behavior with -ffast-math.
> 
> The patch applies to commit:
> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
> 
> Please consider including this patch.
> Thank you and best regards,
> Benedikt Huber
> 
> Benedikt Huber (1):
>   2015-06-15  Benedikt Huber  
> 
>  gcc/ChangeLog|   9 +++
>  gcc/config/aarch64/aarch64-builtins.c|  60 
>  gcc/config/aarch64/aarch64-protos.h  |   2 +
>  gcc/config/aarch64/aarch64-simd.md   |  27 
>  gcc/config/aarch64/aarch64.c |  63 +
>  gcc/config/aarch64/aarch64.md|   3 +
>  gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
> +++
>  7 files changed, 277 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
> 
> --
> 1.9.1
--- Begin Message ---
   * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and rsqrtf.
   * config/aarch64/aarch64-protos.h: Declare.
   * config/aarch64/aarch64-simd.md: Matching expressions for frsqrte
and frsqrts.
   * config/aarch64/aarch64.c: New functions. Emit rsqrt estimation code
in fast math mode.
   * config/aarch64/aarch64.md: Added enum entry.
   * testsuite/gcc.target/aarch64/rsqrt.c: Tests for single and double.
---
 gcc/ChangeLog|   9 +++
 gcc/config/aarch64/aarch64-builtins.c|  60 
 gcc/config/aarch64/aarch64-protos.h  |   2 +
 gcc/config/aarch64/aarch64-simd.md   |  27 
 gcc/config/aarch64/aarch64.c |  63 +
 gcc/config/aarch64/aarch64.md|   3 +
 gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
+++
 7 files changed, 277 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c9b156f..690ebba 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2015-06-15  Benedikt Huber  
+
+   * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and rsqrtf.
+   * config/aarch64/aarch64-protos.h: Declare.
+   * config/aarch64/aarch64-simd.md: Matching expressions for frsqrte
and frsqrts.
+   * config/aarch64/aarch64.c: New functions. Emit rsqrt estimation
code in fast math mode.
+   * config/aarch64/aarch64.md: Added enum entry.
+   * testsuite/gcc.target/aarch64/rsqrt.c: Tests for single and double.
+
 2015-06-14  Richard Sandiford  
 
* rtl.h (classify_insn): Declare.
diff --git a/gcc/config/aarch64/aarch64-builtins.c
b/gcc/config/aarch64/aarch64-builtins.c
index f7a39ec..484bb84 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -342,6 +342,8 @@ enum aarch64_builtins
   AARCH64_BUILTIN_GET_FPSR,
   AARCH64_BUILTIN_SET_FPSR,
 
+  AARCH64_BUILTIN_RSQRT,
+  AARCH64_BUILTIN_RSQRTF,
   AARCH64_SIMD_BUILTIN_BASE,
   AARCH64_SIMD_BUILTIN_LANE_CHECK,
 #include "aarch64-simd-builtins.def"
@@ -831,6 +833,32 @@ aarch64_init_crc32_builtins ()
 }
 
 void
+aarch64_add_builtin_rsqrt (void)
+{
+  tree fndecl = NULL;
+  tree ftype = NULL;
+  ftype = build_function_type_list (double_type_node, double_type_node,
NULL_TREE);
+
+  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt",
+ ftype,
+ AARCH64_BUILTIN_RSQRT,
+ BUILT_IN_MD,
+ NULL,
+ NULL_TREE);
+  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT] = fndecl;
+
+  tree ftypef = NULL;
+  ftypef 

Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-06-24 Thread Eric Botcazou
> Like this?  I tested this on nios2 and x86_64-linux-gnu, as before, plus
> built for aarch64-linux-gnu and ran the gcc testsuite.

Yes, the patch is OK, modulo...

> The c6x back end also calls regrename_do_replace.  I am not set up to
> build or test on that target, and Bernd told me off-list that it would
> never fail on that target anyway so I have left that code alone.

... Bernd has obviously the final say here, but it would be better to add an 
assertion that it indeed did not fail (just build the cc1 as a sanity check).

Thanks for adding the missing head comment to regrename_do_replace.

-- 
Eric Botcazou


Re: fix PR46029: reimplement if conversion of loads and stores

2015-06-24 Thread Ramana Radhakrishnan



On 12/06/15 21:50, Abe Skolnik wrote:

Hi everybody!

In the current implementation of if conversion, loads and stores are
if-converted in a thread-unsafe way:

   * loads were always executed, even when they should have not been.
 Some source code could be rendered invalid due to null pointers
 that were OK in the original program because they were never
 dereferenced.

   * writes were if-converted via load/maybe-modify/store, which
 renders some code multithreading-unsafe.

This patch reimplements if-conversion of loads and stores in a safe
way using a scratchpad allocated by the compiler on the stack:

   * loads are done through an indirection, reading either the correct
 data from the correct source [if the condition is true] or reading
 from the scratchpad and later ignoring this read result [if the
 condition is false].

   * writes are also done through an indirection, writing either to the
 correct destination [if the condition is true] or to the
 scratchpad [if the condition is false].

Vectorization of "if-cvt-stores-vect-ifcvt-18.c" is disabled because the
old if-conversion resulted in unsafe code that could fail under
multithreading even though the as-written code _was_ thread-safe.

Passed regression testing and bootstrap on amd64-linux.
Is this OK to commit to trunk?


I can't approve or reject but this certainly looks like an improvement 
compared to where we are as we get rid of the data races.


The only gotcha I can think of with this approach is that it introduces 
false dependencies that would cause "unnecessary" write-after-write 
hazards with the writes to the scratchpad when you unroll the loop - but 
that's not necessarily worse than where we are today.


Some fun stats from a previous Friday afternoon poke at this without 
doing any benchmarking as such.


In a bootstrap with BOOT_CFLAGS="-O2 -ftree-loop-if-convert-stores" and 
one without it, I see about 12.20% more csel's on an AArch64 bootstrap 
(goes from 7898 -> 8862) vs plain old -O2.


And I did see the one case in libquantum get sorted with this, though 
the performance results were funny let's say (+5% in one case, -1.5% on 
another core), I haven't analysed it deeply yet but it does look 
interesting.


regards
Ramana




Regards,

Abe




2015-06-12  Sebastian Pop  
 Abe Skolnik  

PR tree-optimization/46029
* tree-data-ref.c (struct data_ref_loc_d): Moved...
(get_references_in_stmt): Exported.
* tree-data-ref.h (struct data_ref_loc_d): ... here.
(get_references_in_stmt): Declared.

* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
* tree-if-conv.c (struct ifc_dr): Removed.
(IFC_DR): Removed.
(DR_WRITTEN_AT_LEAST_ONCE): Removed.
(DR_RW_UNCONDITIONALLY): Removed.
(memrefs_read_or_written_unconditionally): Removed.
(write_memrefs_written_at_least_once): Removed.
(ifcvt_could_trap_p): Does not take refs parameter anymore.
(ifcvt_memrefs_wont_trap): Removed.
(has_non_addressable_refs): New.
(if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs.
Removed use of refs.
(if_convertible_stmt_p): Removed use of refs.
(if_convertible_gimple_assign_stmt_p): Same.
(if_convertible_loop_p_1): Removed use of refs.  Remove initialization
of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
(insert_address_of): New.
(create_scratchpad): New.
(create_indirect_cond_expr): New.
(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
parameter for scratch_pad.
(combine_blocks): Same.
(tree_if_conversion): Same.

testsuite/
* g++.dg/tree-ssa/ifc-pr46029.C: New.
* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
* gcc.dg/tree-ssa/ifc-8.c: New.
* gcc.dg/tree-ssa/ifc-9.c: New.
* gcc.dg/tree-ssa/ifc-10.c: New.
* gcc.dg/tree-ssa/ifc-11.c: New.
* gcc.dg/tree-ssa/ifc-12.c: New.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New.
---
  gcc/ChangeLog  |  28 ++
  gcc/doc/invoke.texi|  18 +-
  gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C|  76 
  gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c |  17 +
  gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c |  16 +
  gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c |  13 +
  gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c  |  19 +-
  gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c  |  29 ++
  gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c  |  17 +
  .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c  |  10 +-
  .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c  |  46 +++
  gcc/tree-data-ref.c|  13 +-
  gcc/tree-dat

Re: C PATCH to use is_global_var

2015-06-24 Thread Jeff Law

On 06/24/2015 04:22 AM, Marek Polacek wrote:

This patch makes the C FE use the predicate is_global_var in place of direct

   TREE_STATIC (t) || DECL_EXTERNAL (t)

It should improve readability a bit and make predicates easier to follow.
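For illustration, the shape of the refactoring can be mirrored in a standalone mock (the struct and field names below are invented; in GCC the predicate operates on tree nodes and the flags are TREE_STATIC and DECL_EXTERNAL):

```cpp
#include <cassert>

// Standalone mock -- not GCC code.  It only mirrors the shape of the
// change: replace an open-coded two-flag check with a named predicate.
struct mock_decl { bool tree_static; bool decl_external; };

// The named predicate that replaces "TREE_STATIC (t) || DECL_EXTERNAL (t)".
static bool is_global_var(const mock_decl &d)
{
  return d.tree_static || d.decl_external;
}
```

The point of the patch is purely readability: call sites state *what* is being asked rather than *which flags* encode it.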

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-24  Marek Polacek  

* c-common.c (handle_no_reorder_attribute): Use is_global_var.
* cilk.c (extract_free_variables): Likewise.

* c-decl.c: Use is_global_var throughout.
* c-parser.c: Likewise.
* c-typeck.c: Likewise.
OK.  If you find other places where you can use is_global_var to replace 
the TREE_STATIC || DECL_EXTERNAL check, consider them pre-approved.


jeff



Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-06-24 Thread Dr. Philipp Tomsich
Evandro,

We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar) reciprocal sqrt.

Also, the “reciprocal divide” patches are floating around in various of our 
git-tree, but 
aren’t ready for public consumption, yet… I’ll leave Benedikt to comment on 
potential 
timelines for getting that pushed out.

Best,
Philipp.

> On 24 Jun 2015, at 18:42, Evandro Menezes  wrote:
> 
> Benedikt,
> 
> You beat me to it! :-)  Do you have the implementation for dividing using
> the Newton series as well?
> 
> I'm not sure that the series is always for all data types and on all
> processors.  It would be useful to allow each AArch64 processor to enable
> this or not depending on the data type.  BTW, do you have some tests showing
> the speed up?
> 
> Thank you,
> 
> -- 
> Evandro Menezes  Austin, TX
> 
>> -Original Message-
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On
>> Behalf Of Benedikt Huber
>> Sent: Thursday, June 18, 2015 7:04
>> To: gcc-patches@gcc.gnu.org
>> Cc: benedikt.hu...@theobroma-systems.com; philipp.tomsich@theobroma-
>> systems.com
>> Subject: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
>> estimation in -ffast-math
>> 
>> AArch64 offers the instructions frsqrte and frsqrts, for rsqrt estimation
> and
>> a Newton-Raphson step, respectively.
>> There are ARMv8 implementations where this is faster than using fdiv and
>> rsqrt.
>> It runs three steps for double and two steps for float to achieve the
> needed
>> precision.
>> 
>> There is one caveat and open question.
>> Since -ffast-math enables flush-to-zero, intermediate values between
>> approximation steps will be flushed to zero if they are denormal.
>> E.g., this happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
>> The test cases pass, but it is unclear to me whether this is expected
>> behavior with -ffast-math.
>> 
>> The patch applies to commit:
>> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
>> 
>> Please consider including this patch.
>> Thank you and best regards,
>> Benedikt Huber
>> 
>> Benedikt Huber (1):
>>  2015-06-15  Benedikt Huber  
>> 
>> gcc/ChangeLog|   9 +++
>> gcc/config/aarch64/aarch64-builtins.c|  60 
>> gcc/config/aarch64/aarch64-protos.h  |   2 +
>> gcc/config/aarch64/aarch64-simd.md   |  27 
>> gcc/config/aarch64/aarch64.c |  63 +
>> gcc/config/aarch64/aarch64.md|   3 +
>> gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
>> +++
>> 7 files changed, 277 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
>> 
>> --
>> 1.9.1
> 
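As a rough scalar model of the scheme described in the patch (illustrative only: the magic-constant trick below merely stands in for the hardware frsqrte estimate, and the step function mirrors what a frsqrts-plus-multiply pair computes):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Crude initial estimate standing in for frsqrte (~3.4% relative error).
static float rsqrt_estimate(float x)
{
  std::uint32_t i;
  std::memcpy(&i, &x, sizeof i);   // type-pun without aliasing violations
  i = 0x5f3759dfu - (i >> 1);
  float y;
  std::memcpy(&y, &i, sizeof y);
  return y;
}

// One Newton-Raphson refinement: y' = y * (3 - x*y*y) / 2.
// On AArch64, the (3 - x*y*y)/2 factor is what frsqrts produces.
static float rsqrt_step(float x, float y)
{
  return y * (3.0f - x * y * y) * 0.5f;
}

// Two steps for float (three for double) reach the needed precision.
static float fast_rsqrt(float x)
{
  float y = rsqrt_estimate(x);
  y = rsqrt_step(x, y);
  y = rsqrt_step(x, y);
  return y;
}
```

The error roughly squares at each step, which is why two steps suffice for float while double needs three.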



Re: C PATCH to use VAR_P

2015-06-24 Thread Jeff Law

On 06/24/2015 06:25 AM, Marek Polacek wrote:

Similarly to what Gaby did in 2013 for C++
(), this patch
makes the c/ and c-family/ code use VAR_P rather than

   TREE_CODE (t) == VAR_DECL

(This is on top of the previous patch with is_global_var.)

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-24  Marek Polacek  

* array-notation-common.c: Use VAR_P throughout.
* c-ada-spec.c: Likewise.
* c-common.c: Likewise.
* c-format.c: Likewise.
* c-gimplify.c: Likewise.
* c-omp.c: Likewise.
* c-pragma.c: Likewise.
* c-pretty-print.c: Likewise.
* cilk.c: Likewise.

* c-array-notation.c: Use VAR_P throughout.
* c-decl.c: Likewise.
* c-objc-common.c: Likewise.
* c-parser.c: Likewise.
* c-typeck.c: Likewise.
I spot checked mostly for VAR_P vs !VAR_P correctness and everything 
looked correct.  OK for the trunk.  Consider any follow-ups to use VAR_P 
in a similar way pre-approved.


jeff



Re: C PATCH to use VAR_P

2015-06-24 Thread Jeff Law

On 06/24/2015 06:45 AM, Marek Polacek wrote:

On Wed, Jun 24, 2015 at 02:37:30PM +0200, Uros Bizjak wrote:

Hello!


Similarly to what Gaby did in 2013 for C++
(), this patch
makes the c/ and c-family/ code use VAR_P rather than

   TREE_CODE (t) == VAR_DECL

(This is on top of the previous patch with is_global_var.)


You could also use VAR_OR_FUNCTION_DECL, e.g. in the part below.


Sure, I thought I had dealt with VAR_OR_FUNCTION_DECL_P in
, but
I must have missed this.  Thanks,

Consider that follow-up approved as well.
jeff


Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-06-24 Thread Eric Botcazou
> Yes, the patch is OK, modulo...

But you also need the approval of an ARM maintainer.

-- 
Eric Botcazou


Re: fix PR46029: reimplement if conversion of loads and stores

2015-06-24 Thread Jeff Law

On 06/24/2015 10:50 AM, Ramana Radhakrishnan wrote:



On 12/06/15 21:50, Abe Skolnik wrote:

Hi everybody!

In the current implementation of if conversion, loads and stores are
if-converted in a thread-unsafe way:

   * loads were always executed, even when they should have not been.
 Some source code could be rendered invalid due to null pointers
 that were OK in the original program because they were never
 dereferenced.

   * writes were if-converted via load/maybe-modify/store, which
 renders some code multithreading-unsafe.

This patch reimplements if-conversion of loads and stores in a safe
way using a scratchpad allocated by the compiler on the stack:

   * loads are done through an indirection, reading either the correct
 data from the correct source [if the condition is true] or reading
 from the scratchpad and later ignoring this read result [if the
 condition is false].

   * writes are also done through an indirection, writing either to the
 correct destination [if the condition is true] or to the
 scratchpad [if the condition is false].

Vectorization of "if-cvt-stores-vect-ifcvt-18.c" disabled because the
old if-conversion resulted in unsafe code that could fail under
multithreading even though the as-written code _was_ thread-safe.

Passed regression testing and bootstrap on amd64-linux.
Is this OK to commit to trunk?
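The rewrite described above can be sketched in plain code (function and variable names are invented for the illustration; the real transformation operates on GIMPLE):

```cpp
#include <cassert>

// Sketch of the if-converted form of "if (c[i]) a[i] = x[i];".
// Every iteration performs exactly one unconditional store, but the
// store is redirected to a stack scratchpad when the predicate is
// false, so a[i] is never touched on the false path -- no new data
// race and no spurious write is introduced.
static void cond_store_ifcvt(int *a, const int *x, const bool *c, int n)
{
  alignas(16) char scratch[64];            // the compiler-allocated scratchpad
  int *pad = reinterpret_cast<int *>(scratch);
  for (int i = 0; i < n; ++i)
    {
      int *p = c[i] ? &a[i] : pad;         // select real target or scratchpad
      *p = x[i];                           // unconditional store
    }
}
```

Loads are handled dually: always load, but from the scratchpad when the predicate is false, and discard that value afterwards.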


I can't approve or reject but this certainly looks like an improvement
compared to where we are as we get rid of the data races.
Right.  I was going to assume the primary purpose is to address 
correctness issues, not to increase the amount of if-conversion for 
optimization purposes.


I have a couple of high level concerns around the scratchpad usage 
(aliasing, write-write hazards), but until I dig into the patch I don't 
know if they're real issues or not.



Jeff


Re: [02/13] Replace handle_cache_entry with new interface

2015-06-24 Thread Jeff Law

On 06/24/2015 02:16 AM, Richard Sandiford wrote:

So for all the keep_cache_entry functions, I guess they're trivial
enough that a function comment probably isn't needed.


Yeah.  For cases like this where the function is implementing a defined
interface (described in hash-table.h), I think it's better to only have
comments for implementations that are doing something non-obvious.

That works for me.




Presumably no good way to share the trivial implementation?


Probably not without sharing the other parts of the traits in some way.
That might be another possible cleanup :-)
I'll let you decide whether or not to pursue.  I'd like to hope that ICF 
would help us here.


jeff


Re: fix PR46029: reimplement if conversion of loads and stores

2015-06-24 Thread Jeff Law

On 06/22/2015 10:27 AM, Alan Lawrence wrote:



My main thought concerns the direction we are travelling here. A major
reason why we do if-conversion is to enable vectorization. Is this
targeted at gathering/scattering loads? Following vectorization,
different elements of the vector being loaded/stored may have to go
to/from the scratchpad or to/from main memory.

Or, are we aiming at the case where the predicate or address are
invariant? That seems unlikely - loop unswitching would be better for
the predicate; loading from an address, we'd just peel and hoist;
storing, this'd result in the address holding the last value written, at
exit from the loop, a curious idiom. Where the predicate/address is
invariant across the vector? (!)

Or, are we aiming at non-vectorized code?
I think we're aiming at correctness issues, particularly WRT not 
allowing the optimizers to introduce new data races for C11/C++11.





Re. creation of scratchpads:
(1) Should the '64' byte size be the result of scanning the
function, for the largest data size to which we store? (ideally,
conditionally store!)
I suspect most functions have conditional stores, but far fewer have 
conditional stores that we'd like to if-convert.  ISTM that if we can 
lazily allocate the scratchpad that'd be best.   If this were an RTL 
pass, then I'd say query the backend for the widest mode store insn and 
use that to size the scratchpad.  We may have something similar we can 
do in gimple without resorting to querying insn backend capabilities. 
Perhaps walking the in-scope addressable variables or somesuch.




(2) Allocating only once per function: if we had one scratchpad per
loop, it could/would live inside the test of "gimple_build_call_internal
(IFN_LOOP_VECTORIZED, ...". Otherwise, if we if-convert one or more
loops in the function, but then fail to vectorize them, we'll leave the
scratchpad around for later phases to clean up. Is that OK?
If the scratchpad is local to a function, then I'd expect we'd clean it 
up just like any other unused local.  Once it's a global, then all bets 
are off.


Anyway, I probably should just look at the patch before making more 
comments.


jeff



Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-06-24 Thread Benedikt Huber
Evandro,

Yes, we also have the 1/x approximation.
However we do not have the test cases yet, and it also would need some clean up.
I am going to provide a patch for that soon (say next week).
Also, for this optimization we have *not* yet found a benchmark with 
significant improvements.

Best Regards,
Benedikt


> On 24 Jun 2015, at 18:52, Dr. Philipp Tomsich 
>  wrote:
> 
> Evandro,
> 
> We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar) reciprocal 
> sqrt.
> 
> Also, the “reciprocal divide” patches are floating around in various of our 
> git-tree, but
> aren’t ready for public consumption, yet… I’ll leave Benedikt to comment on 
> potential
> timelines for getting that pushed out.
> 
> Best,
> Philipp.
> 
>> On 24 Jun 2015, at 18:42, Evandro Menezes  wrote:
>> 
>> Benedikt,
>> 
>> You beat me to it! :-)  Do you have the implementation for dividing using
>> the Newton series as well?
>> 
>> I'm not sure that the series is always for all data types and on all
>> processors.  It would be useful to allow each AArch64 processor to enable
>> this or not depending on the data type.  BTW, do you have some tests showing
>> the speed up?
>> 
>> Thank you,
>> 
>> --
>> Evandro Menezes  Austin, TX
>> 
>>> -Original Message-
>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
>> On
>>> Behalf Of Benedikt Huber
>>> Sent: Thursday, June 18, 2015 7:04
>>> To: gcc-patches@gcc.gnu.org
>>> Cc: benedikt.hu...@theobroma-systems.com; philipp.tomsich@theobroma-
>>> systems.com
>>> Subject: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
>>> estimation in -ffast-math
>>> 
>>> AArch64 offers the instructions frsqrte and frsqrts, for rsqrt estimation
>> and
>>> a Newton-Raphson step, respectively.
>>> There are ARMv8 implementations where this is faster than using fdiv and
>>> rsqrt.
>>> It runs three steps for double and two steps for float to achieve the
>> needed
>>> precision.
>>> 
>>> There is one caveat and open question.
>>> Since -ffast-math enables flush-to-zero, intermediate values between
>>> approximation steps will be flushed to zero if they are denormal.
>>> E.g., this happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
>>> The test cases pass, but it is unclear to me whether this is expected
>>> behavior with -ffast-math.
>>> 
>>> The patch applies to commit:
>>> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
>>> 
>>> Please consider including this patch.
>>> Thank you and best regards,
>>> Benedikt Huber
>>> 
>>> Benedikt Huber (1):
>>> 2015-06-15  Benedikt Huber  
>>> 
>>> gcc/ChangeLog|   9 +++
>>> gcc/config/aarch64/aarch64-builtins.c|  60 
>>> gcc/config/aarch64/aarch64-protos.h  |   2 +
>>> gcc/config/aarch64/aarch64-simd.md   |  27 
>>> gcc/config/aarch64/aarch64.c |  63 +
>>> gcc/config/aarch64/aarch64.md|   3 +
>>> gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
>>> +++
>>> 7 files changed, 277 insertions(+)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
>>> 
>>> --
>>> 1.9.1
>> 
> 





Re: C PATCH to use is_global_var

2015-06-24 Thread Joseph Myers
On Wed, 24 Jun 2015, Marek Polacek wrote:

> diff --git gcc/c/c-decl.c gcc/c/c-decl.c
> index fc1fdf9..ab54db9 100644
> --- gcc/c/c-decl.c
> +++ gcc/c/c-decl.c
> @@ -2650,9 +2650,8 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, 
> tree oldtype)
> tree_code_size (TREE_CODE (olddecl)) - sizeof (struct 
> tree_decl_common));
> olddecl->decl_with_vis.symtab_node = snode;
>  
> -   if ((DECL_EXTERNAL (olddecl)
> -|| TREE_PUBLIC (olddecl)
> -|| TREE_STATIC (olddecl))
> +   if ((is_global_var (olddecl)
> +|| TREE_PUBLIC (olddecl))
> && DECL_SECTION_NAME (newdecl) != NULL)
>   set_decl_section_name (olddecl, DECL_SECTION_NAME (newdecl));
>  

At least this case covers both FUNCTION_DECL and VAR_DECL.  If 
is_global_var is appropriate for functions as well as variables, I think 
it should be renamed (and have its comment updated to explain what it 
means for functions).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][GSoC] Extend shared_ptr to support arrays

2015-06-24 Thread Fan You
Hi,

Here is the revised patch, including all the test cases.

This can also be seen at  on branch


Any comments?

2015-06-23 12:19 GMT+08:00 Tim Shen :
> On Sun, Jun 21, 2015 at 3:50 AM, Tim Shen  wrote:
>> Quickly looked at __shared_ptr<__libfund_v1<_Tp>, _Lp>; will look at
>> the remaining parts later.
>
> All suggestions apply to all occurrences, not just to the quoted code.
>
> +  // helpers for std::experimental::enable_shared_from_this
> +
> +  template
> +struct __helper_for_experimental_enable_shared
> +{
> +  void _Call_M_assign(__weak_ptr<__libfund_v1<_Tp>, _Lp>& __wp,
> + _Tp* __ptr,
> + const __shared_count<_Lp>& __refcount)
> +   { __wp._M_assign(__ptr, __refcount); }
> +};
> Make the function static; suggested class name: __weak_ptr_friend,
> function name _S_assign.
>
> +  // Used by __enable_shared_from_this.
> +  void
> +  _M_assign(_Tp* __ptr, const __shared_count<_Lp>& __refcount) noexcept
> +  {
> +   _M_ptr = __ptr;
> +   _M_refcount = __refcount;
> +  }
> element_type* __ptr?
>
> Also need a _Compatible; possible implementation:
>
> template
>   struct __sp_compatible_helper
>   {  static constexpr bool value = std::is_convertible<_From_type*,
> _To_type*>::value;  };
>
> template
>   struct __sp_compatible_helper<_Tp[_Nm], _Tp[]>
>   { static constexpr bool value = true; };
>
> ...
>
> template
>   using _Compatible = typename std::enable_if<__sp_compatible<_Tp1,
> _Tp>::value>::type;
>
> +   template
> + inline bool
> + operator<(const shared_ptr<_Tp1>& __a,
> +  const shared_ptr<_Tp2>& __b) noexcept
> + {
> +   using _Tp1_RE = typename remove_extent<_Tp1>::type;
> +   using _Tp2_RE = typename remove_extent<_Tp2>::type;
> +   using _CT = typename std::common_type<_Tp1_RE*, _Tp2_RE*>::type;
> +   return std::less<_CT>()(__a.get(), __b.get());
> + }
> using _Tp1_RE = typename shared_ptr<_Tp1>::element_type;
>
> +   // 8.2.1.3, shared_ptr casts
> +   template
> + inline shared_ptr<_Tp>
> + static_pointer_cast(const shared_ptr<_Tp1>& __r) noexcept
> + { shared_ptr<_Tp>(__r, static_cast<typename shared_ptr<_Tp>::element_type*>(__r.get())); }
> +
> Missing "return". You can turn on -Wsystem-headers to check for warnings.
>
>
> --
> Regards,
> Tim Shen




RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-06-24 Thread Evandro Menezes
Benedikt,

Are you developing the reciprocal approximation just for 1/x proper or for any 
division, as in x/y = x * 1/y?

Thank you,

-- 
Evandro Menezes  Austin, TX


> -Original Message-
> From: Benedikt Huber [mailto:benedikt.hu...@theobroma-systems.com]
> Sent: Wednesday, June 24, 2015 12:11
> To: Dr. Philipp Tomsich
> Cc: Evandro Menezes; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
> estimation in -ffast-math
> 
> Evandro,
> 
> Yes, we also have the 1/x approximation.
> However we do not have the test cases yet, and it also would need some clean
> up.
> I am going to provide a patch for that soon (say next week).
> Also, for this optimization we have *not* yet found a benchmark with
> significant improvements.
> 
> Best Regards,
> Benedikt
> 
> 
> > On 24 Jun 2015, at 18:52, Dr. Philipp Tomsich  systems.com> wrote:
> >
> > Evandro,
> >
> > We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar) reciprocal
> sqrt.
> >
> > Also, the “reciprocal divide” patches are floating around in various
> > of our git-tree, but aren’t ready for public consumption, yet… I’ll
> > leave Benedikt to comment on potential timelines for getting that pushed
> out.
> >
> > Best,
> > Philipp.
> >
> >> On 24 Jun 2015, at 18:42, Evandro Menezes  wrote:
> >>
> >> Benedikt,
> >>
> >> You beat me to it! :-)  Do you have the implementation for dividing
> >> using the Newton series as well?
> >>
> >> I'm not sure that the series is always for all data types and on all
> >> processors.  It would be useful to allow each AArch64 processor to
> >> enable this or not depending on the data type.  BTW, do you have some
> >> tests showing the speed up?
> >>
> >> Thank you,
> >>
> >> --
> >> Evandro Menezes  Austin, TX
> >>
> >>> -Original Message-
> >>> From: gcc-patches-ow...@gcc.gnu.org
> >>> [mailto:gcc-patches-ow...@gcc.gnu.org]
> >> On
> >>> Behalf Of Benedikt Huber
> >>> Sent: Thursday, June 18, 2015 7:04
> >>> To: gcc-patches@gcc.gnu.org
> >>> Cc: benedikt.hu...@theobroma-systems.com; philipp.tomsich@theobroma-
> >>> systems.com
> >>> Subject: [PATCH] [aarch64] Implemented reciprocal square root
> >>> (rsqrt) estimation in -ffast-math
> >>>
> >>> AArch64 offers the instructions frsqrte and frsqrts, for rsqrt
> >>> estimation
> >> and
> >>> a Newton-Raphson step, respectively.
> >>> There are ARMv8 implementations where this is faster than using fdiv
> >>> and rsqrt.
> >>> It runs three steps for double and two steps for float to achieve
> >>> the
> >> needed
> >>> precision.
> >>>
> >>> There is one caveat and open question.
> >>> Since -ffast-math enables flush-to-zero, intermediate values between
> >>> approximation steps will be flushed to zero if they are denormal.
> >>> E.g., this happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
> >>> The test cases pass, but it is unclear to me whether this is
> >>> expected behavior with -ffast-math.
> >>>
> >>> The patch applies to commit:
> >>> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
> >>>
> >>> Please consider including this patch.
> >>> Thank you and best regards,
> >>> Benedikt Huber
> >>>
> >>> Benedikt Huber (1):
> >>> 2015-06-15  Benedikt Huber  
> >>>
> >>> gcc/ChangeLog|   9 +++
> >>> gcc/config/aarch64/aarch64-builtins.c|  60 
> >>> gcc/config/aarch64/aarch64-protos.h  |   2 +
> >>> gcc/config/aarch64/aarch64-simd.md   |  27 
> >>> gcc/config/aarch64/aarch64.c |  63 +
> >>> gcc/config/aarch64/aarch64.md|   3 +
> >>> gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
> >>> +++
> >>> 7 files changed, 277 insertions(+)
> >>> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
> >>>
> >>> --
> >>> 1.9.1
> >> 
> >




[PR fortran/66528] unbalanced IF/ENDIF with -fmax-errors=1 causes invalid free

2015-06-24 Thread Manuel López-Ibáñez
The problem is that diagnostic_action_after_output tries to delete the
active pretty-printer which tries to delete its output_buffer, which
is normally dynamically allocated via placement-new, but the
output_buffer used by the error_buffer of Fortran is statically
allocated. Being statically allocated considerably simplifies pushing and
popping several instances of error_buffer.

The solution I found is to reset the active output_buffer back to the
default one before calling diagnostic_action_after_output. This is a
bit ugly, because this function does use the output_buffer; however,
at the point where Fortran calls it, both are in an equivalent state,
so there is no visible difference.
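A minimal model of the ordering issue (all names below are simplified stand-ins for the diagnostics machinery, not the real GCC types):

```cpp
#include <cassert>

// Simplified stand-ins for the diagnostics machinery.
struct output_buffer { int dummy; };
struct pretty_printer { output_buffer *buffer; };

static output_buffer error_buffer;        // static, like Fortran's error_buffer
static bool freed_static_buffer = false;

// Models diagnostic_action_after_output at -fmax-errors=1: it disposes
// of the printer's *active* buffer.  That is only valid when the active
// buffer is the heap-allocated default one.
static void action_after_output(pretty_printer *pp, output_buffer *heap_default)
{
  if (pp->buffer == heap_default)
    delete pp->buffer;                    // fine: dynamically allocated
  else
    freed_static_buffer = true;           // the invalid free of PR 66528
}
```

The fix is simply to restore the heap-allocated default buffer as the active one before the cleanup routine runs.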


Bootstrapped and regression tested on x86_64-linux-gnu.

2015-06-24  Manuel López-Ibáñez  

PR fortran/66528
* gfortran.dg/maxerrors.f90: New test.

gcc/fortran/ChangeLog:

2015-06-24  Manuel López-Ibáñez  

PR fortran/66528
* error.c (gfc_warning_check): Restore the default output_buffer
before calling diagnostic_action_after_output.
(gfc_error_check): Likewise.
(gfc_diagnostics_init): Add comment.
Index: gcc/testsuite/gfortran.dg/maxerrors.f90
===
--- gcc/testsuite/gfortran.dg/maxerrors.f90 (revision 0)
+++ gcc/testsuite/gfortran.dg/maxerrors.f90 (revision 0)
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! { dg-options "-fmax-errors=1" }
+! PR66528
+! { dg-prune-output "compilation terminated" }
+program main
+  read (*,*) n
+  if (n<0) then
+print *,foo
+  end ! { dg-error "END IF statement expected" }
+print *,bar
+end program main
+
Index: gcc/fortran/error.c
===
--- gcc/fortran/error.c (revision 224844)
+++ gcc/fortran/error.c (working copy)
@@ -1247,24 +1247,23 @@ gfc_clear_warning (void)
If so, print the warning.  */
 
 void
 gfc_warning_check (void)
 {
-  /* This is for the new diagnostics machinery.  */
   if (! gfc_output_buffer_empty_p (pp_warning_buffer))
 {
   pretty_printer *pp = global_dc->printer;
   output_buffer *tmp_buffer = pp->buffer;
   pp->buffer = pp_warning_buffer;
   pp_really_flush (pp);
   warningcount += warningcount_buffered;
   werrorcount += werrorcount_buffered;
   gcc_assert (warningcount_buffered + werrorcount_buffered == 1);
+  pp->buffer = tmp_buffer;
   diagnostic_action_after_output (global_dc, 
  warningcount_buffered 
  ? DK_WARNING : DK_ERROR);
-  pp->buffer = tmp_buffer;
 }
 }
 
 
 /* Issue an error.  */
@@ -1379,12 +1378,12 @@ gfc_error_check (void)
   output_buffer *tmp_buffer = pp->buffer;
   pp->buffer = pp_error_buffer;
   pp_really_flush (pp);
   ++errorcount;
   gcc_assert (gfc_output_buffer_empty_p (pp_error_buffer));
-  diagnostic_action_after_output (global_dc, DK_ERROR);
   pp->buffer = tmp_buffer;
+  diagnostic_action_after_output (global_dc, DK_ERROR);
   return true;
 }
 
   return false;
 }
@@ -1470,10 +1469,12 @@ gfc_diagnostics_init (void)
   diagnostic_format_decoder (global_dc) = gfc_format_decoder;
   global_dc->caret_chars[0] = '1';
   global_dc->caret_chars[1] = '2';
   pp_warning_buffer = new (XNEW (output_buffer)) output_buffer ();
   pp_warning_buffer->flush_p = false;
+  /* pp_error_buffer is statically allocated.  This simplifies memory
+ management when using gfc_push/pop_error. */
   pp_error_buffer = &(error_buffer.buffer);
   pp_error_buffer->flush_p = false;
 }
 
 void


Re: Debug mode enhancements

2015-06-24 Thread François Dumont
Hello

Is this one OK?

François


On 12/06/2015 19:11, François Dumont wrote:
> Hi
>
> This is a patch to:
>
> - Enhance __get_distance to give better feedback about the distance between
> iterators, so that we can make sharper decisions about what is valid and
> what is not. This function is now aware of safe iterators and leverages
> them, much as std::distance does with the C++11 list::iterator.
> - Make debug mode aware of the iterator adapters reverse_iterator and
> move_iterator.
> - Thanks to the previous points, this patch also extends the situations in
> which it is possible to remove the debug layer from iterators, lowering the
> performance impact of this mode. We now detect at runtime whether we know
> enough about the iterator range to get rid of the potential debug layer.
>
> For the last point I introduced __gnu_debug::__unsafe, which removes the
> debug layer unconditionally, as opposed to __gnu_debug::__base, which does
> so only for random access iterators. The latter has been kept for use in
> constructors.
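A simplified sketch of the idea behind the enhanced __get_distance (the names are modeled on the patch, but this implementation is invented for illustration): report not just a distance but how precisely it is known, so callers can decide whether dropping the debug layer is safe.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <type_traits>
#include <utility>

// How much the returned distance can be trusted.
enum distance_precision { dp_sign, dp_exact };

// For random access iterators the exact distance is cheap; for weaker
// iterators we can only tell cheaply whether the range is empty.
template <typename It>
std::pair<std::ptrdiff_t, distance_precision>
get_distance(It first, It last)
{
  typedef typename std::iterator_traits<It>::iterator_category cat;
  if (std::is_same<cat, std::random_access_iterator_tag>::value)
    return std::make_pair(std::distance(first, last), dp_exact);
  return first == last ? std::make_pair(std::ptrdiff_t(0), dp_exact)
                       : std::make_pair(std::ptrdiff_t(1), dp_sign);
}
```

A caller that receives dp_exact knows the range length and can validate it or strip the debug layer; with dp_sign it only knows the range is non-empty.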
>
> I had to introduce new debug headers to limit the impact on
> stl_iterator.h. We must not include debug.h there, as the purpose is not
> to inject debug checks into the normal code.
>
> Note that the new __get_distance will be very useful for implementing
> proper debug algorithms.
>
> Here is the tricky part for me, the ChangeLog entry, much more
> complicated than the code :-)
>
> * include/bits/stl_iterator_base_types.h (_Iter_base): Limit definition
> to pre-C++11 mode.
> * include/debug/functions.h
> (__gnu_debug::__valid_range, __gnu_debug::__base): Move...
> * include/debug/safe_iterator.h
> (__gnu_debug::_Sequence_traits): New.
> (__gnu_debug::__get_distance_from_begin): New.
> (__gnu_debug::__get_distance_to_end): New.
> (__gnu_debug::_Safe_iterator<>::_M_valid_range): Expose iterator range
> distance information. Add optional check_dereferenceable parameter,
> default true.
> (__gnu_debug::_Distance_precision, __gnu_debug::__get_distance): Move
> default definition...
> (__gnu_debug::__get_distance): New overload for _Safe_iterator.
> (__gnu_debug::__unsafe): Likewise.
> * include/debug/helper_functions.h: ...here. New.
> (__gnu_debug::__unsafe): New helper function to remove safe iterator
> layer.
> * include/debug/stl_iterator.h: New. Include latter.
> * include/bits/stl_iterator.h: Include latter in debug mode.
> * include/debug/stl_iterator.tcc: Adapt.
> * include/debug/safe_local_iterator.h (__gnu_debug::__get_distance): Add
> overload for _Safe_local_iterator.
> (__gnu_debug::__unsafe): Likewise.
> * include/debug/safe_local_iterator.tcc: Adapt.
> * include/debug/macros.h (__glibcxx_check_valid_range2): New.
> (__glibcxx_check_insert_range): Add _Dist parameter.
> (__glibcxx_check_insert_range_after): Likewise.
> * include/debug/deque (deque<>::assign): Remove iterator debug layer
> when possible.
> (deque<>::insert): Likewise.
> * include/debug/forward_list (__glibcxx_check_valid_fl_range): New.
> (forward_list<>::splice_after): Use latter.
> (forward_list<>::assign): Remove iterator debug layer when possible.
> (forward_list<>::insert_after): Likewise.
> (__gnu_debug::_Sequence_traits<>): Partial specialization.
> * include/debug/list (list<>::assign): Remove iterator debug layer when
> possible.
> (list<>::insert): Likewise.
> [__gnu_debug::_Sequence_traits<>]: Partial specialization pre C++11 ABI.
> * include/debug/map.h (map<>::insert): Remove iterator debug layer when
> possible.
> * include/debug/multimap.h (multimap<>::insert): Likewise.
> * include/debug/set.h (set<>::insert): Likewise.
> * include/debug/multiset.h (multiset<>::insert): Likewise.
> * include/debug/string (basic_string<>::append, basic_string<>::assign,
> basic_string<>::insert, basic_string<>::replace): Likewise.
> * include/debug/unordered_map
> (unordered_map<>::insert, unordered_multimap<>::insert): Likewise.
> * include/debug/unordered_set
> (unordered_set<>::insert, unordered_multiset<>insert): Likewise.
> * include/debug/vector
> (vector<>::assign, vector<>::insert): Likewise.
> * include/Makefile.am: Add new debug headers.
> * include/Makefile.in: Regenerate.
>
> Being fully tested under Linux x86_64.
>
> François
>



C++ PATCH for c++/66647 (ICE with alias templates)

2015-06-24 Thread Jason Merrill
Another issue with dependent alias template specializations.  In this 
case, we were dealing with an alias template that expands to a function 
type; dependent_type_p_r was looking at the function type and never 
considering whether the alias specialization itself was dependent. 
Fixed by checking that sooner.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit c9afe9335ac08dcbbfbf4aeee747d76ad36e74d1
Author: Jason Merrill 
Date:   Wed Jun 24 15:28:06 2015 -0400

	PR c++/66647
	* pt.c (dependent_type_p_r): Check for dependent alias template
	specialization sooner.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 8800af8..b63c0d4 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -20992,6 +20992,12 @@ dependent_type_p_r (tree type)
 	names a dependent type.  */
   if (TREE_CODE (type) == TYPENAME_TYPE)
 return true;
+
+  /* An alias template specialization can be dependent even if the
+ resulting type is not.  */
+  if (dependent_alias_template_spec_p (type))
+return true;
+
   /* -- a cv-qualified type where the cv-unqualified type is
 	dependent.
  No code is necessary for this bullet; the code below handles
@@ -21043,10 +21049,6 @@ dependent_type_p_r (tree type)
 	   && (any_dependent_template_arguments_p
 	   (INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (type)
 return true;
-  /* For an alias template specialization, check the arguments both to the
- class template and the alias template.  */
-  else if (dependent_alias_template_spec_p (type))
-return true;
 
   /* All TYPEOF_TYPEs, DECLTYPE_TYPEs, and UNDERLYING_TYPEs are
  dependent; if the argument of the `typeof' expression is not
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-49.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-49.C
new file mode 100644
index 000..5fd3b65
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-49.C
@@ -0,0 +1,54 @@
+// PR c++/66647
+// { dg-do compile { target c++11 } }
+
+template  struct A
+{
+  static constexpr _Tp value = 1;
+};
+template  class B
+{
+public:
+  template  struct rebind
+  {
+  };
+};
+
+template  class C
+{
+  template 
+  static A _S_chk (typename _Alloc2::template rebind<_Tp2> *);
+
+public:
+  using __type = decltype (_S_chk<_Alloc, _Tp> (0));
+};
+
+template ::__type::value>
+struct D;
+template  struct D<_Alloc, _Tp, 1>
+{
+  typedef typename _Alloc::template rebind<_Tp> __type;
+};
+template  struct F
+{
+  template  using rebind_alloc = typename D<_Alloc, _Tp>::__type;
+};
+template  struct __alloc_traits
+{
+  template  struct rebind
+  {
+typedef typename F<_Alloc>::template rebind_alloc other;
+  };
+};
+template  struct G
+{
+  typename __alloc_traits<_Alloc>::template rebind::other _Tp_alloc_type;
+};
+template  > class vector : G<_Alloc>
+{
+};
+
+template  using tfuncptr = void();
+template  struct H
+{
+  vector > funcs;
+};


Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-06-24 Thread Dr. Philipp Tomsich
Evandro,

Shouldn't ‘execute_cse_reciprocals_1’ take care of this, once the 
reciprocal-division is implemented?
Do you think there’s additional work needed to catch all cases/opportunities?

Best,
Philipp.

> On 24 Jun 2015, at 20:19, Evandro Menezes  wrote:
> 
> Benedikt,
> 
> Are you developing the reciprocal approximation just for 1/x proper or for 
> any division, as in x/y = x * 1/y?
> 
> Thank you,
> 
> -- 
> Evandro Menezes  Austin, TX
> 
> 
>> -Original Message-
>> From: Benedikt Huber [mailto:benedikt.hu...@theobroma-systems.com]
>> Sent: Wednesday, June 24, 2015 12:11
>> To: Dr. Philipp Tomsich
>> Cc: Evandro Menezes; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
>> estimation in -ffast-math
>> 
>> Evandro,
>> 
>> Yes, we also have the 1/x approximation.
>> However, we do not have the test cases yet, and it would also need some
>> cleanup.
>> I am going to provide a patch for that soon (say next week).
>> Also, for this optimization we have *not* yet found a benchmark with
>> significant improvements.
>> 
>> Best Regards,
>> Benedikt
>> 
>> 
>>> On 24 Jun 2015, at 18:52, Dr. Philipp Tomsich <philipp.tomsich@theobroma-systems.com> wrote:
>>> 
>>> Evandro,
>>> 
>>> We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar) reciprocal
>> sqrt.
>>> 
>>> Also, the “reciprocal divide” patches are floating around in various
>>> of our git-tree, but aren’t ready for public consumption, yet… I’ll
>>> leave Benedikt to comment on potential timelines for getting that pushed
>> out.
>>> 
>>> Best,
>>> Philipp.
>>> 
 On 24 Jun 2015, at 18:42, Evandro Menezes  wrote:
 
 Benedikt,
 
 You beat me to it! :-)  Do you have the implementation for dividing
 using the Newton series as well?
 
 I'm not sure that the series is always profitable for all data types and on all
 processors.  It would be useful to allow each AArch64 processor to
 enable this or not depending on the data type.  BTW, do you have some
 tests showing the speed up?
 
 Thank you,
 
 --
 Evandro Menezes  Austin, TX
 
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org
> [mailto:gcc-patches-ow...@gcc.gnu.org]
 On
> Behalf Of Benedikt Huber
> Sent: Thursday, June 18, 2015 7:04
> To: gcc-patches@gcc.gnu.org
> Cc: benedikt.hu...@theobroma-systems.com; philipp.tomsich@theobroma-
> systems.com
> Subject: [PATCH] [aarch64] Implemented reciprocal square root
> (rsqrt) estimation in -ffast-math
> 
> aarch64 offers the instructions frsqrte and frsqrts, for rsqrt
> estimation
 and
> a Newton-Raphson step, respectively.
> There are ARMv8 implementations where this is faster than using fdiv
> and rsqrt.
> It runs three steps for double and two steps for float to achieve
> the
 needed
> precision.
> 
> There is one caveat and open question.
> Since -ffast-math enables flush-to-zero, intermediate values between
> approximation steps will be flushed to zero if they are denormal.
> E.g. This happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
> The test cases pass, but it is unclear to me whether this is
> expected behavior with -ffast-math.
> 
> The patch applies to commit:
> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
> 
> Please consider including this patch.
> Thank you and best regards,
> Benedikt Huber
> 
> Benedikt Huber (1):
> 2015-06-15  Benedikt Huber  
> 
> gcc/ChangeLog|   9 +++
> gcc/config/aarch64/aarch64-builtins.c|  60 
> gcc/config/aarch64/aarch64-protos.h  |   2 +
> gcc/config/aarch64/aarch64-simd.md   |  27 
> gcc/config/aarch64/aarch64.c |  63 +
> gcc/config/aarch64/aarch64.md|   3 +
> gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
> +++
> 7 files changed, 277 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
> 
> --
> 1.9.1
 
>>> 
> 
> 



Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data

2015-06-24 Thread Ilya Verbin
On Wed, Jun 24, 2015 at 13:39:03 +0200, Jakub Jelinek wrote:
> Thinking about this more, for always modifier this isn't really sufficient.
> Consider:
> void
> foo (int *p)
> {
>   #pragma omp target data map(alloc:p[0:32])
>   {
> #pragma omp target data map(always, from:p[7:9])
> {
>   ...
> }
>   }
> }
> If all we record is the corresponding splay_tree and the flags
> (from/always_from), then this would try to copy from the device
> the whole array section, rather than just the small portion of it.
> So, supposedly in addition to the splay_tree for always from case we also
> need to remember e.g. [relative offset, length] within the splay tree
> object.

Indeed, here is the fix, make check-target-libgomp passed.


libgomp/
* libgomp.h (struct target_var_desc): Add offset and length.
* target.c (gomp_map_vars_existing): New argument tgt_var, fill it.
(gomp_map_vars): Move filling of tgt->list[i] into
gomp_map_vars_existing.  Add missed case GOMP_MAP_ALWAYS_FROM.
(gomp_unmap_vars): Add list[i].offset to host and target addresses,
use list[i].length instead of k->host_end - k->host_start.
* testsuite/libgomp.c/target-11.c: Extend for testing array sections.


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index bd17828..c48e708 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -644,6 +644,12 @@ struct target_var_desc {
   bool copy_from;
   /* True if data always should be copied from device to host at the end.  */
   bool always_copy_from;
+  /* Used for unmapping of array sections, can be nonzero only when
+ always_copy_from is true.  */
+  uintptr_t offset;
+  /* Used for unmapping of array sections, can be less than the size of the
+ whole object only when always_copy_from is true.  */
+  uintptr_t length;
 };
 
 struct target_mem_desc {
diff --git a/libgomp/target.c b/libgomp/target.c
index b1640c1..a394e95 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -149,8 +149,15 @@ resolve_device (int device_id)
 
 static inline void
 gomp_map_vars_existing (struct gomp_device_descr *devicep, splay_tree_key oldn,
-   splay_tree_key newn, unsigned char kind)
+   splay_tree_key newn, struct target_var_desc *tgt_var,
+   unsigned char kind)
 {
+  tgt_var->key = oldn;
+  tgt_var->copy_from = GOMP_MAP_COPY_FROM_P (kind);
+  tgt_var->always_copy_from = GOMP_MAP_ALWAYS_FROM_P (kind);
+  tgt_var->offset = newn->host_start - oldn->host_start;
+  tgt_var->length = newn->host_end - newn->host_start;
+
   if ((kind & GOMP_MAP_FLAG_FORCE)
   || oldn->host_start > newn->host_start
   || oldn->host_end < newn->host_end)
@@ -276,13 +283,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t 
mapnum,
cur_node.host_end = cur_node.host_start + sizeof (void *);
   splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
   if (n)
-   {
- tgt->list[i].key = n;
- tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
- tgt->list[i].always_copy_from
-   = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
- gomp_map_vars_existing (devicep, n, &cur_node, kind & typemask);
-   }
+   gomp_map_vars_existing (devicep, n, &cur_node, &tgt->list[i],
+   kind & typemask);
   else
{
  tgt->list[i].key = NULL;
@@ -367,13 +369,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t 
mapnum,
  k->host_end = k->host_start + sizeof (void *);
splay_tree_key n = splay_tree_lookup (mem_map, k);
if (n)
- {
-   tgt->list[i].key = n;
-   tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
-   tgt->list[i].always_copy_from
- = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
-   gomp_map_vars_existing (devicep, n, k, kind & typemask);
- }
+ gomp_map_vars_existing (devicep, n, k, &tgt->list[i],
+ kind & typemask);
else
  {
size_t align = (size_t) 1 << (kind >> rshift);
@@ -385,6 +382,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t 
mapnum,
tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
tgt->list[i].always_copy_from
  = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
+   tgt->list[i].offset = 0;
+   tgt->list[i].length = k->host_end - k->host_start;
k->refcount = 1;
k->async_refcount = 0;
tgt->refcount++;
@@ -397,6 +396,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t 
mapnum,
  case GOMP_MAP_FROM:
  case GOMP_MAP_FORCE_ALLOC:
  case GOMP_MAP_FORCE_FROM:
+ case GOMP_MAP_ALWAYS_FROM:
break;
  case GOMP

Re: Do not take address of empty string front

2015-06-24 Thread François Dumont
On 22/06/2015 17:10, Jonathan Wakely wrote:
> On 20/06/15 12:59 +0100, Jonathan Wakely wrote:
>> On 20/06/15 12:03 +0200, François Dumont wrote:
>>> Hi
>>>
>>>   2 experimental tests are failing in debug mode because
>>> __do_str_codecvt is sometimes taking address of string front() and
>>> back() even if empty. It wasn't use so not a big issue but it still
>>> seems better to avoid. I propose to rather use string begin() to get
>>> buffer address.
>>
>> But dereferencing begin() is still undefined for an empty string.
>> Shouldn't that fail for debug mode too?
It would if we were using the basic_string debug implementation, but we
aren't. We are just using the normal implementation with some debug checks,
which is not enough to detect invalid dereference operations.
>> Why change one form of
>> undefined behaviour that we diagnose to another form that we don't
>> diagnose?
>>
>> It would be better if that function didn't do any work when the input
>> range is empty:
>>
>> --- a/libstdc++-v3/include/bits/locale_conv.h
>> +++ b/libstdc++-v3/include/bits/locale_conv.h
>> @@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>_OutStr& __outstr, const _Codecvt& __cvt, _State&
>> __state,
>>size_t& __count, _Fn __fn)
>>{
>> +  if (__first == __last)
>> +   {
>> + __outstr.clear();
>> + return true;
>> +   }
>> +
>>  size_t __outchars = 0;
>>  auto __next = __first;
>>  const auto __maxlen = __cvt.max_length() + 1;
>
> This makes that change, and also moves wstring_convert into the
> ABI-tagged __cxx11 namespace, and fixes a copy&paste error in the
> exception thrown from wbuffer_convert.
>
> Tested powerpc64le-linux, committed to trunk.
>
> François, your changes to add extra checks in std::string are still
> useful separately.
>
I just applied attached patch then.

François
Index: include/bits/basic_string.h
===
--- include/bits/basic_string.h	(revision 224914)
+++ include/bits/basic_string.h	(working copy)
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+
 #if __cplusplus >= 201103L
 #include 
 #endif
@@ -903,7 +904,10 @@
*/
   reference
   front() noexcept
-  { return operator[](0); }
+  {
+	_GLIBCXX_DEBUG_ASSERT(!empty());
+	return operator[](0);
+  }
 
   /**
*  Returns a read-only (constant) reference to the data at the first
@@ -911,7 +915,10 @@
*/
   const_reference
   front() const noexcept
-  { return operator[](0); }
+  {
+	_GLIBCXX_DEBUG_ASSERT(!empty());
+	return operator[](0);
+  }
 
   /**
*  Returns a read/write reference to the data at the last
@@ -919,7 +926,10 @@
*/
   reference
   back() noexcept
-  { return operator[](this->size() - 1); }
+  {
+	_GLIBCXX_DEBUG_ASSERT(!empty());
+	return operator[](this->size() - 1);
+  }
 
   /**
*  Returns a read-only (constant) reference to the data at the
@@ -927,7 +937,10 @@
*/
   const_reference
   back() const noexcept
-  { return operator[](this->size() - 1); }
+  {
+	_GLIBCXX_DEBUG_ASSERT(!empty());
+	return operator[](this->size() - 1);
+  }
 #endif
 
   // Modifiers:
@@ -1506,7 +1519,10 @@
*/
   void
   pop_back() noexcept
-  { _M_erase(size()-1, 1); }
+  {
+	_GLIBCXX_DEBUG_ASSERT(!empty());
+	_M_erase(size() - 1, 1);
+  }
 #endif // C++11
 
   /**
@@ -3308,7 +3324,10 @@
*/
   reference
   front()
-  { return operator[](0); }
+  {
+	_GLIBCXX_DEBUG_ASSERT(!empty());
+	return operator[](0);
+  }
 
   /**
*  Returns a read-only (constant) reference to the data at the first
@@ -3316,7 +3335,10 @@
*/
   const_reference
   front() const _GLIBCXX_NOEXCEPT
-  { return operator[](0); }
+  {
+	_GLIBCXX_DEBUG_ASSERT(!empty());
+	return operator[](0);
+  }
 
   /**
*  Returns a read/write reference to the data at the last
@@ -3324,7 +3346,10 @@
*/
   reference
   back()
-  { return operator[](this->size() - 1); }
+  {
+	_GLIBCXX_DEBUG_ASSERT(!empty());
+	return operator[](this->size() - 1);
+  }
 
   /**
*  Returns a read-only (constant) reference to the data at the
@@ -3332,7 +3357,10 @@
*/
   const_reference
   back() const _GLIBCXX_NOEXCEPT
-  { return operator[](this->size() - 1); }
+  {
+	_GLIBCXX_DEBUG_ASSERT(!empty());
+	return operator[](this->size() - 1);
+  }
 #endif
 
   // Modifiers:
@@ -3819,7 +3847,10 @@
*/
   void
   pop_back() // FIXME C++11: should be noexcept.
-  { erase(size()-1, 1); }
+  {
+	_GLIBCXX_DEBUG_ASSERT(!empty());
+	erase(size() - 1, 1);
+  }
 #endif // C++11
 
   /**


Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data

2015-06-24 Thread Jakub Jelinek
On Wed, Jun 24, 2015 at 11:11:12PM +0300, Ilya Verbin wrote:
> Indeed, here is the fix, make check-target-libgomp passed.
> 
> 
> libgomp/
>   * libgomp.h (struct target_var_desc): Add offset and length.
>   * target.c (gomp_map_vars_existing): New argument tgt_var, fill it.
>   (gomp_map_vars): Move filling of tgt->list[i] into
>   gomp_map_vars_existing.  Add missed case GOMP_MAP_ALWAYS_FROM.
>   (gomp_unmap_vars): Add list[i].offset to host and target addresses,
>   use list[i].length instead of k->host_end - k->host_start.
>   * testsuite/libgomp.c/target-11.c: Extend for testing array sections.

Ok, thanks.

Jakub


Re: [PR fortran/66528] unbalanced IF/ENDIF with -fmax-errors=1 causes invalid free

2015-06-24 Thread Steve Kargl
On Wed, Jun 24, 2015 at 08:36:45PM +0200, Manuel López-Ibáñez wrote:
> The problem is that diagnostic_action_after_output tries to delete the
> active pretty-printer which tries to delete its output_buffer, which
> is normally dynamically allocated via placement-new, but the
> output_buffer used by the error_buffer of Fortran is statically
> allocated. Being statically allocated greatly simplifies pushing/popping
> several instances of error_buffer.
> 
> The solution I found is to reset the active output_buffer back to the
> default one before calling diagnostic_action_after_output. This is a
> bit ugly, because this function does use the output_buffer, however,
> at the point that Fortran calls it, both are in an equivalent state,
> thus there is no visible difference.
> 
> 
> Bootstrapped and regression tested on x86_64-linux-gnu.
> 
> 2015-06-24  Manuel López-Ibáñez  
> 
> PR fortran/66528
> * gfortran.dg/maxerrors.f90: New test.
> 
> gcc/fortran/ChangeLog:
> 
> 2015-06-24  Manuel López-Ibáñez  
> 
> PR fortran/66528
> * error.c (gfc_warning_check): Restore the default output_buffer
> before calling diagnostic_action_after_output.
> (gfc_error_check): Likewise.
> (gfc_diagnostics_init): Add comment.

Patch looks ok to me.

-- 
Steve


Re: [Patch, C++, PR65882] Check tf_warning flag in build_new_op_1

2015-06-24 Thread Mikhail Maltsev
On 06/24/2015 06:52 PM, Christophe Lyon wrote:
> Hi Mikhail,
> 
> In the gcc-5-branch, I can see that your new inhibit-warn-2.C test
> fails (targets ARM and AArch64).
> 
> I can see this error message in g++.log:
> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:
> In function 'void fn1()':
> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:29:3:
> error: 'typename A<(F
>> ::type>::value || B:: value)>::type D::operator=(Expr) [with Expr =
> int; typename A<(F
>> ::type>::value || B:: value)>::type = int]' is private
> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:35:7:
> error: within this context
> 
> Christophe.
> 
Oops. Sorry for that, it seems that I messed up with my testing box and
the backport did not actually get regtested :(.

The problem is caused by difference in wording of diagnostics. GCC 6
gives an error on line 35 and a note on line 29:

$ ./cc1plus ~/gcc/src/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C
 void fn1()
/home/miyuki/gcc/src/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:35:7:
error: 'typename A<(F
>::type>::value || B:: value)>::type D::operator=(Expr) [with Expr =
int; typename A<(F >::type>::value
|| B:: value)>::type = int]' is private within this context
   opt = 0;
/home/miyuki/gcc/src/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:29:3:
note: declared private here
   operator=(Expr);

GCC 5 gives two errors:

/home/miyuki/gcc/src/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:29:3:
error: 'typename A<(F
>::type>::value || B:: value)>::type D::operator=(Expr) [with Expr =
int; typename A<(F >::type>::value
|| B:: value)>::type = int]' is private
   operator=(Expr);
/home/miyuki/gcc/src/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:35:7:
error: within this context
   opt = 0;

It can probably be fixed like this:

diff --git a/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C
b/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C
index cb16b4c..f658c1d 100644
--- a/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C
+++ b/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C
@@ -26,11 +26,11 @@ class D
 {
   template 
   typename A::type>::value || B::value>::type
-  operator=(Expr); // { dg-message "declared" }
+  operator=(Expr); // { dg-message "private" }
 };

 void fn1()
 {
   D opt;
-  opt = 0; // { dg-error "private" }
+  opt = 0; // { dg-error "this context" }
 }

But I am not sure what I should do in this case. Maybe it is better to
remove the failing testcase from the GCC 5 branch (provided that
inhibit-warn-1.C tests a fix for the same bug and does not fail)?

-- 
Regards,
Mikhail Maltsev


RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-06-24 Thread Evandro Menezes
Philipp,

I think that execute_cse_reciprocals_1() applies only when the denominator is 
known at compile-time, otherwise the division stays.  It doesn't seem to know 
whether the target supports the approximate reciprocal or not.

Cheers,

-- 
Evandro Menezes  Austin, TX


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On
> Behalf Of Dr. Philipp Tomsich
> Sent: Wednesday, June 24, 2015 15:08
> To: Evandro Menezes
> Cc: Benedikt Huber; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
> estimation in -ffast-math
> 
> Evandro,
> 
> Shouldn't ‘execute_cse_reciprocals_1’ take care of this, once the reciprocal-
> division is implemented?
> Do you think there’s additional work needed to catch all cases/opportunities?
> 
> Best,
> Philipp.
> 
> > On 24 Jun 2015, at 20:19, Evandro Menezes  wrote:
> >
> > Benedikt,
> >
> > Are you developing the reciprocal approximation just for 1/x proper or for
> any division, as in x/y = x * 1/y?
> >
> > Thank you,
> >
> > --
> > Evandro Menezes  Austin, TX
> >
> >
> >> -Original Message-
> >> From: Benedikt Huber [mailto:benedikt.hu...@theobroma-systems.com]
> >> Sent: Wednesday, June 24, 2015 12:11
> >> To: Dr. Philipp Tomsich
> >> Cc: Evandro Menezes; gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
> >> (rsqrt) estimation in -ffast-math
> >>
> >> Evandro,
> >>
> >> Yes, we also have the 1/x approximation.
> >> However, we do not have the test cases yet, and it would also need
> >> some cleanup.
> >> I am going to provide a patch for that soon (say next week).
> >> Also, for this optimization we have *not* yet found a benchmark with
> >> significant improvements.
> >>
> >> Best Regards,
> >> Benedikt
> >>
> >>
> >>> On 24 Jun 2015, at 18:52, Dr. Philipp Tomsich
> >>> <philipp.tomsich@theobroma-systems.com> wrote:
> >>>
> >>> Evandro,
> >>>
> >>> We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar)
> >>> reciprocal
> >> sqrt.
> >>>
> >>> Also, the “reciprocal divide” patches are floating around in various
> >>> of our git-tree, but aren’t ready for public consumption, yet… I’ll
> >>> leave Benedikt to comment on potential timelines for getting that
> >>> pushed
> >> out.
> >>>
> >>> Best,
> >>> Philipp.
> >>>
>  On 24 Jun 2015, at 18:42, Evandro Menezes  wrote:
> 
>  Benedikt,
> 
>  You beat me to it! :-)  Do you have the implementation for dividing
>  using the Newton series as well?
> 
>  I'm not sure that the series is always profitable for all data types and on
>  all processors.  It would be useful to allow each AArch64 processor
>  to enable this or not depending on the data type.  BTW, do you have
>  some tests showing the speed up?
> 
>  Thank you,
> 
>  --
>  Evandro Menezes  Austin, TX
> 
> > -Original Message-
> > From: gcc-patches-ow...@gcc.gnu.org
> > [mailto:gcc-patches-ow...@gcc.gnu.org]
>  On
> > Behalf Of Benedikt Huber
> > Sent: Thursday, June 18, 2015 7:04
> > To: gcc-patches@gcc.gnu.org
> > Cc: benedikt.hu...@theobroma-systems.com;
> > philipp.tomsich@theobroma- systems.com
> > Subject: [PATCH] [aarch64] Implemented reciprocal square root
> > (rsqrt) estimation in -ffast-math
> >
> > aarch64 offers the instructions frsqrte and frsqrts, for rsqrt
> > estimation
>  and
> > a Newton-Raphson step, respectively.
> > There are ARMv8 implementations where this is faster than using
> > fdiv and rsqrt.
> > It runs three steps for double and two steps for float to achieve
> > the
>  needed
> > precision.
> >
> > There is one caveat and open question.
> > Since -ffast-math enables flush-to-zero, intermediate values
> > between approximation steps will be flushed to zero if they are
> denormal.
> > E.g. This happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
> > The test cases pass, but it is unclear to me whether this is
> > expected behavior with -ffast-math.
> >
> > The patch applies to commit:
> > svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
> >
> > Please consider including this patch.
> > Thank you and best regards,
> > Benedikt Huber
> >
> > Benedikt Huber (1):
> > 2015-06-15  Benedikt Huber  
> >
> > gcc/ChangeLog|   9 +++
> > gcc/config/aarch64/aarch64-builtins.c|  60 
> > gcc/config/aarch64/aarch64-protos.h  |   2 +
> > gcc/config/aarch64/aarch64-simd.md   |  27 
> > gcc/config/aarch64/aarch64.c |  63 +
> > gcc/config/aarch64/aarch64.md|   3 +
> > gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
> > +++
> > 7 files changed, 277 insertions

Re: [PATCH] config/bfin/bfin.c (hwloop_optimize): Set JUMP_LABEL() after emit jump_insn

2015-06-24 Thread Chen Gang
On 6/24/15 12:25, Jeff Law wrote:
> On 06/20/2015 04:48 AM, Chen Gang wrote:
>> JUMP_LABEL() must be defined after optimization has completed. In this case,
>> optimization is in progress and almost finished, so there is no later chance
>> to set JUMP_LABEL(). The related issue is Bug 65803.
>>
>> 2015-06-20  Chen Gang  
>>
>> * config/bfin/bfin.c (hwloop_optimize): Set JUMP_LABEL() after
>> emit jump_insn.
> Thanks.  I've reduced the testcase from pr65803 and committed the changes to 
> the trunk along with the reduced testcase.
> 
> I tested the bfin port lightly -- just confirmed that it'd build newlib as a 
> sanity test.
> 
> Actual committed patch is attached for archival purposes.
> 

OK, thanks. I shall continue to work on other bfin bugs (which I found
while building the Linux kernel with allmodconfig). I shall try to finish
one bug within this month (2015-06-30).

After finishing the bfin bugs (there may be several left), I shall try
the tilegx testsuite with qemu (I have almost finished qemu tilegx
linux-user support, with much help from the qemu members). I shall try to
start within the next month.

And sorry for disappearing for several months:

 - I had to spend more time on qemu tilegx (which I had already delayed
   for too long).

 - I am not very familiar with gcc internals, so if I do not spend enough
   time on an issue, I will probably send noise (or, even worse, hide the
   issues instead of solving them).


Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed


[C++ Patch] PR 51911

2015-06-24 Thread Paolo Carlini

Hi,

the patch below implements the requirements quite literally. It does that 
after the cp_parser_new_initializer call, which I think generally makes 
for better error recovery. The wording definitely needs a review, though 
(more concise?). Tested x86_64-linux.


Thanks,
Paolo.

///
/cp
2015-06-24  Paolo Carlini  

PR c++/51911
* parser.c (cp_parser_new_expression): Enforce 5.3.4/2.

/testsuite
2015-06-24  Paolo Carlini  

PR c++/51911
* g++.dg/cpp0x/new-auto1.C: New.
Index: cp/parser.c
===
--- cp/parser.c (revision 224918)
+++ cp/parser.c (working copy)
@@ -7457,6 +7457,7 @@ cp_parser_new_expression (cp_parser* parser)
   vec *initializer;
   tree nelts = NULL_TREE;
   tree ret;
+  cp_token *token;
 
   /* Look for the optional `::' operator.  */
   global_scope_p
@@ -7482,7 +7483,6 @@ cp_parser_new_expression (cp_parser* parser)
  type-id.  */
   if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
 {
-  cp_token *token;
   const char *saved_message = parser->type_definition_forbidden_message;
 
   /* Consume the `('.  */
@@ -7513,9 +7513,11 @@ cp_parser_new_expression (cp_parser* parser)
   else
 type = cp_parser_new_type_id (parser, &nelts);
 
+  token = cp_lexer_peek_token (parser->lexer);
+
   /* If the next token is a `(' or '{', then we have a new-initializer.  */
-  if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)
-  || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+  if (token->type == CPP_OPEN_PAREN
+  || token->type == CPP_OPEN_BRACE)
 initializer = cp_parser_new_initializer (parser);
   else
 initializer = NULL;
@@ -7524,6 +7526,19 @@ cp_parser_new_expression (cp_parser* parser)
  expression.  */
   if (cp_parser_non_integral_constant_expression (parser, NIC_NEW))
 ret = error_mark_node;
+  /* 5.3.4/2: "If the auto type-specifier appears in the type-specifier-seq
+ of a new-type-id or type-id of a new-expression, the new-expression shall
+ contain a new-initializer of the form ( assignment-expression )".  */
+  else if (type_uses_auto (type)
+  && (token->type != CPP_OPEN_PAREN
+  || vec_safe_length (initializer) != 1
+  || BRACE_ENCLOSED_INITIALIZER_P ((*initializer)[0])))
+{
+  error_at (token->location,
+   "initialization of new-expression for type % "
+   "requires exactly one parenthesized expression");
+  ret = error_mark_node;
+}
   else
 {
   /* Create a representation of the new-expression.  */
Index: testsuite/g++.dg/cpp0x/new-auto1.C
===
--- testsuite/g++.dg/cpp0x/new-auto1.C  (revision 0)
+++ testsuite/g++.dg/cpp0x/new-auto1.C  (working copy)
@@ -0,0 +1,9 @@
+// PR c++/51911
+// { dg-do compile { target c++11 } }
+
+#include 
+
+int main()
+{
+  auto foo = new auto {3, 4, 5};  // { dg-error "initialization" }
+}


Re: pr66345.c size_t assumption bug

2015-06-24 Thread DJ Delorie

> OK.

Thanks, committed.


[PATCH] Do not constrain on REAL_TYPE

2015-06-24 Thread Aditya Kumar
From: Aditya Kumar 

gcc/ChangeLog:

2015-06-24  Aditya Kumar  
Sebastian Pop 

* graphite-sese-to-poly.c (parameter_index_in_region): Discard 
REAL_TYPE parameters.
(scan_tree_for_params): Handle REAL_CST in scan_tree_for_params.
(add_conditions_to_domain): Do not constrain on REAL_TYPE.

---
 gcc/graphite-sese-to-poly.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 271c499..5b37796 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -796,6 +796,9 @@ parameter_index_in_region (tree name, sese region)
 
   gcc_assert (SESE_ADD_PARAMS (region));
 
+  /* Cannot constrain on REAL_TYPE parameters.  */
+  if (TREE_CODE (TREE_TYPE (name)) == REAL_TYPE)
+return -1;
   i = SESE_PARAMS (region).length ();
   SESE_PARAMS (region).safe_push (name);
   return i;
@@ -915,6 +918,7 @@ scan_tree_for_params (sese s, tree e)
 
 case INTEGER_CST:
 case ADDR_EXPR:
+case REAL_CST:
   break;
 
default:
@@ -1194,6 +1198,10 @@ add_conditions_to_domain (poly_bb_p pbb)
   {
   case GIMPLE_COND:
  {
+/* Don't constrain on REAL_TYPE.  */
+   if (TREE_CODE (TREE_TYPE (gimple_cond_lhs (stmt))) == REAL_TYPE)
+  break;
+
gcond *cond_stmt = as_a  (stmt);
enum tree_code code = gimple_cond_code (cond_stmt);
 
-- 
2.1.0.243.g30d45f7



[PATCH] i386: Do not modify existing RTL (PR66412)

2015-06-24 Thread Segher Boessenkool
A few define_split's in the i386 backend modify RTL in place.  This does
not work.  This patch fixes all cases that do PUT_MODE on existing RTL.

Bootstrapped and tested; no regressions.  Is this okay for trunk?

Hrm, this wants the testcase in that PR added I suppose.  Will send
it separately.


Segher


2015-06-24  Segher Boessenkool  

* config/i386/i386.md (various splitters): Use copy_rtx before
doing PUT_MODE on operands.

---
 gcc/config/i386/i386.md |   16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d75b2e1..5425cec 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10865,7 +10865,10 @@
(const_int 0)))]
   ""
   [(set (match_dup 0) (match_dup 1))]
-  "PUT_MODE (operands[1], QImode);")
+{
+  operands[1] = copy_rtx (operands[1]);
+  PUT_MODE (operands[1], QImode);
+})
 
 (define_split
   [(set (strict_low_part (match_operand:QI 0 "nonimmediate_operand"))
@@ -10874,7 +10877,10 @@
(const_int 0)))]
   ""
   [(set (match_dup 0) (match_dup 1))]
-  "PUT_MODE (operands[1], QImode);")
+{
+  operands[1] = copy_rtx (operands[1]);
+  PUT_MODE (operands[1], QImode);
+})
 
 (define_split
   [(set (match_operand:QI 0 "nonimmediate_operand")
@@ -11031,7 +11037,10 @@
(if_then_else (match_dup 0)
  (label_ref (match_dup 1))
  (pc)))]
-  "PUT_MODE (operands[0], VOIDmode);")
+{
+  operands[0] = copy_rtx (operands[0]);
+  PUT_MODE (operands[0], VOIDmode);
+})
 
 (define_split
   [(set (pc)
@@ -17298,6 +17307,7 @@
   operands[1] = gen_lowpart (SImode, operands[1]);
   if (GET_CODE (operands[3]) != ASHIFT)
 operands[2] = gen_lowpart (SImode, operands[2]);
+  operands[3] = copy_rtx (operands[3]);
   PUT_MODE (operands[3], SImode);
 })
 
-- 
1.7.10.4



[patch committed SH] Fix PR target/66563

2015-06-24 Thread Kaz Kojima
The attached patch fixes PR target/66563, which is a 4.9/5/6
regression.  These newer compilers can CSE some expressions in
the sequences for getting the GOT address.  The target should
make sure that won't happen.  See PR target/66563 for details.
Tested on sh4-unknown-linux-gnu and committed on trunk.
I'll backport it to 5 later and to 4.9 when it reopens.

Regards,
kaz
--
2015-06-24  Kaz Kojima  

PR target/66563
* config/sh/sh.md (GOTaddr2picreg): Add a new operand for
an additional element of the unspec vector.  Modify indices
of operands.
(builtin_setjmp_receiver): Pass const0_rtx to gen_GOTaddr2picreg.
* config/sh/sh.c (prepare_move_operands): Pass incremented
const_int to gen_GOTaddr2picreg.
(sh_expand_prologue): Pass const0_rtx to gen_GOTaddr2picreg.

diff --git a/config/sh/sh.c b/config/sh/sh.c
index 6f03206..2c247b1 100644
--- a/config/sh/sh.c
+++ b/config/sh/sh.c
@@ -1845,12 +1845,13 @@ prepare_move_operands (rtx operands[], machine_mode 
mode)
  || tls_kind == TLS_MODEL_LOCAL_DYNAMIC
  || tls_kind == TLS_MODEL_INITIAL_EXEC))
{
+ static int got_labelno;
  /* Don't schedule insns for getting GOT address when
 the first scheduling is enabled, to avoid spill
 failures for R0.  */
  if (flag_schedule_insns)
emit_insn (gen_blockage ());
- emit_insn (gen_GOTaddr2picreg ());
+ emit_insn (gen_GOTaddr2picreg (GEN_INT (++got_labelno)));
  emit_use (gen_rtx_REG (SImode, PIC_REG));
  if (flag_schedule_insns)
emit_insn (gen_blockage ());
@@ -7958,7 +7959,7 @@ sh_expand_prologue (void)
 }
 
   if (flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM))
-emit_insn (gen_GOTaddr2picreg ());
+emit_insn (gen_GOTaddr2picreg (const0_rtx));
 
   if (SHMEDIA_REGS_STACK_ADJUST ())
 {
diff --git a/config/sh/sh.md b/config/sh/sh.md
index e88d249..43cd949 100644
--- a/config/sh/sh.md
+++ b/config/sh/sh.md
@@ -10592,12 +10592,18 @@ label:
   [(set_attr "in_delay_slot" "no")
(set_attr "type" "arith")])
 
+;; Loads of the GOTPC relocation values must not be optimized away
+;; by e.g. any kind of CSE and must stay as they are.  Although there
+;; are other various ways to ensure this, we use an artificial counter
+;; operand to generate unique symbols.
 (define_expand "GOTaddr2picreg"
   [(set (reg:SI R0_REG)
-   (unspec:SI [(const:SI (unspec:SI [(match_dup 1)] UNSPEC_PIC))]
-  UNSPEC_MOVA))
-   (set (match_dup 0) (const:SI (unspec:SI [(match_dup 1)] UNSPEC_PIC)))
-   (set (match_dup 0) (plus:SI (match_dup 0) (reg:SI R0_REG)))]
+   (unspec:SI [(const:SI (unspec:SI [(match_dup 2)
+ (match_operand:SI 0 "" "")]
+UNSPEC_PIC))] UNSPEC_MOVA))
+   (set (match_dup 1)
+   (const:SI (unspec:SI [(match_dup 2) (match_dup 0)] UNSPEC_PIC)))
+   (set (match_dup 1) (plus:SI (match_dup 1) (reg:SI R0_REG)))]
   ""
 {
   if (TARGET_VXWORKS_RTP)
@@ -10608,8 +10614,8 @@ label:
   DONE;
 }
 
-  operands[0] = gen_rtx_REG (Pmode, PIC_REG);
-  operands[1] = gen_rtx_SYMBOL_REF (VOIDmode, GOT_SYMBOL_NAME);
+  operands[1] = gen_rtx_REG (Pmode, PIC_REG);
+  operands[2] = gen_rtx_SYMBOL_REF (VOIDmode, GOT_SYMBOL_NAME);
 
   if (TARGET_SHMEDIA)
 {
@@ -10618,23 +10624,23 @@ label:
   rtx lab = PATTERN (gen_call_site ());
   rtx insn, equiv;
 
-  equiv = operands[1];
-  operands[1] = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, operands[1], lab),
+  equiv = operands[2];
+  operands[2] = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, operands[2], lab),
UNSPEC_PCREL_SYMOFF);
-  operands[1] = gen_rtx_CONST (Pmode, operands[1]);
+  operands[2] = gen_rtx_CONST (Pmode, operands[2]);
 
   if (Pmode == SImode)
{
- emit_insn (gen_movsi_const (pic, operands[1]));
+ emit_insn (gen_movsi_const (pic, operands[2]));
  emit_insn (gen_ptrel_si (tr, pic, copy_rtx (lab)));
}
   else
{
- emit_insn (gen_movdi_const (pic, operands[1]));
+ emit_insn (gen_movdi_const (pic, operands[2]));
  emit_insn (gen_ptrel_di (tr, pic, copy_rtx (lab)));
}
 
-  insn = emit_move_insn (operands[0], tr);
+  insn = emit_move_insn (operands[1], tr);
 
   set_unique_reg_note (insn, REG_EQUAL, equiv);
 
@@ -10688,7 +10694,7 @@ label:
   [(match_operand 0 "" "")]
   "flag_pic"
 {
-  emit_insn (gen_GOTaddr2picreg ());
+  emit_insn (gen_GOTaddr2picreg (const0_rtx));
   DONE;
 })
 


Re: [RS6000 1/7] Hide insns not needing to be public

2015-06-24 Thread David Edelsohn
On Tue, Jun 23, 2015 at 8:50 PM, Alan Modra  wrote:
> * config/rs6000/rs6000.md (addsi3_high, bswaphi2_internal,
> ashldi3_internal5, ashldi3_internal8): Prefix with '*'.

This patch is okay.

The rotate changes need to be discussed and coordinated with Segher.

The cost changes are okay in theory, but really should be applied in
conjunction with the rtx_cost improvements that you are discussing
with Jeff.

Thanks, David


[patch] PR debug/66653: avoid late_global_decl on decl_type_context()s

2015-06-24 Thread Aldy Hernandez
The problem here is that we are trying to call 
dwarf2out_late_global_decl() on a static variable in a template which 
has a type of TEMPLATE_TYPE_PARM:


template  class A
{
  static __thread T a;
};

We are calling late_global_decl because we are about to remove the 
unused static from the symbol table:


  /* See if the debugger can use anything before the DECL
 passes away.  Perhaps it can notice a DECL that is now a
 constant and can tag the early DIE with an appropriate
 attribute.

 Otherwise, this is the last chance the debug_hooks have
 at looking at optimized away DECLs, since
 late_global_decl will subsequently be called from the
 contents of the now pruned symbol table.  */
  if (!decl_function_context (node->decl))
(*debug_hooks->late_global_decl) (node->decl);

Since gen_type_die_with_usage() cannot handle TEMPLATE_TYPE_PARMs, we ICE.

I think we need to avoid calling late_global_decl on DECL's for which 
decl_type_context() is true, similarly to what we do for the call to 
early_global_decl in rest_of_decl_compilation:


  && !decl_function_context (decl)
  && !current_function_decl
  && DECL_SOURCE_LOCATION (decl) != BUILTINS_LOCATION
  && !decl_type_context (decl))
(*debug_hooks->early_global_decl) (decl);

Presumably the old code did not run into this problem because the 
TEMPLATE_TYPE_PARMs had been lowered by the time dwarf2out_decl was 
called, but here we are calling late_global_decl relatively early.


The attached patch fixes the problem.

Tested with --enable-languages=all.  Ada had other issues, so I skipped it.

OK for mainline?
commit 302f9976c53aa09e431bd54f37dbfeaa2c6b2acc
Author: Aldy Hernandez 
Date:   Wed Jun 24 20:04:09 2015 -0700

PR debug/66653
* cgraphunit.c (analyze_functions): Do not call
debug_hooks->late_global_decl when decl_type_context.

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 066a155..d2974ad 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1149,7 +1149,8 @@ analyze_functions (bool first_time)
 at looking at optimized away DECLs, since
 late_global_decl will subsequently be called from the
 contents of the now pruned symbol table.  */
- if (!decl_function_context (node->decl))
+ if (!decl_function_context (node->decl)
+ && !decl_type_context (node->decl))
(*debug_hooks->late_global_decl) (node->decl);
 
  node->remove ();
diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/pr66653.C b/gcc/testsuite/g++.dg/debug/dwarf2/pr66653.C
new file mode 100644
index 000..bcaaf88
--- /dev/null
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/pr66653.C
@@ -0,0 +1,8 @@
+// PR debug/54508
+// { dg-do compile }
+// { dg-options "-g" }
+
+template  class A
+{
+  static __thread T a;
+};


Re: [05/13] Add nofree_ptr_hash

2015-06-24 Thread Jeff Law

On 06/24/2015 02:23 AM, Richard Sandiford wrote:

Jeff Law  writes:

So I'm holding off on approving this one pending further discussion of
the use of multiple inheritance for nofree_ptr_hash.


I thought that might be controversial. :-)  My two main defences are:

1) This is multiple inheritance of traits classes, which all just have
static member functions, rather than multiple inheritance of data-
carrying classes.  It's really just a union of two separate groups
of functions.
As I was thinking about this during review I almost convinced myself 
that multiple inheritance from traits classes ought to be acceptable.


As you state, they don't carry data and we're just getting a union of 
their functions.  One could probably even argue that traits classes by 
their nature are designed to be composed with other traits and classes.
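As a concrete illustration of the point being debated (names here are hypothetical stand-ins, not the actual GCC hash-traits API), multiple inheritance of stateless traits classes really is just a union of static member functions:

```cpp
#include <cassert>
#include <cstddef>

// Stateless traits classes: each contributes only static member
// functions, so inheriting from both is just a union of two groups
// of functions -- no data layout or diamond issues can arise.
struct ptr_hash_traits
{
  static std::size_t hash (const void *p)
  { return reinterpret_cast<std::size_t> (p) >> 3; }
  static bool equal (const void *a, const void *b) { return a == b; }
};

struct noop_remove_traits
{
  static void remove (void *) { }  // entries own nothing to free
};

// Hypothetical analogue of nofree_ptr_hash: compose the two trait sets.
struct nofree_ptr_traits : ptr_hash_traits, noop_remove_traits { };
```

Since neither base carries data, the derived class has no state of its own either; it merely re-exports both groups of functions under one name.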


I'm (obviously) not as well versed in this stuff as I ought to be, hence 
my conservatism.  It'd be really helpful if folks with more real-world 
experience in this space could chime in on the pros/cons of this approach.


If we do go forward, ISTM updating our coding conventions to codify this 
exception to the "avoid MI" rule would be wise.  And my inclination is to go 
forward, but let's give other folks a chance to chime in.



Jeff


Re: [PATCH][RFC] Add FRE in pass_vectorize

2015-06-24 Thread Jeff Law

On 06/24/2015 01:59 AM, Richard Biener wrote:

Redundant, basically two IVs with the same initial value and same step.
IVOPTs can deal with this if the initial values and the step are already
same "enough" - the vectorizer can end up generating redundant huge
expressions for both.
Ah, so yes, this is a totally different issue than Alan and I are 
discussing.
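A minimal sketch of the redundancy being described (written out in source form for illustration; the vectorizer generates such IVs internally rather than from user code):

```cpp
#include <cassert>

// Two induction variables with the same initial value and the same
// step: a value-numbering pass such as FRE can prove i2 == i1 on
// every iteration and eliminate one of them.
int sum_twice (const int *a, int n)
{
  int sum = 0;
  int i1 = 0;   // IV 1: start 0, step 1
  int i2 = 0;   // IV 2: identical start and step -- fully redundant
  while (i1 < n)
    {
      sum += a[i1] + a[i2];
      i1 += 1;
      i2 += 1;
    }
  return sum;
}
```

When the initial values and steps are only "almost" identical (large expressions that merely compute the same value), proving the equivalence requires the full value-numbering machinery, which is the case at issue here.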



RTL CSE is bloody expensive and so many times I wanted the ability to know a
bit about what the loop optimizer had done (or not done) so that I could
conditionally skip the second CSE pass.   We never built that, but it's
something I've wanted for decades.


Hmm, ok.  We can abuse pass properties for this but I don't think
they are a scalable fit.  Not sure if we'd like to go full way
adding sth like PROP_want_ccp PROP_want_copyprop PROP_want_cse, etc.
(any others?).  And whether FRE would then catch a PROP_want_copyprop
because it also can do copy propagation.
And that's why we haven't pushed hard on this issue -- it doesn't scale 
and to make it scale requires rethinking the basics of the pass manager.




Going a bit further here, esp. in the loop context, would be to
have the basic cleanups be region-based.  Because given a big
function with many loops and just one vectorized it would be
enough to cleanup the vectorized loop (yes, and in theory
all downstream effects, but that's probably secondary and not
so important).  It's not too difficult to make FRE run on
a MEME region, the interesting part, engineering-wise, is to
really make it O(size of MEME region) - that is, eliminate
things like O(num_ssa_names) or O(n_basic_blocks) setup cost.
I had a long talk with some of the SGI compiler guys many years ago 
about region-based optimizations.  It was something they had been trying 
to bring into their compiler for years, but never got it working to a 
point where they were happy with it.  While they didn't show me the 
code, they indicated the changes were highly invasive -- and all the 
code had been #ifdef'd out because it just didn't work.  Naturally it 
was all bitrotting.









And then there is the possibility of making passes generate less
need for cleanups after them - like in the present case
with the redundant IVs: make them more apparently redundant by
CSEing the initial value and step during vectorizer code generation.
I'm playing with the idea of adding a simple CSE machinery to
the gimple_build () interface (aka match-and-simplify).  It
eventually invokes (well, not currently, but that can be fixed)
maybe_push_res_to_seq which is a good place to maintain a
table of already generated expressions.  That of course only
works if you either always append to the same sequence or at least
insert at the same place.
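The table idea sketched above can be illustrated as follows (all names hypothetical -- this is not the GCC API, just the shape of the mechanism):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// A CSE table hung off an expression builder: before emitting a new
// statement, look up (code, op1, op2); on a hit, reuse the previously
// generated result instead of emitting a duplicate statement.
struct expr_builder
{
  std::map<std::tuple<std::string, int, int>, int> cse_table;
  int next_name = 0;      // stand-in for fresh SSA names
  int stmts_emitted = 0;  // stand-in for the growing gimple_seq

  int build (const std::string &code, int op1, int op2)
  {
    auto key = std::make_tuple (code, op1, op2);
    auto it = cse_table.find (key);
    if (it != cse_table.end ())
      return it->second;   // redundant expression: reuse prior result
    ++stmts_emitted;       // here we would append to the sequence
    return cse_table[key] = next_name++;
  }
};
```

As noted, this only stays sound if insertion always happens at the same place in the same sequence; otherwise a cached result may not dominate its new use.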
As you know we've gone back and forth on this in the past.  It's always 
a trade-off.  I still ponder from time to time putting the simple CSE 
and cprop bits back into the SSA rewriting phase to avoid generating all 
kinds of garbage that just needs to be cleaned up later -- particularly 
for incremental SSA updates.




Jeff


Re: [PATCH] i386: Do not modify existing RTL (PR66412)

2015-06-24 Thread Jeff Law

On 06/24/2015 05:29 PM, Segher Boessenkool wrote:

A few define_split's in the i386 backend modify RTL in place.  This does
not work.  This patch fixes all cases that do PUT_MODE on existing RTL.

Bootstrapped and tested; no regressions.  Is this okay for trunk?

Hrm, this wants the testcase in that PR added I suppose.  Will send
it separately.


Segher


2015-06-24  Segher Boessenkool  

* config/i386/i386.md (various splitters): Use copy_rtx before
doing PUT_MODE on operands.
Are the copies really needed?  If we're slamming a mode into an 
ix86_comparison_operator, we should be safe since those can't be shared. 
 Copying is just wasteful.


Jeff


Re: [PATCH] i386: Do not modify existing RTL (PR66412)

2015-06-24 Thread Jeff Law

On 06/24/2015 09:40 PM, Jeff Law wrote:

On 06/24/2015 05:29 PM, Segher Boessenkool wrote:

A few define_split's in the i386 backend modify RTL in place.  This does
not work.  This patch fixes all cases that do PUT_MODE on existing RTL.

Bootstrapped and tested; no regressions.  Is this okay for trunk?

Hrm, this wants the testcase in that PR added I suppose.  Will send
it separately.


Segher


2015-06-24  Segher Boessenkool  

* config/i386/i386.md (various splitters): Use copy_rtx before
doing PUT_MODE on operands.

Are the copies really needed?  If we're slamming a mode into an
ix86_comparison_operator, we should be safe since those can't be shared.
  Copying is just wasteful.
It might be worth verifying that something else hasn't created shared 
RTL in violation of the RTL sharing assumptions.


Jeff


Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-06-24 Thread Jeff Law

On 06/23/2015 07:00 PM, Sandra Loosemore wrote:

On 06/18/2015 11:32 AM, Eric Botcazou wrote:

The attached patch teaches regrename to validate insns affected by each
register renaming before making the change.  I can see at least two
other ways to handle this -- earlier, by rejecting renamings that result
in invalid instructions when it's searching for the best renaming; or
later, by validating the entire set of renamings as a group instead of
incrementally for each one -- but doing it all in regrename_do_replace
seems least disruptive and risky in terms of the existing code.


OK, but the patch looks incomplete, rename_chains should be adjusted
as well,
i.e. regrename_do_replace should now return a boolean.


Like this?  I tested this on nios2 and x86_64-linux-gnu, as before, plus
built for aarch64-linux-gnu and ran the gcc testsuite.

The c6x back end also calls regrename_do_replace.  I am not set up to
build or test on that target, and Bernd told me off-list that it would
never fail on that target anyway so I have left that code alone.

-Sandra

regrename-2.log


2015-06-23  Chung-Lin Tang
Sandra Loosemore

gcc/
* regrename.h (regrename_do_replace): Change to return bool.
* regrename.c (rename_chains): Check return value of
regrename_do_replace.
(regrename_do_replace): Re-validate the modified insns and
return bool status.
* config/aarch64/cortex-a57-fma-steering.c (rename_single_chain):
Update to match rename_chains changes.
As Eric mentioned, please put an assert to verify that the call from the 
c6x backend never fails.


The regrename and ARM bits are fine.

Do you have a testcase that you can add to the suite?  If so it'd be 
appreciated if you could include that too.


Approved with the c6x assert if a testcase isn't available or 
exceedingly difficult to produce.


jeff



Re: [PATCH][GSoC] Extend shared_ptr to support arrays

2015-06-24 Thread Tim Shen
On Wed, Jun 24, 2015 at 10:33 AM, Fan You  wrote:
> Hi,
>
> Here is the revised patch including all the test case.
>
> This can also be seen at  on branch
> 
>
> Any comments?

I ran `git diff c7248656569bb0b4549f5c1ed347f7e028a15664
90aff5632fd9f3044d53ce190ae99fb69c41ce49`.

To systematically detect consecutive spaces (to convert them to tabs),
I'll just simply do:
`egrep "^\t* {8}" shared_ptr*`

-   = typename conditional<is_array<_Tp>::value, _Array_Deleter,
-_Normal_Deleter>::type;
+   = typename conditional<is_array<_Tp>
+   ::value, _Array_Deleter, _Normal_Deleter>::type;
Tabs. Also, I personally prefer to put '::value' to the same line as
is_array<_Tp>.

-  using __base_type = __shared_ptr;
+  using __Base_type = __shared_ptr;
_Base_type, not __Base_type. Also, the most commonly used is _Base:

...src/gcc/libstdc++-v3 % grep -r '_Base[a-zA-Z_0-9]*' . -o | grep
':.*$' -o|sort|uniq -c
   2350 :_Base
  1 :_Base_biteset
 62 :_Base_bitset
120 :_Base_const_iterator
 20 :_Base_const_local_iterator
  4 :_Based
177 :_Base_iterator
  1 :_Base_Iterator
  8 :_Base_local_iterator
 21 :_Base_manager
133 :_Base_ptr
  9 :_Base_ref
  2 :_BaseSequence
173 :_Base_type
  3 :_BaseType

-   : __base_type(__p, _Deleter_type())
+   : __Base_type(__p, _Deleter_type())
Please be aware of tabs.

-  template>
+  template>
__shared_ptr(__shared_ptr<__libfund_v1<_Tp1>, _Lp>&& __r) noexcept
-: __base_type(std::move(__r))
+: __Base_type(static_cast>::__Base_type&&>(std::move(__r)))
static_cast>::__Base_type&&>(__r)
is enough, since std::move is actually a static_cast to rvalue reference.

Alternatively, you may define a template alias for the static_cast, if
you find it too long.
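For reference, the equivalence being relied on here is easy to check directly -- std::move performs no motion, it is only a cast to rvalue reference, so static_cast<T&&>(x) selects exactly the same overloads:

```cpp
#include <cassert>
#include <string>
#include <utility>

// take() binds only to rvalues; both callers below hand it one.
std::string take (std::string &&s) { return std::string (std::move (s)); }

std::string via_move (std::string s) { return take (std::move (s)); }

std::string via_cast (std::string s)
{ return take (static_cast<std::string &&> (s)); }
```

Both functions compile to the same overload resolution, which is why wrapping the cast in std::move inside another cast is redundant.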

-   operator=(const __shared_ptr<__libfund_v1<_Tp1>, _Lp>& __r) noexcept
+   operator=(const __shared_ptr<_Tp1, _Lp>& __r) noexcept
Why?

template
  inline bool
  operator<(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
- {
+ {
using _Tp_RE = typename remove_extent<_Tp>::type;
-   return std::less<_Tp_RE>()(__a.get(), nullptr);
+   return std::less<_Tp_RE>()(__a.get(), nullptr);
  }
using _Tp_RE = typename shared_ptr<_Tp>::element_type;



-- 
Regards,
Tim Shen


Re: [PATCH IRA] save a bitmap check

2015-06-24 Thread Jeff Law

On 06/24/2015 03:54 AM, Zhouyi Zhou wrote:


In function assign_hard_reg, checking the bit of conflict_a in
consideration_allocno_bitmap is unnecessary, because when retry_p is
false, conflicting objects are always inside of the same loop_node
(this is ensured in function process_bb_node_lives which marks the
living objects to death near the end of that function).



Bootstrap and regtest scheduled on x86_64 GNU/Linux
Signed-off-by: Zhouyi Zhou 
---
  gcc/ChangeLog   | 4 
  gcc/ira-color.c | 6 ++
  2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d1f82b2..07605ae 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2015-06-24  Zhouyi Zhou  
+
+   * ira-color.c (assign_hard_reg): Save a bitmap check.
My concern here is the invariant you're exploiting to eliminate the 
redundant bitmap check is far from obvious and there's no good way I can 
see to ensure that invariant remains invariant.


Without some solid performance data indicating this is a notable 
compile-time improvement, I don't think it's a wise idea.


If it does turn out that this is a noteworthy compile-time improvement, 
then you would need a comment before this conditional explaining in 
detail why we don't need to check for conflict's allocno in 
consideration_allocno_bitmap.


Jeff


Re: [Patch SRA] Fix PR66119 by calling get_move_ratio in SRA

2015-06-24 Thread Jeff Law

On 06/23/2015 09:42 AM, James Greenhalgh wrote:


On Tue, Jun 23, 2015 at 09:52:01AM +0100, Jakub Jelinek wrote:

On Tue, Jun 23, 2015 at 09:18:52AM +0100, James Greenhalgh wrote:

This patch fixes the issue by always calling get_move_ratio in the SRA
code, ensuring that an up-to-date value is used.

Unfortunately, this means we have to use 0 as a sentinel value for
the parameter - indicating no user override of the feature - and
therefore cannot use it to disable scalarization. However, there
are other ways to disable scalarization (-fno-tree-sra) so this is not
a great loss.


You can handle even that.






   enum compiler_param param
 = optimize_function_for_size_p (cfun)
   ? PARAM_SRA_MAX_SCALARIZATION_SIZE_SIZE
   : PARAM_SRA_MAX_SCALARIZATION_SIZE_SPEED;
   unsigned max_scalarization_size = PARAM_VALUE (param) * BITS_PER_UNIT;
   if (!max_scalarization_size && !global_options_set.x_param_values[param])

Then it will handle explicit --param sra-max-scalarization-size-Os*=0
differently from implicit 0.
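The distinction Jakub is drawing -- an explicit 0 meaning "disable" versus an implicit default 0 meaning "fall back to the target" -- can be sketched generically (names hypothetical, mirroring the global_options_set idea):

```cpp
#include <cassert>

// Track "was the parameter set by the user?" separately from its
// value: an explicit 0 disables the feature, while an implicit
// (default) 0 falls back to the target-derived value.
struct param_setting
{
  unsigned value = 0;
  bool user_set = false;
};

unsigned
effective_max_scalarization_size (const param_setting &p,
                                  unsigned target_default)
{
  if (p.value == 0 && !p.user_set)
    return target_default;  // implicit 0: use the get_move_ratio default
  return p.value;           // any explicit setting, including explicit 0
}
```

This is why the sentinel value alone is not enough -- the "was it set" bit carries the extra state needed to keep 0 usable as a user-visible off switch.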


Ah hah! OK, I've respun the patch removing this extra justification in
the documentation and reshuffling the logic a little.


OT, shouldn't max_scalarization_size be at least unsigned HOST_WIDE_INT,
so that it doesn't overflow for larger values (0x4000 etc.)?
Probably need some cast in the multiplication to avoid UB in the compiler.


I've increased the size of max_scalarization_size to a UHWI in this spin.

Bootstrapped and tested on AArch64 and x86-64 with no issues and checked
to see the PR is fixed.

OK for trunk, and gcc-5 in a few days?

Thanks,
James

---
gcc/

2015-06-23  James Greenhalgh  

PR tree-optimization/66119
* toplev.c (process_options): Don't set up default values for
the sra_max_scalarization_size_{speed,size} parameters.
* tree-sra.c (analyze_all_variable_accesses): If no values
have been set for the sra_max_scalarization_size_{speed,size}
parameters, call get_move_ratio to get target defaults.

Any testcase for this change?

OK with a testcase.

jeff



Re: [PATCH v2] Rerun loop-header-copying just before vectorization

2015-06-24 Thread Jeff Law

On 06/19/2015 11:32 AM, Alan Lawrence wrote:

This is a respin of
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02139.html . Changes are:

* Separate the two passes by descending from a common base class,
allowing different predicates;
* Test flag_tree_vectorize, and loop->force_vectorize/dont_vectorize
- this fixes the test failing before;
* Simplify the check for "code after exit edge";
* Revert unnecessary changes to pass_tree_loop_init::execute;
* Revert change to slp-perm-7 test (following fix by Marc Glisse)
So FWIW, if you don't want to make this a separate pass, you'd probably 
want the code which allows us to run the phi-only propagator as a 
subroutine to propagate and eliminate those degenerate PHIs.  I posted 
it a year or two ago, but went a different direction to solve whatever 
issue I was looking at.


I'm comfortable with this as a separate pass and relying on cfg cleanups 
to handle this stuff for us as this implementation of your patch 
currently does.





Bootstrapped + check-gcc on aarch64 and x86_64 (linux).

gcc/ChangeLog:

 * tree-pass.h (make_pass_ch_vect): New.
 * passes.def: Add pass_ch_vect just before pass_if_conversion.

 * tree-ssa-loop-ch.c (pass_ch_base, pass_ch_vect, pass_data_ch_vect,
 pass_ch::process_loop_p): New.
 (pass_ch): Extend pass_ch_base.

 (pass_ch::execute): Move all but loop_optimizer_init/finalize to...
 (pass_ch_base::execute): ...here.

gcc/testsuite/ChangeLog:

 * gcc.dg/vect/vect-strided-a-u16-i4.c (main1): Narrow scope of
 unsigned x, y, z, w.
 * gcc.dg/vect/vect-ifcvt-11.c: New.
Can you add a function comment to ch_base::copy_headers.  I know it 
didn't have one before, but it really should have one.


I'd also add a comment to the execute methods.  pass_ch initializes and 
finalizes loop structures while pass_ch_vect::execute assumes the loop 
structures are already initialized and finalization is assumed to be 
handled earlier in the call chain.


I'd also suggest a comment to the process_loop_p method.


+
+  /* Apply copying if the exit block looks to have code after it.  */
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, exit->src->succs)
+if (!loop_exit_edge_p (loop, e)
+   && e->dest != loop->header
+   && e->dest != loop->latch)
+  return true; /* Block with exit edge has code after it.  */
Don't put comments on the same line as code.  Instead I'd suggest 
describing the CFG pattern your looking for as part of the comment 
before the loop over the edges.



With those comment fixes, this is OK for the trunk.

jeff


Re: [patch 4/5] Remove cgraph.h dependence on hard-reg-set.h

2015-06-24 Thread Jeff Law

On 06/16/2015 11:20 AM, Andrew MacLeod wrote:

cgraph.h requires hard-reg-set.h in order to compile simply because the
cgraph_rtl_info structure contains a HARD_REG_SET element.

All accesses to this structure are already handled by returning a
pointer to the structure within the cgraph_node.  By moving the
definition of struct cgraph_rtl_info into rtl.h and maintaining a pointer
to it instead of the structure within cgraph_node, the compilation
requirement on hard-reg-set.h can be completely removed when including
cgraph.h.  This will hopefully help prevent bringing hard-reg-set and
tm.h into a number of source files.

The structure in rtl.h is protected by checking for HARD_CONST (which
many other things in rtl.h do). This is mostly so generator files won't
trip over them.  2 source files needed adjustment because they didn't
include hard-reg-set.h before rtl.h.  I guess they never referenced the
other things protected by HARD_CONST in the file.  This ordering issue
should shortly be resolved by an include grouping.
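The decoupling pattern being applied can be sketched in miniature (names are illustrative, not the real GCC structures):

```cpp
#include <cassert>

// "Header" side: a pointer member needs only a forward declaration,
// so a cgraph.h-style header no longer has to pull in the heavy
// definition behind the member type.
struct big_reg_set;                  // forward declaration is enough

struct node_like
{
  big_reg_set *rtl_info = nullptr;   // was: big_reg_set rtl_info (by value)
};

// "Implementation" side: the full definition lives where the data is
// actually allocated and dereferenced.
struct big_reg_set
{
  unsigned long bits[4];
};

big_reg_set *
get_rtl_info (node_like &n)
{
  static big_reg_set storage = {};   // simplistic allocation for the sketch
  if (!n.rtl_info)
    n.rtl_info = &storage;
  return n.rtl_info;
}
```

Since all accesses already went through an accessor returning a pointer, swapping the embedded structure for a pointer is invisible to callers while dropping the header dependency.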

Bootstraps on x86_64-unknown-linux-gnu with no new regressions. Also
passes all the targets in config-list.mk

OK for trunk?

OK.
jeff



Re: [PATCH] i386: Do not modify existing RTL (PR66412)

2015-06-24 Thread Segher Boessenkool
On Wed, Jun 24, 2015 at 09:40:28PM -0600, Jeff Law wrote:
> On 06/24/2015 05:29 PM, Segher Boessenkool wrote:
> >A few define_split's in the i386 backend modify RTL in place.  This does
> >not work.  This patch fixes all cases that do PUT_MODE on existing RTL.

> > * config/i386/i386.md (various splitters): Use copy_rtx before
> > doing PUT_MODE on operands.
> Are the copies really needed?  If we're slamming a mode into an 
> ix86_comparison_operator, we should be safe since those can't be shared. 
>  Copying is just wasteful.

combine still holds pointers to the old rtx, which is what is causing
the problem in the PR (it does always unshare things in the end, but
it does not make copies while it's working).  Either those few splitters
need to do the copy (and some already do), or combine has to do the copy
always, which would be more wasteful.

It has always been this way as far as I see?  Am I missing something?

[ I see i386 also does PUT_CODE in a few more splitters, hrm. ]
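The hazard under discussion can be reproduced outside GCC with any aliased mutable structure -- a minimal sketch (insn_like is a stand-in for an rtx; the copy step plays the role of copy_rtx):

```cpp
#include <cassert>
#include <memory>

struct insn_like { int mode; int value; };

// Two views alias one object, as combine's pointers alias RTL.
// Mutating in place through one view is visible through the other;
// copying before editing keeps the old view intact.
int demo_shared_mutation ()
{
  auto shared = std::make_shared<insn_like> (insn_like{0, 42});
  std::shared_ptr<insn_like> held_by_combine = shared;

  // In-place edit: the other holder observes the change (the bug).
  shared->mode = 1;
  int observed = held_by_combine->mode;          // now 1, not 0

  // Copy-before-edit: the other holder is unaffected (the fix).
  auto copy = std::make_shared<insn_like> (*shared);
  copy->mode = 2;
  return observed * 10 + held_by_combine->mode;  // 11
}
```

The trade-off in the thread is exactly who pays for the copy: the few splitters that mutate, or combine unsharing everything defensively.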


Segher


RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-06-24 Thread Kumar, Venkataramanan
Hi, 

If I understand correctly, the current implementation replaces

fdiv
fsqrt

by

frsqrte
for i=0 to 3
    fmul
    frsqrts
    fmul

So I think the gain depends on the latency of the frsqrts insn.
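Numerically, the loop above is the standard Newton-Raphson refinement for the reciprocal square root; frsqrts computes the (3 - a*y*y)/2 factor, so each iteration maps to the fmul/frsqrts/fmul triple (a sketch of the math only, not of the AArch64 intrinsics):

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for y ~ 1/sqrt(a): y' = y * (3 - a*y*y) / 2.
// Convergence is quadratic, so a few steps from a rough hardware
// estimate (frsqrte) reach double precision.
double rsqrt_newton (double a, double y0, int steps)
{
  double y = y0;
  for (int i = 0; i < steps; ++i)
    y = y * (3.0 - a * y * y) * 0.5;
  return y;
}
```

Because each step roughly squares the error, three steps for double and two for float from an ~8-bit estimate are enough, which matches the step counts quoted in the patch description below.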

I see the patch has patterns for vector versions of frsqrts, but does not
enable them?

Regards,
Venkat.

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Dr. Philipp Tomsich
> Sent: Wednesday, June 24, 2015 10:22 PM
> To: Evandro Menezes
> Cc: Benedikt Huber; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
> estimation in -ffast-math
> 
> Evandro,
> 
> We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar) reciprocal
> sqrt.
> 
> Also, the “reciprocal divide” patches are floating around in various of our 
> git-
> tree, but aren’t ready for public consumption, yet… I’ll leave Benedikt to
> comment on potential timelines for getting that pushed out.
> 
> Best,
> Philipp.
> 
> > On 24 Jun 2015, at 18:42, Evandro Menezes 
> wrote:
> >
> > Benedikt,
> >
> > You beat me to it! :-)  Do you have the implementation for dividing
> > using the Newton series as well?
> >
> > I'm not sure that the series is always profitable for all data types and on all
> > processors.  It would be useful to allow each AArch64 processor to
> > enable this or not depending on the data type.  BTW, do you have some
> > tests showing the speed up?
> >
> > Thank you,
> >
> > --
> > Evandro Menezes  Austin, TX
> >
> >> -Original Message-
> >> From: gcc-patches-ow...@gcc.gnu.org
> >> [mailto:gcc-patches-ow...@gcc.gnu.org]
> > On
> >> Behalf Of Benedikt Huber
> >> Sent: Thursday, June 18, 2015 7:04
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: benedikt.hu...@theobroma-systems.com;
> philipp.tomsich@theobroma-
> >> systems.com
> >> Subject: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
> >> estimation in -ffast-math
> >>
> >> AArch64 offers the instructions frsqrte and frsqrts, for rsqrt
> >> estimation and a Newton-Raphson step, respectively.
> >> There are ARMv8 implementations where this is faster than using fdiv
> >> and rsqrt.
> >> It runs three steps for double and two steps for float to achieve the
> >> needed precision.
> >>
> >> There is one caveat and open question.
> >> Since -ffast-math enables flush-to-zero, intermediate values between
> >> approximation steps will be flushed to zero if they are denormal.
> >> E.g. This happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
> >> The test cases pass, but it is unclear to me whether this is expected
> >> behavior with -ffast-math.
> >>
> >> The patch applies to commit:
> >> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
> >>
> >> Please consider including this patch.
> >> Thank you and best regards,
> >> Benedikt Huber
> >>
> >> Benedikt Huber (1):
> >>  2015-06-15  Benedikt Huber   systems.com>
> >>
> >> gcc/ChangeLog|   9 +++
> >> gcc/config/aarch64/aarch64-builtins.c|  60 
> >> gcc/config/aarch64/aarch64-protos.h  |   2 +
> >> gcc/config/aarch64/aarch64-simd.md   |  27 
> >> gcc/config/aarch64/aarch64.c |  63 +
> >> gcc/config/aarch64/aarch64.md|   3 +
> >> gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
> >> +++
> >> 7 files changed, 277 insertions(+)
> >> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
> >>
> >> --
> >> 1.9.1
> >