[PATCH] fix PR46029: reimplement if conversion of loads and stores [3nd submitted version of patch]

2015-07-09 Thread Abe

Below, please find the 3nd submitted version of this patch, now with some more 
issues resolved.

Regards,

Abe





From 87af575347e216672e322bbc1b4ae0a5ab93507f Mon Sep 17 00:00:00 2001
From: Abe 
Date: Mon, 18 May 2015 14:26:29 -0500
Subject: [PATCH] fix PR46029: reimplement if conversion of loads and stores

New relative to previous version of same patch:
  * addressed comments/suggestions from Alan and Richard.
  * disabled the test cases that are currently not if-converted correctly, 
pending improvements to the new if-converter

Passed regression testing and bootstrap on AMD64 GNU/Linux [AKA 
"x86_64-unknown-linux-gnu"].

2015-06-12  Sebastian Pop  
Abe Skolnik  

PR tree-optimization/46029
* tree-data-ref.c (struct data_ref_loc_d): Moved...
(get_references_in_stmt): Exported.
* tree-data-ref.h (struct data_ref_loc_d): ... here.
(get_references_in_stmt): Declared.

* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
* tree-if-conv.c (struct ifc_dr): Removed.
(IFC_DR): Removed.
(DR_WRITTEN_AT_LEAST_ONCE): Removed.
(DR_RW_UNCONDITIONALLY): Removed.
(memrefs_read_or_written_unconditionally): Removed.
(write_memrefs_written_at_least_once): Removed.
(ifcvt_could_trap_p): Does not take refs parameter anymore.
(ifcvt_memrefs_wont_trap): Removed.
(has_non_addressable_refs): New.
(if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs.
Removed use of refs.
(if_convertible_stmt_p): Removed use of refs.
(if_convertible_gimple_assign_stmt_p): Same.
(if_convertible_loop_p_1): Removed use of refs.  Remove initialization
of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
(insert_address_of): New.
(create_scratchpad): New.
(create_indirect_cond_expr): New.
(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
parameter for scratch_pad.
(combine_blocks): Same.
(tree_if_conversion): Same.

testsuite/
* g++.dg/tree-ssa/ifc-pr46029.C: New.
* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
* gcc.dg/tree-ssa/ifc-8.c: New.
* gcc.dg/tree-ssa/ifc-9.c: New.
* gcc.dg/tree-ssa/ifc-10.c: New.
* gcc.dg/tree-ssa/ifc-11.c: New.
* gcc.dg/tree-ssa/ifc-12.c: New.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New.
* gcc.dg/vect/pr61194.c: Disabled.
* gcc.dg/vect/vect-mask-load-1.c: Disabled.
* gcc.dg/vect/vect-mask-loadstore-1.c: Disabled.
* gcc.target/i386/avx2-gather-6.c: Disabled.
* gcc.target/i386/avx2-vect-aggressive-1.c: Disabled.
* gcc.target/i386/avx2-vect-aggressive.c: Disabled.
---
 gcc/ChangeLog  |  28 ++
 gcc/common.opt |   2 +-
 gcc/doc/invoke.texi|  29 +-
 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C|  76 
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c |  17 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c |  16 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c |  13 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c  |  19 +-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c  |  29 ++
 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c  |  17 +
 .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c  |  10 +-
 .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c  |  46 +++
 gcc/testsuite/gcc.dg/vect/pr61194.c|   4 +-
 gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c   |   4 +-
 gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c  |   4 +-
 gcc/testsuite/gcc.target/i386/avx2-gather-6.c  |   4 +-
 .../gcc.target/i386/avx2-vect-aggressive-1.c   |   4 +-
 .../gcc.target/i386/avx2-vect-aggressive.c |   4 +-
 gcc/tree-data-ref.c|  13 +-
 gcc/tree-data-ref.h|  14 +
 gcc/tree-if-conv.c | 417 +
 gcc/tree-vect-stmts.c  |   2 +-
 23 files changed, 501 insertions(+), 273 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3dec6b1..70af07c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,31 @@
+2015-05-18  Sebastian Pop  
+
+   PR tree-optimiza

Re: Merge DEF_GOACC_BUILTIN into DEF_GOMP_BUILTIN? (was: OpenACC middle end changes)

2015-07-09 Thread Jakub Jelinek
On Thu, Jul 09, 2015 at 05:52:20PM +0200, Thomas Schwinge wrote:
> --- gcc/builtins.def
> +++ gcc/builtins.def
> @@ -182,7 +182,9 @@ along with GCC; see the file COPYING3.  If not see
>  #define DEF_GOMP_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
>DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
> false, true, true, ATTRS, false, \
> -(flag_openmp || flag_tree_parallelize_loops \
> +(flag_openmp \
> + || flag_tree_parallelize_loops > 1 \
> + || flag_cilkplus \
>   || flag_offload_abi != OFFLOAD_ABI_UNSET))
>  
>  /* Builtin used by implementation of Cilk Plus.  Most of these are decomposed
> 
> Before this patch, all DEF_GOMP_BUILTINs (erroneously) had always been
> available, due to flag_tree_parallelize_loops's default value of 1.
> 
> With gcc/omp-low.c:lower_reduction_clauses using a
> BUILT_IN_GOMP_ATOMIC_START/BUILT_IN_GOMP_ATOMIC_END sequence as a last
> resort, and that being chosen for some kind of OpenACC reduction clauses
> (which is present on gomp-4_0-branch only), we're then running into ICEs,
> as those two DEF_GOMP_BUILTINs are not available with plain -fopenacc.
> 
> Now, it there actually a good reason to have separate DEF_GOACC_BUILTIN
> and DEF_GOMP_BUILTIN directives (which I basically just initially did to
> be "least intrusive",
> ),
> or should I just add flag_openacc to DEF_GOMP_BUILTIN, and change all
> DEF_GOACC_BUILTIN instantiations to DEF_GOMP_BUILTIN?  Merging them
> definitely makes sense to me now, so OK to do the obvious?

Having DEF_GOMP_BUILTIN and DEF_GOACC_BUILTIN is nice, it tells you
if it is OpenMP or OpenACC builtin.  I'd say just add || flag_openacc
to DEF_GOMP_BUILTIN if you need it.  E.g. for -ftree-parallelize-loops
or -fcilkplus I doubt you want the OpenACC builtins ;).

Jakub


[C++ Patch] PR 61491 (aka DR 1206)

2015-07-09 Thread Paolo Carlini

Hi,

the DR got resolved in time for C++11 and Jonathan noticed that we 
should remove the pedwarn, not a big deal. Tested x86_64-linux.


Thanks,
Paolo.

//
/cp
2015-07-09  Paolo Carlini  

PR c++/61491
* pt.c (maybe_process_partial_specialization): Allow enum template
specializations.

/testsuite
2015-07-09  Paolo Carlini  

PR c++/61491
* g++.dg/cpp0x/enum30.C: New.
Index: cp/pt.c
===
--- cp/pt.c (revision 225614)
+++ cp/pt.c (working copy)
@@ -970,11 +970,10 @@ maybe_process_partial_specialization (tree type)
 }
   else if (processing_specialization)
 {
-   /* Someday C++0x may allow for enum template specialization.  */
+  /* Under DR 1206 enum template specializations are allowed.  */
   if (cxx_dialect > cxx98 && TREE_CODE (type) == ENUMERAL_TYPE
  && CLASS_TYPE_P (context) && CLASSTYPE_USE_TEMPLATE (context))
-   pedwarn (input_location, OPT_Wpedantic, "template specialization "
-"of %qD not allowed by ISO C++", type);
+   ;
   else
{
  error ("explicit specialization of non-template %qT", type);
Index: testsuite/g++.dg/cpp0x/enum30.C
===
--- testsuite/g++.dg/cpp0x/enum30.C (revision 0)
+++ testsuite/g++.dg/cpp0x/enum30.C (working copy)
@@ -0,0 +1,20 @@
+// PR c++/61491
+// { dg-do compile { target c++11 } }
+
+template  struct Base 
+{ 
+  enum class E : unsigned; 
+  E e; 
+  Base(E e) : e(e) {} 
+}; 
+
+struct X; 
+
+template<> enum class Base::E : unsigned { a, b }; 
+
+struct X : Base 
+{ 
+  X() : Base(E::b) {} 
+};
+
+X x; 


Re: [PATCH 1/3] tree-ssa-tail-merge: add IPA ICF infrastructure.

2015-07-09 Thread Jeff Law

On 07/09/2015 07:56 AM, mliska wrote:

gcc/ChangeLog:

2015-07-09  Martin Liska  

* dbgcnt.def: Add new debug counter.
* ipa-icf-gimple.c (func_checker::compare_ssa_name): Add flag
for strict mode.
(func_checker::compare_memory_operand): Likewise.
(func_checker::compare_cst_or_decl): Handle if we are in
tail_merge_mode.
(func_checker::compare_operand): Pass strict flag properly.
(func_checker::stmt_local_def): New function.
(func_checker::compare_phi_node): Move from sem_function class.
(func_checker::compare_bb_tail_merge): New function.
(func_checker::compare_bb): Improve STMT iteration.
(func_checker::compare_gimple_call): Pass strict flag.
(func_checker::compare_gimple_assign): Likewise.
(func_checker::compare_gimple_label): Remove unused flag.
(ssa_names_set): New class.
(ssa_names_set::build): New function.
* ipa-icf-gimple.h (func_checker::gsi_next_nonlocal): New
function.
(ssa_names_set::contains): New function.
(ssa_names_set::add): Likewise.
* ipa-icf.c (sem_function::equals_private): Use transformed
function.
(sem_function::compare_phi_node): Move to func_checker class.
* ipa-icf.h: Add new declarations.
* tree-ssa-tail-merge.c (check_edges_correspondence): New
function.
(find_duplicate): Add usage of IPA ICF gimple infrastructure.
(find_clusters_1): Pass new sem_function argument.
(find_clusters): Likewise.
(tail_merge_optimize): Call IPA ICF comparison machinery.
So a general question.  We're passing in STRICT to several routines, 
which is fine.  But then we're also checking M_TAIL_MERGE_MODE.  What's 
the difference between the two?  Can they be unified?





-/* Verifies that trees T1 and T2 are equivalent from perspective of ICF.  */
+/* Verifies that trees T1 and T2 are equivalent from perspective of ICF.
+   If STRICT flag is true, versions must match strictly.  */

  bool
-func_checker::compare_ssa_name (tree t1, tree t2)
+func_checker::compare_ssa_name (tree t1, tree t2, bool strict)
This (and other) functions would seem to be used more than just ICF at 
this point.  A pass over the comments to update them as appropriate 
would be appreciated.



@@ -626,6 +648,136 @@ func_checker::parse_labels (sem_bb *bb)
  }
  }

+/* Return true if gimple STMT is just a local difinition in a
+   basic block.  Used SSA names are contained in SSA_NAMES_SET.  */

s/difinition/definition/

I didn't find this comment particularly useful in understanding what 
this function does.  AFAICT the function looks as the uses of the LHS of 
STMT and verifies they're all in the same block as STMT, right?


It also verifies that the none of the operands within STMT are part of 
SSA_NAMES_SET.


What role do those properties play in the meaning of "local definition"?





@@ -1037,4 +1205,60 @@ func_checker::compare_gimple_asm (const gasm *g1, const 
gasm *g2)
return true;
  }

+void
+ssa_names_set::build (basic_block bb)
Needs a function comment.  What are the "important" names we're 
collecting here?


Is a single forward and backward pass really sufficient to find all the 
important names?


In the backward pass, do you have to consider things like ASMs?  I guess 
it's difficult to understand what you need to look at because it's not 
entirely clear the set of SSA_NAMEs you're building.





@@ -149,12 +153,20 @@ public:
   mapping between basic blocks and labels.  */
void parse_labels (sem_bb *bb);

+  /* For given basic blocks BB1 and BB2 (from functions FUNC1 and FUNC),
+ true value is returned if phi nodes are semantically
+ equivalent in these blocks.  */
+  bool compare_phi_node (sem_bb *sem_bb1, sem_bb *sem_bb2);

Presumably in the case of tail merging, FUNC1 and FUNC will be the same :-)



/* Verifies that trees T1 and T2 are equivalent from perspective of ICF.  */
-  bool compare_ssa_name (tree t1, tree t2);
+  bool compare_ssa_name (tree t1, tree t2, bool strict = true);

/* Verification function for edges E1 and E2.  */
bool compare_edge (edge e1, edge e2);
@@ -204,7 +216,7 @@ public:
bool compare_tree_ssa_label (tree t1, tree t2);

/* Function compare for equality given memory operands T1 and T2.  */
-  bool compare_memory_operand (tree t1, tree t2);
+  bool compare_memory_operand (tree t1, tree t2, bool strict = true);

/* Function compare for equality given trees T1 and T2 which
   can be either a constant or a declaration type.  */
@@ -213,7 +225,7 @@ public:
/* Function responsible for comparison of various operands T1 and T2.
   If these components, from functions FUNC1 and FUNC2, are equal, true
   is returned.  */
-  bool compare_operand (tree t1, tree t2);
+  bool compare_operand (tree t1, tree t2, bool strict = false);
Is the default value for the parameter really needed in these methods? 
W

Re: [patch 0/9] Flattening and initial module rebuilding

2015-07-09 Thread Jeff Law

On 07/08/2015 08:42 PM, Andrew MacLeod wrote:

blah, not so trivial.  One of the primary things predict.h does is
create enum br_predictor by including predict,def.. so moving that enum
doesnt really make sense.

Fixing gimple,h isn't too bad, I could split the prediction stuff out
into gimple-predict.h and include it in the 4 places its needed (as you
might be able to tell, Ive tried this :-)

However, it doesn't do much good.  cfghooks.h is included by
basic-block.h.. which is needed virtually everywhere :-P
I don't guess we're approaching a world where the front-ends don't need 
basic-block (and thus cfghooks, predictions, etc etc).




The other option is to pull cfghooks.h out of basic-block.h and include
it seperately on its own.. What is the reason its in there now? It
appears to not have a cyclic dependency which is the usual reason to
have an include in the middle of a file.  Or perhaps the reason no
longer exists?  There is a comment at the top of cfghooks.h :
/* Only basic-block.h includes this.  */
but no rationale.
I don't recall.  It may have seemed to make sense at the time :-)  You'd 
have to do the archaeology and even if you did, you might not get an answer.




I moved it to the very bottom of the file and everything still seems to
compile fine   I can try flattening it out of basic-block.h and only
including it in places that need it... that should eliminate the need to
put predict.h in a lot of places I would think.
If it's not too much trouble, seems like it might be worth trying.  It 
just feels like the prediction bits shouldn't be that pervasive.


jeff


[PATCH] Remove duplicate graphite statistics printers

2015-07-09 Thread Bernhard Reutner-Fischer
graphite-scop-detection.c contained a copy of graphite.c
print_graphite_statistics() and print_graphite_scop_statistics().

The latter gained a parameter to distinguish
\nBefore limit_scops SCoP statistics (
from
\nSCoP statistics (

Note that previously the version in gimple.c was never called (AFAICT)
probably due to ICF or the like. Note that previously the 2 functions
where different due to using different strings so i'd have expected some
clever trick iff it really was ICF who substituted them. But that'd be
another bug or feature-request, maybe.

With the patch below a sample dump thus has the following diff:

@@ -541,35 +541,35 @@ fix_loop_structure: fixing up loops for

 Before limit_scops SCoP statistics (BBS:1, LOOPS:0, CONDITIONS:0, STMTS:0)

-Before limit_scops SCoP profiling statistics (BBS:0, LOOPS:0, CONDITIONS:0, 
STMTS:0)
+SCoP profiling statistics (BBS:0, LOOPS:0, CONDITIONS:0, STMTS:0)

 Before limit_scops SCoP statistics (BBS:1, LOOPS:0, CONDITIONS:0, STMTS:0)

-Before limit_scops SCoP profiling statistics (BBS:0, LOOPS:0, CONDITIONS:0, 
STMTS:0)
+SCoP profiling statistics (BBS:0, LOOPS:0, CONDITIONS:0, STMTS:0)

which looks plausible to me since the stats are dumped once before
limit_scops() and once afterwards and now this is reflected in the dumps.

Ok for trunk if bootstrap+regtest pass?

Thanks,

gcc/ChangeLog

2015-07-09  Bernhard Reutner-Fischer  

* graphite.h: New file.
(print_graphite_statistics): Extern declaration.
* graphite-scop-detection.c (print_graphite_scop_statistics,
print_graphite_statistics): Remove.
* graphite.c (print_graphite_statistics): Make public and add
PREFIX parameter. Adjust callers.
(print_graphite_scop_statistics): Add PREFIX parameter.

Signed-off-by: Bernhard Reutner-Fischer 
---
 gcc/graphite-scop-detection.c | 74 ++-
 gcc/graphite.c| 18 +++
 gcc/graphite.h| 27 
 3 files changed, 40 insertions(+), 79 deletions(-)
 create mode 100644 gcc/graphite.h

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index dac8eeb..2c1cc36 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #ifdef HAVE_isl
 #include "graphite-poly.h"
+#include "graphite.h"
 #include "graphite-scop-detection.h"
 
 /* Forward declarations.  */
@@ -1124,77 +1125,6 @@ contains_only_close_phi_nodes (basic_block bb)
   return true;
 }
 
-/* Print statistics for SCOP to FILE.  */
-
-static void
-print_graphite_scop_statistics (FILE* file, scop_p scop)
-{
-  long n_bbs = 0;
-  long n_loops = 0;
-  long n_stmts = 0;
-  long n_conditions = 0;
-  long n_p_bbs = 0;
-  long n_p_loops = 0;
-  long n_p_stmts = 0;
-  long n_p_conditions = 0;
-
-  basic_block bb;
-
-  FOR_ALL_BB_FN (bb, cfun)
-{
-  gimple_stmt_iterator psi;
-  loop_p loop = bb->loop_father;
-
-  if (!bb_in_sese_p (bb, SCOP_REGION (scop)))
-   continue;
-
-  n_bbs++;
-  n_p_bbs += bb->count;
-
-  if (EDGE_COUNT (bb->succs) > 1)
-   {
- n_conditions++;
- n_p_conditions += bb->count;
-   }
-
-  for (psi = gsi_start_bb (bb); !gsi_end_p (psi); gsi_next (&psi))
-   {
- n_stmts++;
- n_p_stmts += bb->count;
-   }
-
-  if (loop->header == bb && loop_in_sese_p (loop, SCOP_REGION (scop)))
-   {
- n_loops++;
- n_p_loops += bb->count;
-   }
-
-}
-
-  fprintf (file, "\nBefore limit_scops SCoP statistics (");
-  fprintf (file, "BBS:%ld, ", n_bbs);
-  fprintf (file, "LOOPS:%ld, ", n_loops);
-  fprintf (file, "CONDITIONS:%ld, ", n_conditions);
-  fprintf (file, "STMTS:%ld)\n", n_stmts);
-  fprintf (file, "\nBefore limit_scops SCoP profiling statistics (");
-  fprintf (file, "BBS:%ld, ", n_p_bbs);
-  fprintf (file, "LOOPS:%ld, ", n_p_loops);
-  fprintf (file, "CONDITIONS:%ld, ", n_p_conditions);
-  fprintf (file, "STMTS:%ld)\n", n_p_stmts);
-}
-
-/* Print statistics for SCOPS to FILE.  */
-
-static void
-print_graphite_statistics (FILE* file, vec scops)
-{
-  int i;
-  scop_p scop;
-
-  FOR_EACH_VEC_ELT (scops, i, scop)
-print_graphite_scop_statistics (file, scop);
-}
-
 /* We limit all SCoPs to SCoPs, that are completely surrounded by a loop.
 
Example:
@@ -1440,7 +1370,7 @@ build_scops (vec *scops)
   build_graphite_scops (regions, scops);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
-print_graphite_statistics (dump_file, *scops);
+print_graphite_statistics (dump_file, "Before limit_scops ", *scops);
 
   limit_scops (scops);
   regions.release ();
diff --git a/gcc/graphite.c b/gcc/graphite.c
index ba8029a..820491a 100644
--- a/gcc/graphite.c
+++ b/gcc/graphite.c
@@ -77,6 +77,7 @@ along with GCC; see the file COPYING3.  If not see
 #ifdef HAVE_isl
 
 #include "graphite-poly.h"
+#include "graphite.h"
 #include "graphite-scop-detection.h"
 #include

Re: [C++ Patch] PR 61491 (aka DR 1206)

2015-07-09 Thread Jason Merrill

On 07/09/2015 12:09 PM, Paolo Carlini wrote:

the DR got resolved in time for C++11 and Jonathan noticed that we
should remove the pedwarn, not a big deal. Tested x86_64-linux.


How about adding the testcase from the DR as well?  OK with that change.

Jason




Re: [PATCH 3/3] Fix ubsan tests by disabling of an optimization.

2015-07-09 Thread Jeff Law

On 07/09/2015 09:41 AM, Jakub Jelinek wrote:

On Thu, Jul 09, 2015 at 09:34:25AM -0600, Jeff Law wrote:

On 07/09/2015 08:13 AM, Jakub Jelinek wrote:

On Thu, Jul 09, 2015 at 03:56:35PM +0200, mliska wrote:

---
  gcc/testsuite/g++.dg/ubsan/vptr-1.C | 2 +-
  gcc/testsuite/g++.dg/ubsan/vptr-2.C | 2 +-
  gcc/testsuite/g++.dg/ubsan/vptr-3.C | 2 +-
  3 files changed, 3 insertions(+), 3 deletions(-)


I'd actually think it would be better to give up on the
UBSAN_* internal calls in tail merging.  Those internal pass
arguments based on their gimple_location, so tail merging breaks
them.

So I think the larger question here is should differences in gimple
locations prevent tail merging?  I'd tend to think not, which then begs the


Generally no.


question, are the UBSAN calls special enough to warrant an exception?


ASAN internal calls too I suppose.  And yes, I believe they are special
enough to warrant an exception.  The speciality is in them being lowered
later on into a call that is passed as one argument a structure containing
file:line into derived from that location, and for those calls that info is
very important (by using -fsanitize=address or -fsanitize=undefined
the user already says that he doesn't care that much about generated code
quality, the extra overhead is already there).  Another option would be for
-fsanitize=address or undefined etc. to disable various optimization options
(it does already disable some like non-null optimizations, because it is
essential, but I believe there is no reason not to perform tail merging
even with those options, as long as there are none of the problematic
internal calls involved, or if the locus is the same.  One could consider
gimple_location as yet another parameter to those internal calls, not
emitted directly just to avoid wasting compiler memory.
I figured you'd say something along these lines :-)  Essentially the 
argument is the line numbers are absolutely core to what the sanitizers 
provide by way their diagnostics.  Getting them wrong because we tail 
merged is likely to cause huge amounts of confusion.


Martin -- if you could have the existence of ASAN or UBSAN calls inhibit 
tail merging.  I guess you could potentially check the location 
information and still allow if the location information on those calls 
matches, but I doubt that's worth the effort.


Jeff



Re: [PING][PATCH, 1/2] Merge rewrite_virtuals_into_loop_closed_ssa from gomp4 branch

2015-07-09 Thread Jeff Law

On 07/09/2015 03:19 AM, Tom de Vries wrote:

On 09/07/15 05:33, Jeff Law wrote:

On 07/07/2015 09:58 AM, Tom de Vries wrote:
[Big snip]



0001-Add-rewrite_virtuals_into_loop_closed_ssa.patch


Add rewrite_virtuals_into_loop_closed_ssa

2015-07-07  Tom de Vries

* tree-cfg.c (get_virtual_phi): New function.
* tree-cfg.h (get_virtual_phi): Declare.
* tree-ssa-loop-manip.c (replace_uses_in_dominated_bbs)
(rewrite_virtuals_into_loop_closed_ssa): New function.
* tree-ssa-loop-manip.h (rewrite_virtuals_into_loop_closed_ssa):
Declare.
* tree-parloops.c (replace_uses_in_bbs_by): Remove.
(transform_to_exit_first_loop_alt): Use
rewrite_virtuals_into_loop_closed_ssa.



So how is the testcase in 56113 affected by this change (compile-time
and memory usage?)  I didn't see any discussion of that in the thread
from June.



Hi Jeff,

rewrite_virtuals_into_loop_closed_ssa is only active for loops that are
transformed by parloops (which is off by default).

The example in PR56113 at n == 1000 only contains one loop, with
!single_dom_exit, so if parloops is switched on, it doesn't transform
the loop.
Opps.  Nevermind, obviously not appropriate since you're not rewriting 
into LCSSA in the general case, just those transformed by parloops ;-)


jeff


Re: [PATCH] Remove duplicate graphite statistics printers

2015-07-09 Thread Jeff Law

On 07/09/2015 10:30 AM, Bernhard Reutner-Fischer wrote:


Thanks,

gcc/ChangeLog

2015-07-09  Bernhard Reutner-Fischer  

* graphite.h: New file.
(print_graphite_statistics): Extern declaration.
* graphite-scop-detection.c (print_graphite_scop_statistics,
print_graphite_statistics): Remove.
* graphite.c (print_graphite_statistics): Make public and add
PREFIX parameter. Adjust callers.
(print_graphite_scop_statistics): Add PREFIX parameter.

Signed-off-by: Bernhard Reutner-Fischer 
---
-/* Print statistics for SCOPS to FILE.  */
+/* Print statistics for SCOPS to FILE. Prefix statistic headers with PREFIX.  
*/

Nit.  Two spaces after a period, even in comments.


With that nit fixed, OK for the trunk after a bootstrap and regression test.

jeff


Re: [PATCH 1/6] hash_set: add iterator and remove method.

2015-07-09 Thread Jeff Law

On 07/09/2015 03:13 AM, mliska wrote:

gcc/ChangeLog:

2015-07-03  Martin Liska  

* hash-set.h (remove): New function.
(iterator): New iteration class for hash_set.

OK.
jeff



Re: [patch 0/9] Flattening and initial module rebuilding

2015-07-09 Thread Andrew MacLeod

On 07/08/2015 10:42 PM, Andrew MacLeod wrote:

On 07/08/2015 06:43 PM, Jeff Law wrote:

predict.h is actually required by gimple.h for a few reasons, enum
be_predictor is used in parameter lists and a few inlines use the 
TAKEN,

NOT_TAKEN macros
Its also needed by cfghooks.h, and betwen those 2 files, its just 
needed

by  a very good chunk of the backend. .. 219 of the 263 files which
include backend.h need it.
We could move the 2 enums and TAKEN/NOT_TAKEN to coretypes or something
like that and it would probably cut the requirements for it by a *lot*.

Might be something for a follow-up (moving the enums).


blah, not so trivial.  One of the primary things predict.h does is 
create enum br_predictor by including predict,def.. so moving that 
enum doesnt really make sense.


Fixing gimple,h isn't too bad, I could split the prediction stuff out 
into gimple-predict.h and include it in the 4 places its needed (as 
you might be able to tell, Ive tried this :-)


However, it doesn't do much good.  cfghooks.h is included by 
basic-block.h.. which is needed virtually everywhere :-P


There are just 2 entries in the hooks table which require  'enum 
br_predictor', but I dont think it makes sense to move things out of 
the cfghook structure.. I suppose once could create a prediction_hooks 
structure.. and put it in predict_hooks.h...  but I think thats 
probably going to far to avoid including a file which makes some 
logical sense..


The other option is to pull cfghooks.h out of basic-block.h and 
include it seperately on its own.. What is the reason its in there 
now? It appears to not have a cyclic dependency which is the usual 
reason to have an include in the middle of a file.  Or perhaps the 
reason no longer exists?  There is a comment at the top of cfghooks.h :

   /* Only basic-block.h includes this.  */
but no rationale.

I moved it to the very bottom of the file and everything still seems 
to  compile fine   I can try flattening it out of basic-block.h and 
only including it in places that need it... that should eliminate the 
need to put predict.h in a lot of places I would think.




ok,  so the results are in. a bit painful to unravel :-)  these are the 
steps
- Splitting the prediction bits of gimple.h out to gimple-predict.h and 
putting that file where it matters (5 files as it turns out)
- Next split cfghooks.h out from basic-block.h and put it in the files 
that it is needed in.
- then I moved  predict.h out of backend.h  I added it as an include to 
cfghooks.h and gimple-predict.h and adjusted source files accordingly..

- Finally, try to remove the extraneous cfghooks.h and predict.h files.

caveat, I did no reductions on config/* files nor on languages beyond 
stage1 builds... havent gotten to enhancing to tool to deal with that 
yet.. its coming as a part of the general include reduction.. so I'll 
havdle these bits then.  .

Then result using numbers which exclude those caveat files:
predict.h ends up in 68 files out of the original 179 it was in by 
itself. . ( ie excluding files with cfghooks.h or gimple-predict.h)
cfghooks.h ends up in 89 of the original 268 that basic-block was 
present in.


The total result affect 227 files.

Now, predict.h is *still* more pervasive than it needs to be, but thats 
a different patch :-).  There are  a set of routines in there like  
optimize_{fucntion,loop,edge,bb}_for{speed,size}_p()  that are the 
reason half the files need it.   those should probably be moved 
somewhere else since they aren't really prediction related :-P Maybe a 
better spot will show up when the rest of the include reductions are done.


I've attached a patch.  It bootstraps on x86_64-unknown-linux-gnu, and 
I'm running regressions.  To be safe, I'll run config-list.mk overnight 
to be sure.

assuming its all fine, OK for trunk then?

Andrew





predict.patch.Z
Description: Unix compressed data


[Patch, MIPS] Fix SYSROOT_SUFFIX_SPEC for mips-mti-linux-gnu

2015-07-09 Thread Steve Ellcey
This patch enables builds with mips[32|64]r3 and mips[32|64]r5 in the
mips-mti-linux-gnu toolchain.  t-mti-linux uses MULTILIB_MATCHES to
map these to r2 but SYSROOT_SUFFIX_SPEC was not being set properly
to find the sysroot (the r2 one) for these architectures.  This patch
fixes that problem by updating MIPS_SYSVERSION_SPEC which is used
by SYSROOT_SUFFIX_SPEC.  It will only affect the mips-mti-linux-gnu
toolchain.

Tested with mips-mti-linux-gnu, OK to checkin?

Steve Ellcey
sell...@imgtec.com



2015-07-09  Steve Ellcey  

* config/mips/mti-linux.h (MIPS_SYSVERSION_SPEC): Update
to handle mips[32|64]r3 and mips[32|64]r5.


diff --git a/gcc/config/mips/mti-linux.h b/gcc/config/mips/mti-linux.h
index 03d1baa..b497625 100644
--- a/gcc/config/mips/mti-linux.h
+++ b/gcc/config/mips/mti-linux.h
@@ -17,10 +17,14 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-/* This target is a multilib target, specify the sysroot paths.  */
-#define MIPS_SYSVERSION_SPEC   \
-"%{mips32:r1}%{mips64:r1}%{mips32r2:r2}%{mips64r2:r2}" \
-"%{mips32r6:r6}%{mips64r6:r6}%{mips16:-mips16}"
+/* This target is a multilib target, specify the sysroot paths.
+   MIPS_SYSVERSION_SPEC defaults to 'r2' (mips32r2 or mips64r2) unless
+   'r1' or 'r6' are specifically given so that mips32r3, mips32r5,
+   mips64r3, and mips64r5 will all default to 'r2'.  See MULTILIB_MATCHES
+   definition in t-mti-linux.  */
+
+#define MIPS_SYSVERSION_SPEC \
+"%{mips32|mips64:r1;mips32r6|mips64r6:r6;:r2}%{mips16:-mips16}"
 
 #undef SYSROOT_SUFFIX_SPEC
 #define SYSROOT_SUFFIX_SPEC\


Re: [PATCH 2/6] Introduce new edge_summary class and replace ipa_edge_args_sum.

2015-07-09 Thread Jeff Law

On 07/09/2015 03:13 AM, mliska wrote:

gcc/ChangeLog:

2015-07-03  Martin Liska  

* cgraph.c (symbol_table::create_edge): Introduce summary_uid
for cgraph_edge.
* cgraph.h (struct GTY): Likewise.
* ipa-inline-analysis.c (estimate_function_body_sizes): Use
new data structure.
* ipa-profile.c (ipa_profile): Likewise.
* ipa-prop.c (ipa_print_node_jump_functions):
(ipa_propagate_indirect_call_infos): Likewise.
(ipa_free_edge_args_substructures): Likewise.
(ipa_free_all_edge_args): Likewise.
(ipa_edge_args_t::remove): Likewise.
(ipa_edge_removal_hook): Likewise.
(ipa_edge_args_t::duplicate): Likewise.
(ipa_register_cgraph_hooks): Likewise.
(ipa_unregister_cgraph_hooks): Likewise.
* ipa-prop.h (ipa_check_create_edge_args): Likewise.
(ipa_edge_args_info_available_for_edge_p): Likewise.
* symbol-summary.h (gt_ggc_mx): Indent properly.
(gt_pch_nx): Likewise.
(edge_summary): New class.
---
  gcc/cgraph.c  |   2 +
  gcc/cgraph.h  |   5 +-
  gcc/ipa-inline-analysis.c |   2 +-
  gcc/ipa-profile.c |   2 +-
  gcc/ipa-prop.c|  71 +++-
  gcc/ipa-prop.h|  44 ++
  gcc/symbol-summary.h  | 208 +-
  7 files changed, 252 insertions(+), 82 deletions(-)



}

+
+  /* Destruction method that can be called for GGT purpose.  */

GGT stands for?


OK for the trunk.

Thanks,
Jeff


Re: [PATCH 3/6] IPA inline: port inline_edge_summary to a new infrastructure.

2015-07-09 Thread Jeff Law

On 07/09/2015 03:13 AM, mliska wrote:

gcc/ChangeLog:

2015-07-03  Martin Liska  

* ipa-inline-analysis.c
(inline_edge_summaries): New data structure.
(redirect_to_unreachable): Use renamed
function get_inline_edge_summary.
(edge_set_predicate): Likewise.
(evaluate_properties_for_edge): Likewise.
(reset_inline_edge_summary): Likewise.
(inline_summary_t::duplicate): New function.
(inline_edge_summary_t::duplicate): Likewise.
(dump_inline_edge_summary): Use renamed function
get_inline_edge_summary.
(estimate_function_body_sizes): Likewise.
(compute_inline_parameters): Likewise.
(estimate_edge_size_and_time): Likewise.
(estimate_calls_size_and_time): Likewise.
(inline_update_callee_summaries): Likewise.
(remap_edge_change_prob): Likewise.
(remap_edge_summaries): Likewise.
(inline_merge_summary): Likewise.
(do_estimate_edge_time): Likewise.
(estimate_time_after_inlining): Likewise.
(estimate_size_after_inlining): Likewise.
(read_inline_edge_summary): Likewise.
(write_inline_edge_summary): Likewise.
(inline_summary_alloc): Remove symtab hook holders.
(inline_free_summary): Likewise.
* ipa-inline.c (can_inline_edge_p): Use get_inline_edge_summary.
(compute_inlined_call_time): Likewise.
(want_inline_small_function_p): Likewise.
(edge_badness): Likewise.
(early_inliner): Likewise.
* ipa-inline.h (get_inline_edge_summary): Renamed from
inline_edge_summary.
(estimate_edge_growth): Use the function.
* ipa-profile.c (ipa_propagate_frequency_1): Likewise.
* ipa-prop.c (ipa_make_edge_direct_to_target): Likewise.
(ipa_free_all_edge_args): Remove unused arguments.
* ipa-split.c (execute_split_functions): change guard to the
newly added class.
* ipa.c (symbol_table::remove_unreachable_nodes): Likewise.
---
  gcc/ipa-inline-analysis.c | 105 ++
  gcc/ipa-inline.c  |  18 
  gcc/ipa-inline.h  |  28 +
  gcc/ipa-profile.c |   2 +-
  gcc/ipa-prop.c|   7 +---
  gcc/ipa-split.c   |   3 +-
  gcc/ipa.c |   2 +-
  7 files changed, 73 insertions(+), 92 deletions(-)
OK.  FWIW, it might have been easier to have the renaming as a separate 
patch given it was so mechanical.


Jeff



[PATCH] ipa-icf.c: Fix typo in dump file

2015-07-09 Thread Bernhard Reutner-Fischer
gcc/ChangeLog

2015-07-09  Bernhard Reutner-Fischer  

* ipa-icf.c (sem_item_optimizer::do_congruence_step): Fix typo
in dump message.

Ok for trunk if testing passes?
Hmz, that's obvious, will commit tomorrow after the regstrap during
night.
---
 gcc/ipa-icf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 691c90d..82dcbc3 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -3205,7 +3205,7 @@ sem_item_optimizer::do_congruence_step (congruence_class 
*cls)
   EXECUTE_IF_SET_IN_BITMAP (usage, 0, i, bi)
   {
 if (dump_file && (dump_flags & TDF_DETAILS))
-  fprintf (dump_file, "  processing congruece step for class: %u, index: 
%u\n",
+  fprintf (dump_file, "  processing congruence step for class: %u, index: 
%u\n",
   cls->id, i);
 
 do_congruence_step_for_index (cls, i);
-- 
2.1.4



Re: [PATCH 5/6] Port IPA reference to function_summary infrastructure.

2015-07-09 Thread Jeff Law

On 07/09/2015 03:13 AM, mliska wrote:

gcc/ChangeLog:

2015-07-03  Martin Liska  

* ipa-reference.c (ipa_ref_opt_summary_t): New class.
(get_reference_optimization_summary): Use it.
(set_reference_optimization_summary): Likewise.
(ipa_init): Remove hook holders usage.
(ipa_reference_c_finalize): Likewise.
(ipa_ref_opt_summary_t::duplicate): New function.
(ipa_ref_opt_summary_t::remove): Likewise.
(propagate): Allocate the summary if does not exist.
(ipa_reference_read_optimization_summary): Likewise.
(struct ipa_reference_vars_info_d): Add new method.
(struct ipa_reference_optimization_summary_d): Likewise.
(get_reference_vars_info): Use new underlying container.
(set_reference_vars_info): Remove.
(init_function_info): Set up the container.



@@ -89,6 +84,13 @@ struct ipa_reference_global_vars_info_d

  struct ipa_reference_optimization_summary_d
  {
+  /* Return true if the data structure is empty.  */
+  inline bool
+  empty_p ()
+  {
+return statics_not_read == NULL && statics_not_written == NULL;
+  }
+
Presumably this is still POD, even with the inline function, so "struct" 
is still correct, right?





@@ -99,6 +101,14 @@ typedef struct ipa_reference_optimization_summary_d 
*ipa_reference_optimization_

  struct ipa_reference_vars_info_d
  {
+  /* Return true if the data structure is empty.  */
+  inline bool
+  empty_p ()
+  {
+return local.statics_read == NULL && local.statics_written == NULL
+  && global.statics_read == NULL && global.statics_written == NULL;
+  }
+

Similarly.

So please confirm those are still POD types.  If they are, then the 
patch is OK as-is.  If they're not PODs, then change them to classes and 
that patch is pre-approved.


jeff



Re: [PATCH] ipa-icf.c: Fix typo in dump file

2015-07-09 Thread Jeff Law

On 07/09/2015 11:32 AM, Bernhard Reutner-Fischer wrote:

gcc/ChangeLog

2015-07-09  Bernhard Reutner-Fischer  

* ipa-icf.c (sem_item_optimizer::do_congruence_step): Fix typo
in dump message.

Ok for trunk if testing passes?
Hmz, that's obvious, will commit tomorrow after the regstrap during
night.

Yes, it's obvious :-)

Testing is still warranted because someone might have be scanning dumps 
in the testsuite and have the same typo there.  I've been bitten by that 
myself in the past.


Jeff



Re: [PATCH 4/6] Port ipa-cp to use cgraph_edge summary.

2015-07-09 Thread Jeff Law

On 07/09/2015 03:13 AM, mliska wrote:

gcc/ChangeLog:

2015-07-03  Martin Liska  

* ipa-cp.c (struct edge_clone_summary): New structure.
(class edge_clone_summary_t): Likewise.
(edge_clone_summary_t::initialize): New method.
(edge_clone_summary_t::duplicate): Likewise.
(get_next_cgraph_edge_clone): Remove.
(get_info_about_necessary_edges): Refactor using the new
data structure.
(gather_edges_for_value): Likewise.
(perhaps_add_new_callers): Likewise.
(ipcp_driver): Allocate and deallocate newly added
instance.
---

OK.
Jeff




Re: [PATCH 6/6] Migrate ipa-pure-const to function_summary.

2015-07-09 Thread Jeff Law

On 07/09/2015 03:13 AM, mliska wrote:

gcc/ChangeLog:

2015-07-03  Martin Liska  

* ipa-pure-const.c (struct funct_state_d): New.
(funct_state_d::default_p): Likewise.
(has_function_state): Remove.
(get_function_state): Likewise.
(set_function_state): Likewise.
(add_new_function): Rename and port to ::insert.
(duplicate_node_data): Rename and port to ::duplicate.
(funct_state_summary_t::duplicate): New function.
(register_hooks): Remove hook registration.
(pure_const_generate_summary): Use new data structure.
(pure_const_write_summary): Likewise.
(pure_const_read_summary): Likewise.
(propagate_pure_const): Likewise.
(propagate_nothrow): Likewise.
(execute): Remove hook usage.
(pass_ipa_pure_const::pass_ipa_pure_const): Likewise.
---
@@ -84,6 +85,18 @@ const char *pure_const_names[3] = {"const", "pure", 
"neither"};
 decl.  */
  struct funct_state_d
  {
+  funct_state_d (): pure_const_state (IPA_NEITHER),
+state_previously_known (IPA_NEITHER), looping_previously_known (true),
+looping (true), can_throw (true), can_free (true) {}
+
+  funct_state_d (const funct_state_d &s): pure_const_state 
(s.pure_const_state),
+state_previously_known (s.state_previously_known),
+looping_previously_known (s.looping_previously_known),
+looping (s.looping), can_throw (s.can_throw), can_free (s.can_free) {}
+
+  /* Return true if the value is default.  */
+  bool default_p ();
+
/* See above.  */
enum pure_const_state_e pure_const_state;
/* What user set here; we can be always sure about this.  */

Doesn't this need to be a "class" rather then a "struct"?


OK with that change.

jeff


C++ PATCHes to template parameter parsing

2015-07-09 Thread Jason Merrill
The first patch is a cleanup hoisted from the c++-concepts branch: on 
the branch we want to be able to parse type/template in multiple places, 
so Andrew factored that out, but it's a good code cleanup on the trunk 
as well.


While looking at this and related code on the branch, I noticed that 
there was a lot of redundant code in cp_parser_template_parameter for 
dealing with parameter packs that we should have handled already in 
cp_parser_parameter_declaration.  I've dealt with that by taking over an 
unused flag in cp_parameter_declarator for passing back that the 
parameter is a pack.  And while I was there I tweaked a permerror that 
wasn't using the permerror function.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit bfb4b766089dedf5a51371cfcb64d2caffe41bb2
Author: Jason Merrill 
Date:   Thu Jul 9 12:11:58 2015 -0400

	* parser.c (cp_parser_default_type_template_argument)
	(cp_parser_default_template_template_argument): Factor out from
	cp_parser_type_parameter.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d64b227..686654c 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -13330,6 +13330,69 @@ cp_parser_template_parameter_list (cp_parser* parser)
   return end_template_parm_list (parameter_list);
 }
 
+/* Parse a default argument for a type template-parameter.
+   Note that diagnostics are handled in cp_parser_template_parameter.  */
+
+static tree
+cp_parser_default_type_template_argument (cp_parser *parser)
+{
+  gcc_assert (cp_lexer_next_token_is (parser->lexer, CPP_EQ));
+
+  /* Consume the `=' token.  */
+  cp_lexer_consume_token (parser->lexer);
+
+  /* Parse the default-argument.  */
+  push_deferring_access_checks (dk_no_deferred);
+  tree default_argument = cp_parser_type_id (parser);
+  pop_deferring_access_checks ();
+
+  return default_argument;
+}
+
+/* Parse a default argument for a template template-parameter.  */
+
+static tree
+cp_parser_default_template_template_argument (cp_parser *parser)
+{
+  gcc_assert (cp_lexer_next_token_is (parser->lexer, CPP_EQ));
+
+  bool is_template;
+
+  /* Consume the `='.  */
+  cp_lexer_consume_token (parser->lexer);
+  /* Parse the id-expression.  */
+  push_deferring_access_checks (dk_no_deferred);
+  /* save token before parsing the id-expression, for error
+ reporting */
+  const cp_token* token = cp_lexer_peek_token (parser->lexer);
+  tree default_argument
+= cp_parser_id_expression (parser,
+   /*template_keyword_p=*/false,
+   /*check_dependency_p=*/true,
+   /*template_p=*/&is_template,
+   /*declarator_p=*/false,
+   /*optional_p=*/false);
+  if (TREE_CODE (default_argument) == TYPE_DECL)
+/* If the id-expression was a template-id that refers to
+   a template-class, we already have the declaration here,
+   so no further lookup is needed.  */
+;
+  else
+/* Look up the name.  */
+default_argument
+  = cp_parser_lookup_name (parser, default_argument,
+   none_type,
+   /*is_template=*/is_template,
+   /*is_namespace=*/false,
+   /*check_dependency=*/true,
+   /*ambiguous_decls=*/NULL,
+   token->location);
+  /* See if the default argument is valid.  */
+  default_argument = check_template_template_default_arg (default_argument);
+  pop_deferring_access_checks ();
+  return default_argument;
+}
+
 /* Parse a template-parameter.
 
template-parameter:
@@ -13552,11 +13615,8 @@ cp_parser_type_parameter (cp_parser* parser, bool *is_parameter_pack)
 	/* If the next token is an `=', we have a default argument.  */
 	if (cp_lexer_next_token_is (parser->lexer, CPP_EQ))
 	  {
-	/* Consume the `=' token.  */
-	cp_lexer_consume_token (parser->lexer);
-	/* Parse the default-argument.  */
-	push_deferring_access_checks (dk_no_deferred);
-	default_argument = cp_parser_type_id (parser);
+	default_argument
+	  = cp_parser_default_type_template_argument (parser);
 
 /* Template parameter packs cannot have default
arguments. */
@@ -13574,7 +13634,6 @@ cp_parser_type_parameter (cp_parser* parser, bool *is_parameter_pack)
   }
 	else if (check_for_bare_parameter_packs (default_argument))
 	  default_argument = error_mark_node;
-	pop_deferring_access_checks ();
 	  }
 	else
 	  default_argument = NULL_TREE;
@@ -13632,40 +13691,8 @@ cp_parser_type_parameter (cp_parser* parser, bool *is_parameter_pack)
 	   default-argument.  */
 	if (cp_lexer_next_token_is (parser->lexer, CPP_EQ))
 	  {
-	bool is_template;
-
-	/* Consume the `='.  */
-	cp_lexer_consume_token (parser->lexer);
-	/* Parse the id-expression.  */
-	push_deferring_access_checks (dk_no_deferred);
-	/* save token before parsin

Re: [PATCH] Remove duplicate graphite statistics printers

2015-07-09 Thread Bernhard Reutner-Fischer
On July 9, 2015 6:48:39 PM GMT+02:00, Jeff Law  wrote:
>On 07/09/2015 10:30 AM, Bernhard Reutner-Fischer wrote:
>>
>> Thanks,
>>
>> gcc/ChangeLog
>>
>> 2015-07-09  Bernhard Reutner-Fischer  
>>
>>  * graphite.h: New file.
>>  (print_graphite_statistics): Extern declaration.
>>  * graphite-scop-detection.c (print_graphite_scop_statistics,
>>  print_graphite_statistics): Remove.
>>  * graphite.c (print_graphite_statistics): Make public and add
>>  PREFIX parameter. Adjust callers.
>>  (print_graphite_scop_statistics): Add PREFIX parameter.
>>
>> Signed-off-by: Bernhard Reutner-Fischer 
>> ---
>> -/* Print statistics for SCOPS to FILE.  */
>> +/* Print statistics for SCOPS to FILE. Prefix statistic headers with
>PREFIX.  */
>Nit.  Two spaces after a period, even in comments.

Done.

I wasn't sure where to put the prototype, maybe a graphite.h for just that 
(ATM) is not warranted?
Maybe an extern decl in graphite-scop-detection.c would be sufficient?
>
>
>With that nit fixed, OK for the trunk after a bootstrap and regression
>test.

Thanks,
>
>jeff




Re: [patch 0/9] Flattening and initial module rebuilding

2015-07-09 Thread Jeff Law

On 07/09/2015 11:06 AM, Andrew MacLeod wrote:

On 07/08/2015 10:42 PM, Andrew MacLeod wrote:

On 07/08/2015 06:43 PM, Jeff Law wrote:

predict.h is actually required by gimple.h for a few reasons, enum
be_predictor is used in parameter lists and a few inlines use the
TAKEN,
NOT_TAKEN macros
Its also needed by cfghooks.h, and betwen those 2 files, its just
needed
by  a very good chunk of the backend. .. 219 of the 263 files which
include backend.h need it.
We could move the 2 enums and TAKEN/NOT_TAKEN to coretypes or something
like that and it would probably cut the requirements for it by a *lot*.

Might be something for a follow-up (moving the enums).



blah, not so trivial.  One of the primary things predict.h does is
create enum br_predictor by including predict,def.. so moving that
enum doesnt really make sense.

Fixing gimple,h isn't too bad, I could split the prediction stuff out
into gimple-predict.h and include it in the 4 places its needed (as
you might be able to tell, Ive tried this :-)

However, it doesn't do much good.  cfghooks.h is included by
basic-block.h.. which is needed virtually everywhere :-P

There are just 2 entries in the hooks table which require  'enum
br_predictor', but I dont think it makes sense to move things out of
the cfghook structure.. I suppose once could create a prediction_hooks
structure.. and put it in predict_hooks.h...  but I think thats
probably going to far to avoid including a file which makes some
logical sense..

The other option is to pull cfghooks.h out of basic-block.h and
include it seperately on its own.. What is the reason its in there
now? It appears to not have a cyclic dependency which is the usual
reason to have an include in the middle of a file.  Or perhaps the
reason no longer exists?  There is a comment at the top of cfghooks.h :
   /* Only basic-block.h includes this.  */
but no rationale.

I moved it to the very bottom of the file and everything still seems
to  compile fine   I can try flattening it out of basic-block.h and
only including it in places that need it... that should eliminate the
need to put predict.h in a lot of places I would think.



ok,  so the results are in. a bit painful to unravel :-)  these are the
steps
- Splitting the prediction bits of gimple.h out to gimple-predict.h and
putting that file where it matters (5 files as it turns out)
- Next split cfghooks.h out from basic-block.h and put it in the files
that it is needed in.
- then I moved  predict.h out of backend.h  I added it as an include to
cfghooks.h and gimple-predict.h and adjusted source files accordingly..
- Finally, try to remove the extraneous cfghooks.h and predict.h files.

caveat, I did no reductions on config/* files nor on languages beyond
stage1 builds... havent gotten to enhancing to tool to deal with that
yet.. its coming as a part of the general include reduction.. so I'll
havdle these bits then.  .
Then result using numbers which exclude those caveat files:
predict.h ends up in 68 files out of the original 179 it was in by
itself. . ( ie excluding files with cfghooks.h or gimple-predict.h)
cfghooks.h ends up in 89 of the original 268 that basic-block was
present in.

The total result affect 227 files.

Now, predict.h is *still* more pervasive than it needs to be, but thats
a different patch :-).  There are  a set of routines in there like
optimize_{fucntion,loop,edge,bb}_for{speed,size}_p()  that are the
reason half the files need it.   those should probably be moved
somewhere else since they aren't really prediction related :-P Maybe a
better spot will show up when the rest of the include reductions are done.

I've attached a patch.  It bootstraps on x86_64-unknown-linux-gnu, and
I'm running regressions.  To be safe, I'll run config-list.mk overnight
to be sure.
assuming its all fine, OK for trunk then?

OK assuming everything is fine with your overnight run.

BTW, you're showing your age -- a .Z file...  Thankfully gzip will 
handle that just fine.  (And yes, I realize the irony here that gzip is 
probably considered ancient as well given bzip2 and xz).



jeff


Re: [PATCH] Remove duplicate graphite statistics printers

2015-07-09 Thread Jeff Law

On 07/09/2015 11:48 AM, Bernhard Reutner-Fischer wrote:

On July 9, 2015 6:48:39 PM GMT+02:00, Jeff Law  wrote:

On 07/09/2015 10:30 AM, Bernhard Reutner-Fischer wrote:


Thanks,

gcc/ChangeLog

2015-07-09  Bernhard Reutner-Fischer  

* graphite.h: New file.
(print_graphite_statistics): Extern declaration.
* graphite-scop-detection.c (print_graphite_scop_statistics,
print_graphite_statistics): Remove.
* graphite.c (print_graphite_statistics): Make public and add
PREFIX parameter. Adjust callers.
(print_graphite_scop_statistics): Add PREFIX parameter.

Signed-off-by: Bernhard Reutner-Fischer 
---
-/* Print statistics for SCOPS to FILE.  */
+/* Print statistics for SCOPS to FILE. Prefix statistic headers with

PREFIX.  */
Nit.  Two spaces after a period, even in comments.


Done.

I wasn't sure where to put the prototype, maybe a graphite.h for just that 
(ATM) is not warranted?
Maybe an extern decl in graphite-scop-detection.c would be sufficient?
But it's more generic than that.  I also wouldn't be surprised if we 
found other things that belonged in graphite.h.


jeff


Small C++ PATCH to instantiation_dependent_r for TRAIT_EXPR

2015-07-09 Thread Jason Merrill
There's no reason we should have the logic for whether a TRAIT_EXPR is 
value-dependent in two places.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit fcd3802b2b88251b6c0dea63b38e6f269be7713d
Author: Jason Merrill 
Date:   Tue Jul 7 23:49:21 2015 -0400

	* pt.c (instantiation_dependent_r) [TRAIT_EXPR]: Call
	value_dependent_expression_p.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4911096..b5f1af8 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -20788,7 +20788,7 @@ cp_parser_member_declaration (cp_parser* parser)
 	  decl = cp_parser_alias_declaration (parser);
 	  /* Note that if we actually see the '=' token after the
 	 identifier, cp_parser_alias_declaration commits the
-	 tentative parse.  In that case, we really expects an
+	 tentative parse.  In that case, we really expect an
 	 alias-declaration.  Otherwise, we expect a using
 	 declaration.  */
 	  alias_decl_expected =
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index d3e9d31..63907ce 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -20131,7 +20131,7 @@ template_for_substitution (tree decl)
 }
 
 /* Returns true if we need to instantiate this template instance even if we
-   know we aren't going to emit it..  */
+   know we aren't going to emit it.  */
 
 bool
 always_instantiate_p (tree decl)
@@ -21684,9 +21684,7 @@ instantiation_dependent_r (tree *tp, int *walk_subtrees,
   }
 
 case TRAIT_EXPR:
-  if (dependent_type_p (TRAIT_EXPR_TYPE1 (*tp))
-	  || (TRAIT_EXPR_TYPE2 (*tp)
-	  && dependent_type_p (TRAIT_EXPR_TYPE2 (*tp
+  if (value_dependent_expression_p (*tp))
 	return *tp;
   *walk_subtrees = false;
   return NULL_TREE;


Re: [RFC, Fortran, (pr66775)] Allocatable function result

2015-07-09 Thread Steve Kargl
On Thu, Jul 09, 2015 at 12:25:18PM +0200, Andre Vehreschild wrote:
> 
> I need your help on how to interpret the standard(s) or how to
> implement handling an allocatable function's result, when that
> result is not allocated by the function.  Imagine the simple
> (albeit artificial) case:
> 
> integer function read_input()
>   read_input = ...
> end function
> 
> integer function getNext()
>   allocatable :: getNext
>   if (more_input_available ()) getNext = read_input()
> end function
> 
> where the function getNext () returns an (automatically)
> allocated result when more_input_available() returns .true..
> Otherwise getNext () returns an unallocated object, i.e.,
> the result's allocation status is .false.. I don't want to
> argue about this design's quality (considering it poor myself). I
> suppose that this code is legal, right?

Code is both valid and invalid.  As you point out, it
depends on the return value of more_input_available().
Also, note, it is always invalid under -std=f95 as it
is using automatic (re-)allocation of the LHS.

> Unfortunately gfortran can not handle it currently.

Whatever gfortran does is "correct", because the code is
invalid in the more_input_available() = .false. case.  It is
the responsible of the programmer to ensure that getNext() has
an allocated and assigned value before it returns.  IHMO,
I think that gfortran should not try to guess what the
programmer might have intended.

Yes, the compiled code may dereference a possibly invalid pointer.
The compiled program should segfault, and the programmer should
fix the Fortran code.

function getNext()
   allocatable :: getNext
   if (more_input_available ())
  getNext = read_input()
   else
  allocate(getNext, source=some_error_code?)
   end if
end function

or

function getNext()
   allocatable :: getNext
   allocate(getNext, source=some_error_code?)
   if (more_input_available ()) getNext = read_input()
end function

-- 
Steve


Re: [patch 0/9] Flattening and initial module rebuilding

2015-07-09 Thread Andrew MacLeod

On 07/09/2015 01:49 PM, Jeff Law wrote:

On 07/09/2015 11:06 AM, Andrew MacLeod wrote:


The total result affect 227 files.

Now, predict.h is *still* more pervasive than it needs to be, but thats
a different patch :-).  There are  a set of routines in there like
optimize_{fucntion,loop,edge,bb}_for{speed,size}_p()  that are the
reason half the files need it.   those should probably be moved
somewhere else since they aren't really prediction related :-P Maybe a
better spot will show up when the rest of the include reductions are 
done.


I've attached a patch.  It bootstraps on x86_64-unknown-linux-gnu, and
I'm running regressions.  To be safe, I'll run config-list.mk overnight
to be sure.
assuming its all fine, OK for trunk then?

OK assuming everything is fine with your overnight run.

BTW, you're showing your age -- a .Z file...  Thankfully gzip will 
handle that just fine.  (And yes, I realize the irony here that gzip 
is probably considered ancient as well given bzip2 and xz).




Ha. I just type compress and move on. :-)  but yeah, lookitthat. bzip2 
is about 40% the size :-)ok, maybe next time :-)   If i remember...

Andrew


[PATCH] Adjust variable shift costs for IA MCU

2015-07-09 Thread H.J. Lu
We reduce code size for IA MCU by adjusting variable shift costs for IA
MCU.  OK for trunk?

Thanks.


H.J.
---
PR target/66821
* config/i386/i386.c (iamcu_cost): Adjust variable shift costs.
---
 gcc/config/i386/i386.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 85e59a8..e4e505c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -420,7 +420,7 @@ static const
 struct processor_costs iamcu_cost = {
   COSTS_N_INSNS (1),   /* cost of an add instruction */
   COSTS_N_INSNS (1) + 1,   /* cost of a lea instruction */
-  COSTS_N_INSNS (4),   /* variable shift costs */
+  COSTS_N_INSNS (1),   /* variable shift costs */
   COSTS_N_INSNS (1),   /* constant shift costs */
   {COSTS_N_INSNS (11), /* cost of starting multiply for QI */
COSTS_N_INSNS (11), /*   HI */
-- 
2.4.3



Re: [gomp4.1] depend(sink) and depend(source) parsing for C

2015-07-09 Thread Jakub Jelinek
Hi!

On Thu, Jul 09, 2015 at 11:24:44AM -0700, Aldy Hernandez wrote:

Thanks for working on it.

> +   wide_int offset = wi::neg (addend, &overflow);
> +   addend = wide_int_to_tree (TREE_TYPE (addend), offset);
> +   if (overflow)
> + warning_at (c_parser_peek_token (parser)->location,
> + OPT_Woverflow,
> + "possible overflow in % offset");

possible overflow looks weird.  Shouldn't it complain the same
as it does if you do:
int c = - (-2147483648);
?

> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -12489,6 +12489,11 @@ c_finish_omp_clauses (tree clauses, bool 
> declare_simd)
> == OMP_CLAUSE_DEPEND_SOURCE);
> break;
>   }
> +   if (OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SINK)
> + {
> +   gcc_assert (TREE_CODE (t) == TREE_LIST);
> +   break;
> + }
> if (TREE_CODE (t) == TREE_LIST)
>   {
> if (handle_omp_array_sections (c))

Won't this ICE if somebody uses depend(sink:) ? or depend(sink:.::) or
similar garbage?  Make sure you don't create OMP_CLAUSE_DEPEND in that
case.

> diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
> index f0e2c67..ba79977 100644
> --- a/gcc/gimple-walk.c
> +++ b/gcc/gimple-walk.c
> @@ -327,6 +327,10 @@ walk_gimple_op (gimple stmt, walk_tree_fn callback_op,
>}
>break;
>  
> +case GIMPLE_OMP_ORDERED:
> +  /* Ignore clauses.  */
> +  break;
> +

I'm not convinced you don't want to walk the clauses.

> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index 6057ea0..e33fe1e 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -527,6 +527,17 @@ struct GTY((tag("GSS_OMP_CRITICAL")))
>tree name;
>  };
>  
> +/* GIMPLE_OMP_ORDERED */
> +
> +struct GTY((tag("GSS_OMP_ORDERED")))
> +  gomp_ordered : public gimple_statement_omp
> +{
> +  /* [ WORD 1-7 ] : base class */
> +
> +  /* [ WORD 8 ]  */
> +  tree clauses;
> +};

I would have expected to use 
struct GTY((tag("GSS_OMP_SINGLE_LAYOUT")))
  gomp_ordered : public gimple_statement_omp_single_layout
{
/* No extra fields; adds invariant:
 stmt->code == GIMPLE_OMP_ORDERED.  */
};
instead (like gomp_single, gomp_teams, ...).

> @@ -149,6 +149,9 @@ struct gimplify_omp_ctx
>struct gimplify_omp_ctx *outer_context;
>splay_tree variables;
>hash_set *privatized_types;
> +  /* Iteration variables in an OMP_FOR.  */
> +  tree *iter_vars;
> +  int niter_vars;

Wonder if it wouldn't be better to use a vec instead.
Then the size would be there as vec_length.

> @@ -8169,6 +8185,19 @@ gimplify_transaction (tree *expr_p, gimple_seq *pre_p)
>return GS_ALL_DONE;
>  }
>  
> +/* Verify the validity of the depend(sink:...) variable VAR.
> +   Return TRUE if everything is OK, otherwise return FALSE.  */
> +
> +static bool
> +verify_sink_var (location_t loc, tree var)
> +{
> +  for (int i = 0; i < gimplify_omp_ctxp->niter_vars; ++i)
> +if (var == gimplify_omp_ctxp->iter_vars[i])
> +  return true;
> +  error_at (loc, "variable %qE is not an iteration variable", var);
> +  return false;

I believe what we want to verify is that ith variable in the OMP_CLAUSE_DECL
vector is iter_vars[i], so not just some random permutation etc.

> @@ -3216,7 +3218,51 @@ check_omp_nesting_restrictions (gimple stmt, 
> omp_context *ctx)
>   break;
> }
>break;
> +case GIMPLE_OMP_TASK:
> +  for (c = gimple_omp_task_clauses (stmt); c; c = OMP_CLAUSE_CHAIN (c))
> + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND
> + && (OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SOURCE
> + || OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SINK))
> +   {
> + error_at (OMP_CLAUSE_LOCATION (c),
> +   "depend(%s) is only available in 'omp ordered'",

Please avoid using ' in diagnostics, it should be % instead.

> +   OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SOURCE
> +   ? "source" : "sink");
> + return false;
> +   }
> +  break;

This will eventually be needed also for GIMPLE_OMP_TARGET and
GIMPLE_OMP_ENTER/EXIT_DATA.  But as that isn't really supported right now,
can wait.

>  case GIMPLE_OMP_ORDERED:
> +  for (c = gimple_omp_ordered_clauses (as_a  (stmt));
> +c; c = OMP_CLAUSE_CHAIN (c))
> + {
> +   enum omp_clause_depend_kind kind = OMP_CLAUSE_DEPEND_KIND (c);
> +   if (kind == OMP_CLAUSE_DEPEND_SOURCE
> +   || kind == OMP_CLAUSE_DEPEND_SINK)
> + {
> +   bool have_ordered = false;
> +   /* Look for containing ordered(N) loop.  */
> +   for (omp_context *ctx_ = ctx; ctx_; ctx_ = ctx_->outer)

Please use octx or something similar, I don't like the trailing _ ;)

> +   if (!have_ordered)
> + {
> +   error_at (OMP_CLAUSE_LOCATION (c),
> + "depend clause is not within an ordered loop"

Re: [RFC, Fortran, (pr66775)] Allocatable function result

2015-07-09 Thread Andre Vehreschild
Hi Steve,

Thanks for your knowledge. Can you support your statement that an allocatable 
function has to return an allocated object by a part of the standard? I totally 
agree with you that this code is ill-designed, but IMO is it not the task of 
the compiler to address ill design. The compiler has to comply to the standard 
and the standard allows allocatable objects to be unallocated. So why has the 
result of a function be allocated always?

Regards,
Andre

Am 9. Juli 2015 19:50:47 MESZ, schrieb Steve Kargl 
:
>On Thu, Jul 09, 2015 at 12:25:18PM +0200, Andre Vehreschild wrote:
>> 
>> I need your help on how to interpret the standard(s) or how to
>> implement handling an allocatable function's result, when that
>> result is not allocated by the function.  Imagine the simple
>> (albeit artificial) case:
>> 
>> integer function read_input()
>>   read_input = ...
>> end function
>> 
>> integer function getNext()
>>   allocatable :: getNext
>>   if (more_input_available ()) getNext = read_input()
>> end function
>> 
>> where the function getNext () returns an (automatically)
>> allocated result when more_input_available() returns .true..
>> Otherwise getNext () returns an unallocated object, i.e.,
>> the result's allocation status is .false.. I don't want to
>> argue about this design's quality (considering it poor myself). I
>> suppose that this code is legal, right?
>
>Code is both valid and invalid.  As you point out, it
>depends on the return value of more_input_available().
>Also, note, it is always invalid under -std=f95 as it
>is using automatic (re-)allocation of the LHS.
>
>> Unfortunately gfortran can not handle it currently.
>
>Whatever gfortran does is "correct", because the code is
>invalid in the more_input_available() = .false. case.  It is
>the responsible of the programmer to ensure that getNext() has
>an allocated and assigned value before it returns.  IHMO,
>I think that gfortran should not try to guess what the
>programmer might have intended.
>
>Yes, the compiled code may dereference a possibly invalid pointer.
>The compiled program should segfault, and the programmer should
>fix the Fortran code.
>
>function getNext()
>   allocatable :: getNext
>   if (more_input_available ())
>  getNext = read_input()
>   else
>  allocate(getNext, source=some_error_code?)
>   end if
>end function
>
>or
>
>function getNext()
>   allocatable :: getNext
>   allocate(getNext, source=some_error_code?)
>   if (more_input_available ()) getNext = read_input()
>end function

-- 
Andre Vehreschild * Kreuzherrenstr. 8 * 52062 Aachen
Mail: ve...@gmx.de * Tel.: +49 241 9291018


[PATCH] PR target/66824: -miamcu doesn't load FP constant into register directly

2015-07-09 Thread H.J. Lu
ix86_split_long_move can optimize floating point constant move, which
can be used to optimize SFmode move for IA MCU.

OK for trunk if there is no regression?


H.J.
---
gcc/

PR target/66824
* config/i386/i386.c (ix86_split_to_parts): Allow SFmode move
for IA MCU.
(ix86_split_long_move): Support single move.
* config/i386/i386.md (FP splitter): Allow SFmode for IA MCU.

gcc/testsuite/

PR target/66824
* gcc.target/i386/pr66824.c: New test.
---
 gcc/config/i386/i386.c  | 14 +-
 gcc/config/i386/i386.md |  3 ++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e4e505c..aceeb13 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22747,7 +22747,9 @@ ix86_split_to_parts (rtx operand, rtx *parts, 
machine_mode mode)
 size = (GET_MODE_SIZE (mode) + 4) / 8;
 
   gcc_assert (!REG_P (operand) || !MMX_REGNO_P (REGNO (operand)));
-  gcc_assert (size >= 2 && size <= 4);
+  /* For IA MCU, we can optimize SFmode load.  */
+  gcc_assert ((size >= 2 || (TARGET_IAMCU && mode == SFmode))
+ && size <= 4);
 
   /* Optimize constant pool reference to immediates.  This is used by fp
  moves, that force all constants to memory to allow combining.  */
@@ -22825,10 +22827,14 @@ ix86_split_to_parts (rtx operand, rtx *parts, 
machine_mode mode)
case DFmode:
  REAL_VALUE_TO_TARGET_DOUBLE (r, l);
  break;
+   case SFmode:
+ REAL_VALUE_TO_TARGET_SINGLE (r, l[0]);
+ goto part0;
default:
  gcc_unreachable ();
}
  parts[1] = gen_int_mode (l[1], SImode);
+part0:
  parts[0] = gen_int_mode (l[0], SImode);
}
  else
@@ -22935,6 +22941,12 @@ ix86_split_long_move (rtx operands[])
   nparts = ix86_split_to_parts (operands[1], part[1], GET_MODE (operands[0]));
   ix86_split_to_parts (operands[0], part[0], GET_MODE (operands[0]));
 
+  if (nparts == 1)
+{
+  emit_move_insn (part[0][0], part[1][0]);
+  return;
+}
+
   /* When emitting push, take care for source operands on the stack.  */
   if (push && MEM_P (operands[1])
   && reg_overlap_mentioned_p (stack_pointer_rtx, operands[1]))
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 1d43aaf..e723eab 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3507,7 +3507,8 @@
   "reload_completed
&& (GET_MODE (operands[0]) == TFmode
|| GET_MODE (operands[0]) == XFmode
-   || GET_MODE (operands[0]) == DFmode)
+   || GET_MODE (operands[0]) == DFmode
+   || (TARGET_IAMCU && GET_MODE (operands[0]) == SFmode))
&& !(ANY_FP_REG_P (operands[0]) || ANY_FP_REG_P (operands[1]))"
   [(const_int 0)]
   "ix86_split_long_move (operands); DONE;")
-- 
2.4.3



[gomp4.1] depend(sink) and depend(source) parsing for C

2015-07-09 Thread Aldy Hernandez
The following patch goes along with Jakub's parsing of ordered(n) loops. 
 With it, we can now parse his testcase, along with a variety of other 
tests with appropriate diagnostics.


The lowering to gimple is still not done, as we should agree on what 
needs to be emitted first.


I'll follow up with the C++ version once we agree on C.

How does this look?
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index cd3bd5a..98a6b02 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11701,6 +11701,92 @@ c_parser_omp_clause_simdlen (c_parser *parser, tree 
list)
   return c;
 }
 
+/* OpenMP 4.1:
+   vec:
+ identifier [+/- integer]
+ vec , identifier [+/- integer]
+*/
+
+static tree
+c_parser_omp_clause_depend_sink (c_parser *parser, location_t clause_loc,
+tree list)
+{
+  tree vec = NULL;
+  if (c_parser_next_token_is_not (parser, CPP_NAME)
+  || c_parser_peek_token (parser)->id_kind != C_ID_ID)
+c_parser_error (parser, "expected identifier");
+
+  while (c_parser_next_token_is (parser, CPP_NAME)
+&& c_parser_peek_token (parser)->id_kind == C_ID_ID)
+{
+  tree t = lookup_name (c_parser_peek_token (parser)->value);
+  tree addend = NULL;
+
+  if (t == NULL_TREE)
+   {
+ undeclared_variable (c_parser_peek_token (parser)->location,
+  c_parser_peek_token (parser)->value);
+ t = error_mark_node;
+   }
+
+  c_parser_consume_token (parser);
+
+  if (t != error_mark_node)
+   {
+ bool neg;
+
+ if (c_parser_next_token_is (parser, CPP_MINUS))
+   neg = true;
+ else if (c_parser_next_token_is (parser, CPP_PLUS))
+   neg = false;
+ else
+   {
+ addend = integer_zero_node;
+ goto add_to_vector;
+   }
+ c_parser_consume_token (parser);
+
+ if (c_parser_next_token_is_not (parser, CPP_NUMBER))
+   {
+ c_parser_error (parser, "expected %");
+ return list;
+   }
+
+ addend = c_parser_peek_token (parser)->value;
+ if (TREE_CODE (addend) != INTEGER_CST)
+   {
+ c_parser_error (parser, "expected %");
+ return list;
+   }
+ if (neg)
+   {
+ bool overflow;
+ wide_int offset = wi::neg (addend, &overflow);
+ addend = wide_int_to_tree (TREE_TYPE (addend), offset);
+ if (overflow)
+   warning_at (c_parser_peek_token (parser)->location,
+   OPT_Woverflow,
+   "possible overflow in % offset");
+   }
+ c_parser_consume_token (parser);
+
+   add_to_vector:
+ vec = tree_cons (t, addend, vec);
+
+ if (c_parser_next_token_is_not (parser, CPP_COMMA))
+   break;
+
+ c_parser_consume_token (parser);
+   }
+}
+
+  tree u = build_omp_clause (clause_loc, OMP_CLAUSE_DEPEND);
+  OMP_CLAUSE_DEPEND_KIND (u) = OMP_CLAUSE_DEPEND_SINK;
+  OMP_CLAUSE_DECL (u) = nreverse (vec);
+  OMP_CLAUSE_CHAIN (u) = list;
+  return u;
+}
+
 /* OpenMP 4.0:
depend ( depend-kind: variable-list )
 
@@ -11708,10 +11794,9 @@ c_parser_omp_clause_simdlen (c_parser *parser, tree 
list)
  in | out | inout
 
OpenMP 4.1:
-   depend ( depend-loop-kind [ : vec ] )
+   depend ( source )
 
-   depend-loop-kind:
- source | sink  */
+   depend ( sink  : vec )  */
 
 static tree
 c_parser_omp_clause_depend (c_parser *parser, tree list)
@@ -11754,16 +11839,19 @@ c_parser_omp_clause_depend (c_parser *parser, tree 
list)
   return c;
 }
 
-  /* FIXME: Handle OMP_CLAUSE_DEPEND_SINK.  */
-
   if (!c_parser_require (parser, CPP_COLON, "expected %<:%>"))
 goto resync_fail;
 
-  nl = c_parser_omp_variable_list (parser, clause_loc,
-  OMP_CLAUSE_DEPEND, list);
+  if (kind == OMP_CLAUSE_DEPEND_SINK)
+nl = c_parser_omp_clause_depend_sink (parser, clause_loc, list);
+  else
+{
+  nl = c_parser_omp_variable_list (parser, clause_loc,
+  OMP_CLAUSE_DEPEND, list);
 
-  for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
-OMP_CLAUSE_DEPEND_KIND (c) = kind;
+  for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
+   OMP_CLAUSE_DEPEND_KIND (c) = kind;
+}
 
   c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>");
   return nl;
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 469cd88..0b332e8 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -12489,6 +12489,11 @@ c_finish_omp_clauses (tree clauses, bool declare_simd)
  == OMP_CLAUSE_DEPEND_SOURCE);
  break;
}
+ if (OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SINK)
+   {
+ gcc_assert (TREE_CODE (t) == TREE_LIST);
+ break;
+   }
  if (TREE_CODE (t) == TREE_LIST)
{
  if (h

Re: [C++ Patch] PR 61491 (aka DR 1206)

2015-07-09 Thread Paolo Carlini

Hi,

On 07/09/2015 06:33 PM, Jason Merrill wrote:

On 07/09/2015 12:09 PM, Paolo Carlini wrote:

the DR got resolved in time for C++11 and Jonathan noticed that we
should remove the pedwarn, not a big deal. Tested x86_64-linux.


How about adding the testcase from the DR as well?  OK with that change.
Thanks. That revealed three interesting details. First, I suppose we 
want to use the testcase in the amended form appearing in C++11, thus 
with char as underlying type for the last two lines, instead of int 
(otherwise we reject both for the wrong reason, likewise all the other 
compilers I have at hand). That said, the following two lines of the 
testcase trigger another pedwarn, in the parser, about enumeration 
templates:


template enum A::E : T { eT };
template enum class A::S : T { sT };

Finally, we do *not* reject, as we should, the line:

template<> enum A::E : char { echar };

Then, overall, is it Ok to simply suppress the pedwarn in C++11, and 
xfail for now the error? Should I open a new, separate bug report about 
the latter? (note that the issue, failing to reject an explicit 
specialization after instantiation, doesn't sound new to me and seems 
more general than enum-related issues...)


Thanks!
Paolo.

/





Index: cp/parser.c
===
--- cp/parser.c (revision 225614)
+++ cp/parser.c (working copy)
@@ -15827,8 +15827,9 @@ cp_parser_enum_specifier (cp_parser* parser)
  type = TREE_TYPE (name);
  if (TREE_CODE (type) == TYPENAME_TYPE)
{
- /* Are template enums allowed in ISO? */
- if (template_parm_scope_p ())
+ /* Template enums are allowed in C++11.  */
+ if (template_parm_scope_p ()
+ && cxx_dialect < cxx11)
pedwarn (type_start_token->location, OPT_Wpedantic,
 "%qD is an enumeration template", name);
  /* ignore a typename reference, for it will be solved by name
Index: cp/pt.c
===
--- cp/pt.c (revision 225614)
+++ cp/pt.c (working copy)
@@ -970,11 +970,10 @@ maybe_process_partial_specialization (tree type)
 }
   else if (processing_specialization)
 {
-   /* Someday C++0x may allow for enum template specialization.  */
+  /* Under DR 1206 enum template specializations are allowed.  */
   if (cxx_dialect > cxx98 && TREE_CODE (type) == ENUMERAL_TYPE
  && CLASS_TYPE_P (context) && CLASSTYPE_USE_TEMPLATE (context))
-   pedwarn (input_location, OPT_Wpedantic, "template specialization "
-"of %qD not allowed by ISO C++", type);
+   ;
   else
{
  error ("explicit specialization of non-template %qT", type);
Index: testsuite/g++.dg/cpp0x/enum30.C
===
--- testsuite/g++.dg/cpp0x/enum30.C (revision 0)
+++ testsuite/g++.dg/cpp0x/enum30.C (working copy)
@@ -0,0 +1,20 @@
+// PR c++/61491, DR 1206
+// { dg-do compile { target c++11 } }
+
+template  struct Base 
+{ 
+  enum class E : unsigned; 
+  E e; 
+  Base(E e) : e(e) {} 
+}; 
+
+struct X; 
+
+template<> enum class Base::E : unsigned { a, b }; 
+
+struct X : Base 
+{ 
+  X() : Base(E::b) {} 
+};
+
+X x; 
Index: testsuite/g++.dg/cpp0x/enum31.C
===
--- testsuite/g++.dg/cpp0x/enum31.C (revision 0)
+++ testsuite/g++.dg/cpp0x/enum31.C (working copy)
@@ -0,0 +1,14 @@
+// DR 1206, PR c++/61491
+// { dg-do compile { target c++11 } }
+
+template struct A {
+  enum E : T;
+  enum class S : T;
+};
+
+template<> enum A::E : int { eint };
+template<> enum class A::S : int { sint };
+template enum A::E : T { eT };
+template enum class A::S : T { sT };
+template<> enum A::E : char { echar };  // { dg-error "after 
instantiation" "" { xfail *-*-* } }
+template<> enum class A::S : char { schar };


RE: [Patch, MIPS] Fix SYSROOT_SUFFIX_SPEC for mips-mti-linux-gnu

2015-07-09 Thread Matthew Fortune
> 2015-07-09  Steve Ellcey  
> 
>   * config/mips/mti-linux.h (MIPS_SYSVERSION_SPEC): Update
>   to handle mips[32|64]r3 and mips[32|64]r5.

OK, thanks.

Matthew


Re: [PATCH, i386]: Fix PR 66814, ICE: gcc.target/i386/avx512f-klogic-2.c

2015-07-09 Thread Jakub Jelinek
On Thu, Jul 09, 2015 at 05:22:33PM +0200, Uros Bizjak wrote:
> Hello!
> 
> This ICE was caused by a peephole2 pattern that allowed non-general
> regs arguments.
> 
> 2015-07-08  Uros Bizjak  
> 
> PR target/66814
> * config/i386/predicates.md (nonimmediate_gr_operand): New predicate.
> * config/i386/i386.md (not peephole2): Use nonimmediate_gr_operand.
> (varous peephole2s): Use {GENERAL,SSE,MMX}_REGNO_P instead of
> {GENERAL_SSE_MMX}_REG_P where appropriate.

This patch breaks bootstrap with rtl checking on x86_64-linux.

../../gcc/tree-vect-loop.c: In function ‘void 
calc_vec_perm_mask_for_shift(machine_mode, unsigned int, unsigned char*)’:
../../gcc/tree-vect-loop.c:3190:1: internal compiler error: RTL check: expected 
code 'reg', have 'subreg' in rhs_regno, at rtl.h:1782
 }
 ^
0x11a44ef rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, 
char const*)
../../gcc/rtl.c:786
0x19e917c rhs_regno
../../gcc/rtl.h:1782
0x1d20eea peephole2_10
../../gcc/config/i386/i386.md:17699
0x1d2ccc8 peephole2_insns(rtx_def*, rtx_insn*, int*)
../../gcc/config/i386/i386.md:5050
0x11190e0 peephole2_optimize
../../gcc/recog.c:3627
0x111a81f rest_of_handle_peephole2
../../gcc/recog.c:3807
0x111a8d4 execute
../../gcc/recog.c:3841

register_operand allows a SUBREG even after a reload, as long as
it is a SUBREG of a REG_P.

insn in question is:
(insn 75 71 76 2 (set (subreg:SI (reg:DI 2 cx) 0)
(const_int -1 [0x])) ../../gcc/tree-vect-loop.c:3189 -1
 (nil))

and has been created by the:
;; After splitting up read-modify operations, array accesses with memory
;; operands might end up in form:
;;  sall$2, %eax
;;  movl4(%esp), %edx
;;  addl%edx, %eax
;; instead of pre-splitting:
;;  sall$2, %eax
;;  addl4(%esp), %eax
;; Turn it into:
;;  movl4(%esp), %edx
;;  leal(%edx,%eax,4), %eax
peephole2.
Wonder if
  if (mode != word_mode)
operands[1] = gen_rtx_SUBREG (mode, operands[1], 0);
  if (op1mode != word_mode)
operands[5] = gen_rtx_SUBREG (op1mode, operands[5], 0);  
should not have been gen_lowpart instead of gen_rtx_SUBREG.
In any case, it wouldn't surprise me if there aren't many other ways
how to get a SUBREG in there.

Jakub


Re: [RFC, Fortran, (pr66775)] Allocatable function result

2015-07-09 Thread Steve Kargl
On Thu, Jul 09, 2015 at 08:59:08PM +0200, Andre Vehreschild wrote:
> Hi Steve,
> 
> Thanks for your knowledge. Can you support your statement that an allocatable 
> function has to return an allocated object by a part of the standard? I 
> totally agree with you that this code is ill-designed, but IMO is it not the 
> task of the compiler to address ill design. The compiler has to comply to the 
> standard and the standard allows allocatable objects to be unallocated. So 
> why has the result of a function be allocated always?
> 
> Regards,
> Andre
> 

I think the following excerpts from F2008 are the relevant
clauses, especially the 2nd to last sentence in the excerpt
from 12.6.2.2.

!  12.5.3
!
!  When execution of the function is complete, the value of
!  the function result is available for use in the expression
!  that caused the function to be invoked.
!
!  12.6.2.2
!
!  If RESULT appears, the name of the result variable of the
!  function is result-name and all occurrences of the function
!  name in execution-part statements in its scope refer to the
!  function itself.  If RESULT does not appear, the name of the
!  result variable is function-name and all occurrences of the
!  function name in execution-part statements in its scope are
!  references to the result variable.  The characteristics (12.3.3)
!  of the function result are those of the result variable.  On
!  completion of execution of the function, the value returned is
!  that of its result variable.  If the function result is a pointer,
!  the shape of the value returned by the function is determined by
!  the shape of the result variable when the execution of the function
!  is completed.  If the result variable is not a pointer, its value
!  shall be defined by the function.  If the function result is a
!  pointer, on return the pointer association status of the result
!  variable shall not be undefined.

-- 
Steve


Re: [PATCH, PR target/65103, 2/3] Propagate address constants into loops for i386

2015-07-09 Thread Jeff Law

On 06/01/2015 06:26 AM, Ilya Enkovich wrote:

2015-05-29 1:15 GMT+03:00 Jeff Law :


Right, but you're blindly propagating.  The right thing to do is look at
some kind of metric to estimate when it's profitable to propagate the
constant back in vs leave it hoisted out.


No, the patch is not to blindly propagate but to let loop invariant to
be propagated into address. Existing propagation gain estimation
(should_replace_address) still applies.
Agreed.  I missed the costing metric in should_replace_address.  Sorry 
for not looking more deeply at that and delaying progress on this issue.


So looking more closely at the patch itself...

AFAICT, your change to that condition potentially allows fwprop to 
propagate any constants to any use, regardless of context (mem vs non-mem).


You're largely getting away with that because you're checking for the 
rtx code CONST -- which filters out everything except symbolic 
constants.  ie, you're filtering out CONST_INT, CONST_DOUBLE, CONST_*.


So the first thing you need to do is clarify that comment a bit.   Perhaps:

/* Do not propagate loop invariant definitions into a different loop,
   except for symbolic constants which may propagate, subject to cost
   testing.  */


The second issue is I think the testcase has been compromised.  If I 
compile the test on the trunk, I don't see any references to GOTOFF.  So 
I think you need to create a new testcase.  One way would be to 
instrument a build, detecting anytime you were able to propagate a 
symbolic constant into a different loop nest -- if you can discover one, 
then use creduce or multidelta to produce a simplified testcase.


Because of the CONST vs CONST_INT stuff, you're not running the risk of 
regressing 65768, so no need to build an arm cross a test that :-)


With the comment update and a new testcase this is OK for the trunk.

Again, sorry for delaying progress on this.

Jeff


Re: [PATCH, i386]: Fix PR 66814, ICE: gcc.target/i386/avx512f-klogic-2.c

2015-07-09 Thread Uros Bizjak
On Thu, Jul 9, 2015 at 9:38 PM, Jakub Jelinek  wrote:
> On Thu, Jul 09, 2015 at 05:22:33PM +0200, Uros Bizjak wrote:
>> Hello!
>>
>> This ICE was caused by a peephole2 pattern that allowed non-general
>> regs arguments.
>>
>> 2015-07-08  Uros Bizjak  
>>
>> PR target/66814
>> * config/i386/predicates.md (nonimmediate_gr_operand): New predicate.
>> * config/i386/i386.md (not peephole2): Use nonimmediate_gr_operand.
>> (varous peephole2s): Use {GENERAL,SSE,MMX}_REGNO_P instead of
>> {GENERAL_SSE_MMX}_REG_P where appropriate.
>
> This patch breaks bootstrap with rtl checking on x86_64-linux.
>
> ../../gcc/tree-vect-loop.c: In function ‘void 
> calc_vec_perm_mask_for_shift(machine_mode, unsigned int, unsigned char*)’:
> ../../gcc/tree-vect-loop.c:3190:1: internal compiler error: RTL check: 
> expected code 'reg', have 'subreg' in rhs_regno, at rtl.h:1782
>  }
>  ^
> 0x11a44ef rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, 
> char const*)
> ../../gcc/rtl.c:786
> 0x19e917c rhs_regno
> ../../gcc/rtl.h:1782
> 0x1d20eea peephole2_10
> ../../gcc/config/i386/i386.md:17699
> 0x1d2ccc8 peephole2_insns(rtx_def*, rtx_insn*, int*)
> ../../gcc/config/i386/i386.md:5050
> 0x11190e0 peephole2_optimize
> ../../gcc/recog.c:3627
> 0x111a81f rest_of_handle_peephole2
> ../../gcc/recog.c:3807
> 0x111a8d4 execute
> ../../gcc/recog.c:3841
>
> register_operand allows a SUBREG even after a reload, as long as
> it is a SUBREG of a REG_P.
>
> insn in question is:
> (insn 75 71 76 2 (set (subreg:SI (reg:DI 2 cx) 0)
> (const_int -1 [0x])) ../../gcc/tree-vect-loop.c:3189 
> -1
>  (nil))
>
> and has been created by the:
> ;; After splitting up read-modify operations, array accesses with memory
> ;; operands might end up in form:
> ;;  sall$2, %eax
> ;;  movl4(%esp), %edx
> ;;  addl%edx, %eax
> ;; instead of pre-splitting:
> ;;  sall$2, %eax
> ;;  addl4(%esp), %eax
> ;; Turn it into:
> ;;  movl4(%esp), %edx
> ;;  leal(%edx,%eax,4), %eax
> peephole2.
> Wonder if
>   if (mode != word_mode)
> operands[1] = gen_rtx_SUBREG (mode, operands[1], 0);
>   if (op1mode != word_mode)
> operands[5] = gen_rtx_SUBREG (op1mode, operands[5], 0);
> should not have been gen_lowpart instead of gen_rtx_SUBREG.
> In any case, it wouldn't surprise me if there aren't many other ways
> how to get a SUBREG in there.

I was under impression that peephole2 pass doesn't see subregs of hard
regs (all x86 predicates are written in this way). Even documentation
somehow agrees with this:

'(subreg:M1 REG:M2 BYTENUM)'

'subreg' expressions are used to refer to a register in a machine
mode other than its natural one, or to refer to one register of a
multi-part 'reg' that actually refers to several registers.

Each pseudo register has a natural mode.  If it is necessary to
operate on it in a different mode, the register must be enclosed in
a 'subreg'.

There are currently three supported types for the first operand of
a 'subreg':

[...]

   * hard registers It is seldom necessary to wrap hard registers
 in 'subreg's; such registers would normally reduce to a single
 'reg' rtx.  This use of 'subreg's is discouraged and may not
 be supported in the future.

So, I'd say that generating naked SUBREG after reload should be
avoided and gen_lowpart should be used in the code above.

Uros.


> Jakub


Re: Tests for libgomp based on OpenMP Examples 4.0.2.

2015-07-09 Thread Jakub Jelinek
On Thu, Jul 09, 2015 at 05:23:36PM +0300, Maxim Blumental wrote:
>  Now the patch is corrected (according to the previous letter) and
> ready to be reviewed. I'm looking forward to your feedback.

Ok, I guess while the filenames are still different, it is quite easily
possible to find corresponding test in the Examples git repo.

So, what remains is write proper ChangeLog entry, for all the renames and
other changes, like:
* testsuite/libgomp.c/examples-4/e.55.1.c: Renamed to...
* testsuite/libgomp.c/examples-4/async_target-1.c: ... this.
* testsuite/libgomp.c/examples-4/e.55.2.c: Renamed to...
* testsuite/libgomp.c/examples-4/async_target-2.c: ... this.
(vec_mult_ref): Remove v1 and v2 arguments, turn them into
local variables.
(vec_mult): Likewise.  Add #pragma omp taskwait.
(main): Adjust caller.
...
* testsuite/libgomp.c/examples-4/simd-1.c: New file.
* testsuite/libgomp.c/examples-4/simd-2.c: New file.
...
etc.  Also, in all dg-do run tests that use #pragma omp declare simd please
use { dg-do run { target vect_simd_clones } } instead of
{ dg-do run }, so that it does not fail miserably on darwin or other OSes
with less capable assemblers?

Have you reported that
!$omp for simd collapse(2) private(tmp)
->
!$omp simd collapse(2) private(tmp)
to omp-lang (or created ticket for it etc.)?

Jakub


Re: [PATCH, i386]: Fix PR 66814, ICE: gcc.target/i386/avx512f-klogic-2.c

2015-07-09 Thread Jakub Jelinek
On Thu, Jul 09, 2015 at 10:13:49PM +0200, Uros Bizjak wrote:
> I was under impression that peephole2 pass doesn't see subregs of hard
> regs (all x86 predicates are written in this way). Even documentation
> somehow agrees with this:
> 
> '(subreg:M1 REG:M2 BYTENUM)'
> 
> 'subreg' expressions are used to refer to a register in a machine
> mode other than its natural one, or to refer to one register of a
> multi-part 'reg' that actually refers to several registers.
> 
> Each pseudo register has a natural mode.  If it is necessary to
> operate on it in a different mode, the register must be enclosed in
> a 'subreg'.
> 
> There are currently three supported types for the first operand of
> a 'subreg':
> 
> [...]
> 
>* hard registers It is seldom necessary to wrap hard registers
>  in 'subreg's; such registers would normally reduce to a single
>  'reg' rtx.  This use of 'subreg's is discouraged and may not
>  be supported in the future.
> 
> So, I'd say that generating naked SUBREG after reload should be
> avoided and gen_lowpart should be used in the code above.

There is also:
  emit_insn (gen_sse2_loadld (operands[3], CONST0_RTX (V4SImode),
  gen_rtx_SUBREG (SImode, operands[1], 0)));
  emit_insn (gen_sse2_loadld (operands[4], CONST0_RTX (V4SImode),
  gen_rtx_SUBREG (SImode, operands[1], 4)));
in some splitters (also post-reload).

Jakub


Proposal to postpone release of 5.2 for a week [Was: Re: patch to fix PR66782]

2015-07-09 Thread Uros Bizjak
Hello!

> The patch was bootstrapped and tested on x86/x86-64.
>
> Committed as rev. 225618.
>
> 2015-07-09  Vladimir Makarov  
>
> PR rtl-optimization/66782
> * lra-int.h (struct lra_insn_recog_data): Add comment about
> clobbered hard regs for arg_hard_regs.
> * lra.c (lra_set_insn_recog_data): Add clobbered hard regs.
> * lra-lives.c (process_bb_lives): Process clobbered hard regs.
> Add condition for processing used hard regs.
> * lra-constraints.c (update_ebb_live_info, inherit_in_ebb):
> Process clobbered hard regs.

I would like to nominate this patch for gcc-5.2 release. According to
downstream bugreport [1], gcc-5.1 is unusable for 64-bit wine:

"Breaks all of wine, no easy workaround -> blocker."

Due to severity of this bug, and importance of Wine, I'd like to
postpone the 5.2 release for a week, so this bug gets some testing in
the mainline, before it is backported to gcc-5 branch

[1] https://bugs.winehq.org/show_bug.cgi?id=38653

Uros.


Re: [PATCH, i386]: Fix PR 66814, ICE: gcc.target/i386/avx512f-klogic-2.c

2015-07-09 Thread Uros Bizjak
On Thu, Jul 9, 2015 at 10:17 PM, Jakub Jelinek  wrote:
> On Thu, Jul 09, 2015 at 10:13:49PM +0200, Uros Bizjak wrote:
>> I was under impression that peephole2 pass doesn't see subregs of hard
>> regs (all x86 predicates are written in this way). Even documentation
>> somehow agrees with this:
>>
>> '(subreg:M1 REG:M2 BYTENUM)'
>>
>> 'subreg' expressions are used to refer to a register in a machine
>> mode other than its natural one, or to refer to one register of a
>> multi-part 'reg' that actually refers to several registers.
>>
>> Each pseudo register has a natural mode.  If it is necessary to
>> operate on it in a different mode, the register must be enclosed in
>> a 'subreg'.
>>
>> There are currently three supported types for the first operand of
>> a 'subreg':
>>
>> [...]
>>
>>* hard registers It is seldom necessary to wrap hard registers
>>  in 'subreg's; such registers would normally reduce to a single
>>  'reg' rtx.  This use of 'subreg's is discouraged and may not
>>  be supported in the future.
>>
>> So, I'd say that generating naked SUBREG after reload should be
>> avoided and gen_lowpart should be used in the code above.
>
> There is also:
>   emit_insn (gen_sse2_loadld (operands[3], CONST0_RTX (V4SImode),
>   gen_rtx_SUBREG (SImode, operands[1], 0)));
>   emit_insn (gen_sse2_loadld (operands[4], CONST0_RTX (V4SImode),
>   gen_rtx_SUBREG (SImode, operands[1], 4)));
> in some splitters (also post-reload).

These won't hit the peephole2, changed by the patch, but let's fix
these before they start to break.

Uros.


Re: [PATCH] Adjust variable shift costs for IA MCU

2015-07-09 Thread Uros Bizjak
On Thu, Jul 9, 2015 at 8:05 PM, H.J. Lu  wrote:
> We reduce code size for IA MCU by adjusting variable shift costs for IA
> MCU.  OK for trunk?

IMO, tuning patches should fall into "obvious" category. I don't have
any data to to do any meaningful review of a cost metric for a new
target.

So, instead of rubberstamping them again and again, these kind of
patches are pre-approved for all non-algorithmic tuning changes for
IAMCU target.

Uros.

> Thanks.
>
>
> H.J.
> ---
> PR target/66821
> * config/i386/i386.c (iamcu_cost): Adjust variable shift costs.
> ---
>  gcc/config/i386/i386.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 85e59a8..e4e505c 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -420,7 +420,7 @@ static const
>  struct processor_costs iamcu_cost = {
>COSTS_N_INSNS (1),   /* cost of an add instruction */
>COSTS_N_INSNS (1) + 1,   /* cost of a lea instruction */
> -  COSTS_N_INSNS (4),   /* variable shift costs */
> +  COSTS_N_INSNS (1),   /* variable shift costs */
>COSTS_N_INSNS (1),   /* constant shift costs */
>{COSTS_N_INSNS (11), /* cost of starting multiply for QI */
> COSTS_N_INSNS (11), /*   HI */
> --
> 2.4.3
>


Re: [PATCH] Adjust variable shift costs for IA MCU

2015-07-09 Thread H.J. Lu
On Thu, Jul 9, 2015 at 1:30 PM, Uros Bizjak  wrote:
> On Thu, Jul 9, 2015 at 8:05 PM, H.J. Lu  wrote:
>> We reduce code size for IA MCU by adjusting variable shift costs for IA
>> MCU.  OK for trunk?
>
> IMO, tuning patches should fall into "obvious" category. I don't have
> any data to to do any meaningful review of a cost metric for a new
> target.
>
> So, instead of rubberstamping them again and again, these kind of
> patches are pre-approved for all non-algorithmic tuning changes for
> IAMCU target.
>

I will keep it in mind.

Thanks.

-- 
H.J.


Re: [PATCH 2/6] Introduce new edge_summary class and replace ipa_edge_args_sum.

2015-07-09 Thread Martin Liška

On 07/09/2015 07:15 PM, Jeff Law wrote:

On 07/09/2015 03:13 AM, mliska wrote:

gcc/ChangeLog:

2015-07-03  Martin Liska  

* cgraph.c (symbol_table::create_edge): Introduce summary_uid
for cgraph_edge.
* cgraph.h (struct GTY): Likewise.
* ipa-inline-analysis.c (estimate_function_body_sizes): Use
new data structure.
* ipa-profile.c (ipa_profile): Likewise.
* ipa-prop.c (ipa_print_node_jump_functions):
(ipa_propagate_indirect_call_infos): Likewise.
(ipa_free_edge_args_substructures): Likewise.
(ipa_free_all_edge_args): Likewise.
(ipa_edge_args_t::remove): Likewise.
(ipa_edge_removal_hook): Likewise.
(ipa_edge_args_t::duplicate): Likewise.
(ipa_register_cgraph_hooks): Likewise.
(ipa_unregister_cgraph_hooks): Likewise.
* ipa-prop.h (ipa_check_create_edge_args): Likewise.
(ipa_edge_args_info_available_for_edge_p): Likewise.
* symbol-summary.h (gt_ggc_mx): Indent properly.
(gt_pch_nx): Likewise.
(edge_summary): New class.
---
  gcc/cgraph.c  |   2 +
  gcc/cgraph.h  |   5 +-
  gcc/ipa-inline-analysis.c |   2 +-
  gcc/ipa-profile.c |   2 +-
  gcc/ipa-prop.c|  71 +++-
  gcc/ipa-prop.h|  44 ++
  gcc/symbol-summary.h  | 208 +-
  7 files changed, 252 insertions(+), 82 deletions(-)



}

+
+  /* Destruction method that can be called for GGT purpose.  */

GGT stands for?


Hi.

Should be 'GGC', fixed in attached patch.

Thanks,
Martin




OK for the trunk.

Thanks,
Jeff


>From 86a698b6fe9a0bbb55a0e2fa090d74cb02677d2f Mon Sep 17 00:00:00 2001
From: mliska 
Date: Thu, 9 Jul 2015 11:13:52 +0200
Subject: [PATCH 2/6] Introduce new edge_summary class and replace
 ipa_edge_args_sum.

gcc/ChangeLog:

2015-07-03  Martin Liska  

	* cgraph.c (symbol_table::create_edge): Introduce summary_uid
	for cgraph_edge.
	* cgraph.h (struct GTY): Likewise.
	* ipa-inline-analysis.c (estimate_function_body_sizes): Use
	new data structure.
	* ipa-profile.c (ipa_profile): Likewise.
	* ipa-prop.c (ipa_print_node_jump_functions):
	(ipa_propagate_indirect_call_infos): Likewise.
	(ipa_free_edge_args_substructures): Likewise.
	(ipa_free_all_edge_args): Likewise.
	(ipa_edge_args_t::remove): Likewise.
	(ipa_edge_removal_hook): Likewise.
	(ipa_edge_args_t::duplicate): Likewise.
	(ipa_register_cgraph_hooks): Likewise.
	(ipa_unregister_cgraph_hooks): Likewise.
	* ipa-prop.h (ipa_check_create_edge_args): Likewise.
	(ipa_edge_args_info_available_for_edge_p): Likewise.
	* symbol-summary.h (gt_ggc_mx): Indent properly.
	(gt_pch_nx): Likewise.
	(edge_summary): New class.
---
 gcc/cgraph.c  |   2 +
 gcc/cgraph.h  |   5 +-
 gcc/ipa-inline-analysis.c |   2 +-
 gcc/ipa-profile.c |   2 +-
 gcc/ipa-prop.c|  71 +++-
 gcc/ipa-prop.h|  44 ++
 gcc/symbol-summary.h  | 208 +-
 7 files changed, 252 insertions(+), 82 deletions(-)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index d13bcd3..d7b6257 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -851,6 +851,8 @@ symbol_table::create_edge (cgraph_node *caller, cgraph_node *callee,
   edge->uid = edges_max_uid++;
 }
 
+  edge->summary_uid = edge_max_summary_uid++;
+
   edges_count++;
 
   edge->aux = NULL;
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0fe58e1..ef71958 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1593,6 +1593,8 @@ struct GTY((chain_next ("%h.next_caller"), chain_prev ("%h.prev_caller"),
   int frequency;
   /* Unique id of the edge.  */
   int uid;
+  /* Not recycled unique id of the node.  */
+  int summary_uid;
   /* Whether this edge was made direct by indirect inlining.  */
   unsigned int indirect_inlining_edge : 1;
   /* Whether this edge describes an indirect call with an undetermined
@@ -1874,7 +1876,7 @@ public:
   friend class cgraph_node;
   friend class cgraph_edge;
 
-  symbol_table (): cgraph_max_summary_uid (1)
+  symbol_table (): cgraph_max_summary_uid (1), edge_max_summary_uid (1)
   {
   }
 
@@ -2078,6 +2080,7 @@ public:
 
   int edges_count;
   int edges_max_uid;
+  int edge_max_summary_uid;
 
   symtab_node* GTY(()) nodes;
   asm_node* GTY(()) asmnodes;
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index d5dbfbd..c11dc9c 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -2852,7 +2852,7 @@ estimate_function_body_sizes (struct cgraph_node *node, bool early)
 {
   if (!early)
 loop_optimizer_finalize ();
-  else if (!ipa_edge_args_vector)
+  else if (!ipa_edge_args_sum)
 	ipa_free_all_node_params ();
   free_dominance_info (CDI_DOMINATORS);
 }
diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
index 698729b..6d6afb3 100644
--- a/gcc/ipa-profile.c
+++ b/gcc/ipa-profile.c
@@ -627,7 +627,7 @@ ipa_profile (void)
  "Not speculating: target is overwritable "
  "and can be discarded.\

Re: [PATCH 5/6] Port IPA reference to function_summary infrastructure.

2015-07-09 Thread Martin Liška

On 07/09/2015 07:35 PM, Jeff Law wrote:

On 07/09/2015 03:13 AM, mliska wrote:

gcc/ChangeLog:

2015-07-03  Martin Liska  

* ipa-reference.c (ipa_ref_opt_summary_t): New class.
(get_reference_optimization_summary): Use it.
(set_reference_optimization_summary): Likewise.
(ipa_init): Remove hook holders usage.
(ipa_reference_c_finalize): Likewise.
(ipa_ref_opt_summary_t::duplicate): New function.
(ipa_ref_opt_summary_t::remove): Likewise.
(propagate): Allocate the summary if does not exist.
(ipa_reference_read_optimization_summary): Likewise.
(struct ipa_reference_vars_info_d): Add new method.
(struct ipa_reference_optimization_summary_d): Likewise.
(get_reference_vars_info): Use new underlying container.
(set_reference_vars_info): Remove.
(init_function_info): Set up the container.



@@ -89,6 +84,13 @@ struct ipa_reference_global_vars_info_d

  struct ipa_reference_optimization_summary_d
  {
+  /* Return true if the data structure is empty.  */
+  inline bool
+  empty_p ()
+  {
+return statics_not_read == NULL && statics_not_written == NULL;
+  }
+

Presumably this is still POD, even with the inline function, so "struct" is 
still correct, right?




@@ -99,6 +101,14 @@ typedef struct ipa_reference_optimization_summary_d 
*ipa_reference_optimization_

  struct ipa_reference_vars_info_d
  {
+  /* Return true if the data structure is empty.  */
+  inline bool
+  empty_p ()
+  {
+return local.statics_read == NULL && local.statics_written == NULL
+  && global.statics_read == NULL && global.statics_written == NULL;
+  }
+

Similarly.

So please confirm those are still POD types.  If they are, then the patch is OK 
as-is.  If they're not PODs, then change them to classes and that patch is 
pre-approved.

jeff



Hi.

Yes, that are POD types.

Martin


Re: [PATCH 6/6] Migrate ipa-pure-const to function_summary.

2015-07-09 Thread Martin Liška

On 07/09/2015 07:44 PM, Jeff Law wrote:

On 07/09/2015 03:13 AM, mliska wrote:

gcc/ChangeLog:

2015-07-03  Martin Liska  

* ipa-pure-const.c (struct funct_state_d): New.
(funct_state_d::default_p): Likewise.
(has_function_state): Remove.
(get_function_state): Likewise.
(set_function_state): Likewise.
(add_new_function): Rename and port to ::insert.
(duplicate_node_data): Rename and port to ::duplicate.
(funct_state_summary_t::duplicate): New function.
(register_hooks): Remove hook registration.
(pure_const_generate_summary): Use new data structure.
(pure_const_write_summary): Likewise.
(pure_const_read_summary): Likewise.
(propagate_pure_const): Likewise.
(propagate_nothrow): Likewise.
(execute): Remove hook usage.
(pass_ipa_pure_const::pass_ipa_pure_const): Likewise.
---
@@ -84,6 +85,18 @@ const char *pure_const_names[3] = {"const", "pure", 
"neither"};
 decl.  */
  struct funct_state_d
  {
+  funct_state_d (): pure_const_state (IPA_NEITHER),
+state_previously_known (IPA_NEITHER), looping_previously_known (true),
+looping (true), can_throw (true), can_free (true) {}
+
+  funct_state_d (const funct_state_d &s): pure_const_state 
(s.pure_const_state),
+state_previously_known (s.state_previously_known),
+looping_previously_known (s.looping_previously_known),
+looping (s.looping), can_throw (s.can_throw), can_free (s.can_free) {}
+
+  /* Return true if the value is default.  */
+  bool default_p ();
+
/* See above.  */
enum pure_const_state_e pure_const_state;
/* What user set here; we can be always sure about this.  */

Doesn't this need to be a "class" rather then a "struct"?


OK with that change.

jeff


Yeah.

'class' will be more appropriate. As I'm going to be AFK for Friday and 
upcoming weekend,
I will install these patches on Monday.

Thanks,
Martin
>From b20c6ceaf82210c564465be1ab92f6586fc706e4 Mon Sep 17 00:00:00 2001
From: mliska 
Date: Thu, 9 Jul 2015 11:13:55 +0200
Subject: [PATCH 6/6] Migrate ipa-pure-const to function_summary.

gcc/ChangeLog:

2015-07-03  Martin Liska  

	* ipa-pure-const.c (struct funct_state_d): New.
	(funct_state_d::default_p): Likewise.
	(has_function_state): Remove.
	(get_function_state): Likewise.
	(set_function_state): Likewise.
	(add_new_function): Rename and port to ::insert.
	(duplicate_node_data): Rename and port to ::duplicate.
	(funct_state_summary_t::duplicate): New function.
	(register_hooks): Remove hook registration.
	(pure_const_generate_summary): Use new data structure.
	(pure_const_write_summary): Likewise.
	(pure_const_read_summary): Likewise.
	(propagate_pure_const): Likewise.
	(propagate_nothrow): Likewise.
	(execute): Remove hook usage.
	(pass_ipa_pure_const::pass_ipa_pure_const): Likewise.
---
 gcc/ipa-pure-const.c | 180 +++
 1 file changed, 68 insertions(+), 112 deletions(-)

diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index f0373e6..f978950f 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "opts.h"
 #include "varasm.h"
+#include "symbol-summary.h"
 
 /* Lattice values for const and pure functions.  Everything starts out
being const, then may drop to pure and then neither depending on
@@ -82,8 +83,21 @@ const char *pure_const_names[3] = {"const", "pure", "neither"};
 
 /* Holder for the const_state.  There is one of these per function
decl.  */
-struct funct_state_d
+class funct_state_d
 {
+public:
+  funct_state_d (): pure_const_state (IPA_NEITHER),
+state_previously_known (IPA_NEITHER), looping_previously_known (true),
+looping (true), can_throw (true), can_free (true) {}
+
+  funct_state_d (const funct_state_d &s): pure_const_state (s.pure_const_state),
+state_previously_known (s.state_previously_known),
+looping_previously_known (s.looping_previously_known),
+looping (s.looping), can_throw (s.can_throw), can_free (s.can_free) {}
+
+  /* Return true if the value is default.  */
+  bool default_p ();
+
   /* See above.  */
   enum pure_const_state_e pure_const_state;
   /* What user set here; we can be always sure about this.  */
@@ -105,10 +119,16 @@ struct funct_state_d
   bool can_free;
 };
 
-/* State used when we know nothing about function.  */
-static struct funct_state_d varying_state
-   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true };
-
+bool
+funct_state_d::default_p ()
+{
+  return pure_const_state == IPA_NEITHER
+&& state_previously_known == IPA_NEITHER
+&& looping_previously_known
+&& looping
+&& can_throw
+&& can_free;
+}
 
 typedef struct funct_state_d * funct_state;
 
@@ -116,9 +136,19 @@ typedef struct funct_state_d * funct_state;
possibility that it may be desirable to move this to the cgraph
local info.  */
 
-/* Array, indexed by cgraph node uid, of function states.  */
+class funct_state_summary_

Re: [C++ Patch] PR 61491 (aka DR 1206)

2015-07-09 Thread Jason Merrill

On 07/09/2015 03:36 PM, Paolo Carlini wrote:

Finally, we do *not* reject, as we should, the line:

template<> enum A::E : char { echar };

Then, overall, is it Ok to simply suppress the pedwarn in C++11, and
xfail for now the error? Should I open a new, separate bug report about
the latter? (note that the issue, failing to reject an explicit
specialization after instantiation, doesn't sound new to me and seems
more general than enum-related issues...)


I don't think it's a more general issue.  I note that the specialization 
seems to append to the enumerator list somehow:


int i = A::eT + A::echar; // accepted?!?

I think we should fix this before we remove the pedwarn.

Jason



Re: [PATCH] Fix PR c++/30044

2015-07-09 Thread Jason Merrill

OK, thanks.

Jason


Re: [PATCH] PR target/66824: -miamcu doesn't load FP constant into register directly

2015-07-09 Thread H.J. Lu
On Thu, Jul 09, 2015 at 12:13:38PM -0700, H.J. Lu wrote:
> ix86_split_long_move can optimize floating point constant move, which
> can be used to optimize SFmode move for IA MCU.
> 
> OK for trunk if there is no regression?
> 
> 
> H.J.
> ---
> gcc/
> 
>   PR target/66824
>   * config/i386/i386.c (ix86_split_to_parts): Allow SFmode move
>   for IA MCU.
>   (ix86_split_long_move): Support single move.
>   * config/i386/i386.md (FP splitter): Allow SFmode for IA MCU.
> 
> gcc/testsuite/
> 
>   PR target/66824
>   * gcc.target/i386/pr66824.c: New test.
> ---


I missed the testcase.  Here is the updated patch.

H.J.
---
---
 gcc/config/i386/i386.c  | 14 +-
 gcc/config/i386/i386.md |  3 ++-
 gcc/testsuite/gcc.target/i386/pr66824.c | 23 +++
 3 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66824.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a18c733..d2925bc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22744,7 +22744,9 @@ ix86_split_to_parts (rtx operand, rtx *parts, 
machine_mode mode)
 size = (GET_MODE_SIZE (mode) + 4) / 8;
 
   gcc_assert (!REG_P (operand) || !MMX_REGNO_P (REGNO (operand)));
-  gcc_assert (size >= 2 && size <= 4);
+  /* For IA MCU, we can optimize SFmode move.  */
+  gcc_assert ((size >= 2 || (TARGET_IAMCU && mode == SFmode))
+ && size <= 4);
 
   /* Optimize constant pool reference to immediates.  This is used by fp
  moves, that force all constants to memory to allow combining.  */
@@ -22822,10 +22824,14 @@ ix86_split_to_parts (rtx operand, rtx *parts, 
machine_mode mode)
case DFmode:
  REAL_VALUE_TO_TARGET_DOUBLE (r, l);
  break;
+   case SFmode:
+ REAL_VALUE_TO_TARGET_SINGLE (r, l[0]);
+ goto part0;
default:
  gcc_unreachable ();
}
  parts[1] = gen_int_mode (l[1], SImode);
+part0:
  parts[0] = gen_int_mode (l[0], SImode);
}
  else
@@ -22932,6 +22938,12 @@ ix86_split_long_move (rtx operands[])
   nparts = ix86_split_to_parts (operands[1], part[1], GET_MODE (operands[0]));
   ix86_split_to_parts (operands[0], part[0], GET_MODE (operands[0]));
 
+  if (nparts == 1)
+{
+  emit_move_insn (part[0][0], part[1][0]);
+  return;
+}
+
   /* When emitting push, take care for source operands on the stack.  */
   if (push && MEM_P (operands[1])
   && reg_overlap_mentioned_p (stack_pointer_rtx, operands[1]))
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 1d43aaf..e723eab 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3507,7 +3507,8 @@
   "reload_completed
&& (GET_MODE (operands[0]) == TFmode
|| GET_MODE (operands[0]) == XFmode
-   || GET_MODE (operands[0]) == DFmode)
+   || GET_MODE (operands[0]) == DFmode
+   || (TARGET_IAMCU && GET_MODE (operands[0]) == SFmode))
&& !(ANY_FP_REG_P (operands[0]) || ANY_FP_REG_P (operands[1]))"
   [(const_int 0)]
   "ix86_split_long_move (operands); DONE;")
diff --git a/gcc/testsuite/gcc.target/i386/pr66824.c 
b/gcc/testsuite/gcc.target/i386/pr66824.c
new file mode 100644
index 000..0966736
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66824.c
@@ -0,0 +1,23 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -miamcu" } */
+/* { dg-final { scan-assembler-not "\.LC\[0-9\]" } } */
+
+double foo (float);
+
+double
+f1 (void)
+{
+  return foo (1.0);
+}
+
+double
+f2 (void)
+{
+  return foo (0.0);
+}
+
+void
+f3 (float *x, float t)
+{
+  *x = 0.0 + t;
+}
-- 
2.4.3



Re: [patch] testsuite enable PIE tests on DragonFly

2015-07-09 Thread Andreas Tobler

On 09.07.15 17:30, Jeff Law wrote:

On 07/09/2015 07:53 AM, John Marino wrote:

DragonFly supports PIE but the tests for it are disabled.
The attached patch for the trunk will enable these checks on DragonFly.
Thanks for considering this!

John

p.s. suggested gcc/testsuite/changelog entry:

2015-07-XX  John Marino  

* lib/target-supports.exp (check_effective_target_pie):
Add *-*-dragonfly*


OK for the trunk.


Committed, r225636.

Cheers,
Andreas


Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)

2015-07-09 Thread Jakub Jelinek
On Thu, Jul 09, 2015 at 11:21:01AM +0200, Jakub Jelinek wrote:
> > Hmm, this looks like it could be split out to a function.
> 
> Ok, will try that.

Here is what I've committed after another bootstrap/regtest on x86_64-linux
and i686-linux.  You're right, the separate function cleaned stuff up.

2015-07-09  Jakub Jelinek  

PR tree-optimization/66718
* tree-vect-stmts.c (struct simd_call_arg_info): Add simd_lane_linear
field.
(vect_simd_lane_linear): New function.
(vectorizable_simd_clone_call): Support using linear arguments for
addresses of arrays elements indexed by GOMP_SIMD_LANE result.

--- gcc/tree-vect-stmts.c.jj2015-07-08 19:49:26.352199590 +0200
+++ gcc/tree-vect-stmts.c   2015-07-09 18:03:26.206455588 +0200
@@ -2629,8 +2629,79 @@ struct simd_call_arg_info
   enum vect_def_type dt;
   HOST_WIDE_INT linear_step;
   unsigned int align;
+  bool simd_lane_linear;
 };
 
+/* Helper function of vectorizable_simd_clone_call.  If OP, an SSA_NAME,
+   is linear within simd lane (but not within whole loop), note it in
+   *ARGINFO.  */
+
+static void
+vect_simd_lane_linear (tree op, struct loop *loop,
+  struct simd_call_arg_info *arginfo)
+{
+  gimple def_stmt = SSA_NAME_DEF_STMT (op);
+
+  if (!is_gimple_assign (def_stmt)
+  || gimple_assign_rhs_code (def_stmt) != POINTER_PLUS_EXPR
+  || !is_gimple_min_invariant (gimple_assign_rhs1 (def_stmt)))
+return;
+
+  tree base = gimple_assign_rhs1 (def_stmt);
+  HOST_WIDE_INT linear_step = 0;
+  tree v = gimple_assign_rhs2 (def_stmt);
+  while (TREE_CODE (v) == SSA_NAME)
+{
+  tree t;
+  def_stmt = SSA_NAME_DEF_STMT (v);
+  if (is_gimple_assign (def_stmt))
+   switch (gimple_assign_rhs_code (def_stmt))
+ {
+ case PLUS_EXPR:
+   t = gimple_assign_rhs2 (def_stmt);
+   if (linear_step || TREE_CODE (t) != INTEGER_CST)
+ return;
+   base = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (base), base, t);
+   v = gimple_assign_rhs1 (def_stmt);
+   continue;
+ case MULT_EXPR:
+   t = gimple_assign_rhs2 (def_stmt);
+   if (linear_step || !tree_fits_shwi_p (t) || integer_zerop (t))
+ return;
+   linear_step = tree_to_shwi (t);
+   v = gimple_assign_rhs1 (def_stmt);
+   continue;
+ CASE_CONVERT:
+   t = gimple_assign_rhs1 (def_stmt);
+   if (TREE_CODE (TREE_TYPE (t)) != INTEGER_TYPE
+   || (TYPE_PRECISION (TREE_TYPE (v))
+   < TYPE_PRECISION (TREE_TYPE (t
+ return;
+   if (!linear_step)
+ linear_step = 1;
+   v = t;
+   continue;
+ default:
+   return;
+ }
+  else if (is_gimple_call (def_stmt)
+  && gimple_call_internal_p (def_stmt)
+  && gimple_call_internal_fn (def_stmt) == IFN_GOMP_SIMD_LANE
+  && loop->simduid
+  && TREE_CODE (gimple_call_arg (def_stmt, 0)) == SSA_NAME
+  && (SSA_NAME_VAR (gimple_call_arg (def_stmt, 0))
+  == loop->simduid))
+   {
+ if (!linear_step)
+   linear_step = 1;
+ arginfo->linear_step = linear_step;
+ arginfo->op = base;
+ arginfo->simd_lane_linear = true;
+ return;
+   }
+}
+}
+
 /* Function vectorizable_simd_clone_call.
 
Check if STMT performs a function call that can be vectorized
@@ -2713,6 +2784,7 @@ vectorizable_simd_clone_call (gimple stm
   thisarginfo.linear_step = 0;
   thisarginfo.align = 0;
   thisarginfo.op = NULL_TREE;
+  thisarginfo.simd_lane_linear = false;
 
   op = gimple_call_arg (stmt, i);
   if (!vect_is_simple_use_1 (op, stmt, loop_vinfo, bb_vinfo,
@@ -2735,21 +2807,24 @@ vectorizable_simd_clone_call (gimple stm
 
   /* For linear arguments, the analyze phase should have saved
 the base and step in STMT_VINFO_SIMD_CLONE_INFO.  */
-  if (i * 2 + 3 <= STMT_VINFO_SIMD_CLONE_INFO (stmt_info).length ()
- && STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 2 + 2])
+  if (i * 3 + 4 <= STMT_VINFO_SIMD_CLONE_INFO (stmt_info).length ()
+ && STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 2])
{
  gcc_assert (vec_stmt);
  thisarginfo.linear_step
-   = tree_to_shwi (STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 2 + 2]);
+   = tree_to_shwi (STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 2]);
  thisarginfo.op
-   = STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 2 + 1];
+   = STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 1];
+ thisarginfo.simd_lane_linear
+   = (STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 3]
+  == boolean_true_node);
  /* If loop has been peeled for alignment, we need to adjust it.  */
  tree n1 = LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo);
  tree n2 = LOOP_VINFO_NIT

[PATCH] Fix OpenMP ICE due to fold_stmt (PR middle-end/66820)

2015-07-09 Thread Jakub Jelinek
Hi!

As discussed on IRC, we have a problem because fold_stmt can call
force_gimple_operand_1, which in turn does
push_gimplify_context/pop_gimplify_context (NULL), even when inside
of some gimplify_ctxp and thus pushes the decls into function context
rather than the current gimplify_ctxp.

So far I've bootstrapped/regtested this version, which supposedly could be
useful for the 5 branch.  Other options are to avoid
push_gimplify_context/pop_gimplify_context in force_gimple_operand_1, if
gimplify_ctxp is already non-NULL when it is called (the problem with that
solution is that gimplify_ctxp is now private in gimplify.c and
force_gimple_operand_1 lives somewhere else, so we'd need some accessor
function for that or something).  Or avoid force_gimple_operand* from
fold_stmt.

2015-07-09  Jakub Jelinek  

PR middle-end/66820
* gimplify.c (maybe_fold_stmt): Don't fold in ORT_PARALLEL
or ORT_TASK contexts.
* omp-low.c (lower_omp): Call fold_stmt even if taskreg_nesting_level
is non-zero.

* gcc.dg/gomp/pr66820.c: New test.

--- gcc/gimplify.c.jj   2015-07-08 19:50:04.0 +0200
+++ gcc/gimplify.c  2015-07-09 16:36:58.336609740 +0200
@@ -2245,16 +2245,17 @@ gimplify_arg (tree *arg_p, gimple_seq *p
   return gimplify_expr (arg_p, pre_p, NULL, test, fb);
 }
 
-/* Don't fold inside offloading regions: it can break code by adding decl
-   references that weren't in the source.  We'll do it during omplower pass
-   instead.  */
+/* Don't fold inside offloading or taskreg regions: it can break code by
+   adding decl references that weren't in the source.  We'll do it during
+   omplower pass instead.  */
 
 static bool
 maybe_fold_stmt (gimple_stmt_iterator *gsi)
 {
   struct gimplify_omp_ctx *ctx;
   for (ctx = gimplify_omp_ctxp; ctx; ctx = ctx->outer_context)
-if (ctx->region_type == ORT_TARGET)
+if (ctx->region_type == ORT_TARGET
+   || (ctx->region_type & (ORT_PARALLEL | ORT_TASK)) != 0)
   return false;
   return fold_stmt (gsi);
 }
--- gcc/omp-low.c.jj2015-07-08 19:49:39.0 +0200
+++ gcc/omp-low.c   2015-07-09 16:37:43.169942838 +0200
@@ -11890,8 +11890,8 @@ lower_omp (gimple_seq *body, omp_context
   for (gsi = gsi_start (*body); !gsi_end_p (gsi); gsi_next (&gsi))
 lower_omp_1 (&gsi, ctx);
   /* During gimplification, we haven't folded statments inside offloading
- regions (gimplify.c:maybe_fold_stmt); do that now.  */
-  if (target_nesting_level)
+ or taskreg regions (gimplify.c:maybe_fold_stmt); do that now.  */
+  if (target_nesting_level || taskreg_nesting_level)
 for (gsi = gsi_start (*body); !gsi_end_p (gsi); gsi_next (&gsi))
   fold_stmt (&gsi);
   input_location = saved_location;
--- gcc/testsuite/gcc.dg/gomp/pr66820.c.jj  2015-04-23 09:27:31.812958582 
+0200
+++ gcc/testsuite/gcc.dg/gomp/pr66820.c 2015-07-09 17:16:34.762225577 +0200
@@ -0,0 +1,18 @@
+/* PR middle-end/66820 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+void bar (char *);
+
+void
+foo (char **x)
+{
+#pragma omp parallel for
+  for (int i = 0; i < 16; i++)
+{
+  char y[50];
+  __builtin_strcpy (y, x[i]);
+  __builtin_strcat (y, "foo");
+  bar (y);
+}
+}

Jakub


Re: [patch] Fix PR middle-end/66633

2015-07-09 Thread Jakub Jelinek
On Mon, Jun 29, 2015 at 11:23:24AM +0200, Eric Botcazou wrote:
> > Don't you need to handle convert_nonlocal_omp_clauses similarly (need_chain
> > in that case)?
> > At least looking at your r211308 commit, for !optimize you force not just
> > the frame, but also chain.
> 
> You're very likely right, although I didn't manage to write a Fortran 
> testcase 
> with my limited knowledge of the language (I get a "Bad statement code" ICE).
> 
> In fact the conditions should also mimic those of the aforementioned commit.
> I initially didn't realize it, because I didn't quite grasp how the GOMP 
> stuff 
> was interacting with the nesting stuff, but I guess that the code ought not 
> to 
> do anything if there aren't pre-existing nested functions in the sources.

After looking into this some more, because the PR got reopened, there were
two issues: 1) __builtin_adjust_trampoline call needing a frame or chain (as
can be seen on the new testcases) that wasn't added to parallel/task/target
clauses 2) for !optimize, there is code to add those, when frame or chain
is needed for calls, but the problem was that as !optimize didn't clear
DECL_STATIC_CHAIN initially it wasn't discovered.

Both issues fixed in the following patch, which made the earlier change
unnecessary.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.
Should go to 5 branch eventually.

2015-07-09  Jakub Jelinek  

PR middle-end/66633
* tree-nested.c (get_static_chain): Or in a flag into
info->static_chain_added.
(get_frame_field, get_nonlocal_debug_decl): Likewise.
(convert_nonlocal_omp_clauses, convert_local_omp_clauses): Revert
2015-07-01 changes.
(convert_tramp_reference_stmt): If a frame_decl or chain_decl
is needed newly inside of GIMPLE_OMP_{PARALLEL,TASK,TARGET} body,
add it to clauses.

* gcc.dg/gomp/pr66633-1.c: New test.
* gcc.dg/gomp/pr66633-2.c: New test.
* gcc.dg/gomp/pr66633-3.c: New test.
* gcc.dg/gomp/pr66633-4.c: New test.

--- gcc/tree-nested.c.jj2015-07-08 19:50:09.0 +0200
+++ gcc/tree-nested.c   2015-07-09 19:33:51.116923514 +0200
@@ -767,10 +767,12 @@ get_static_chain (struct nesting_info *i
   if (info->context == target_context)
 {
   x = build_addr (info->frame_decl, target_context);
+  info->static_chain_added |= 1;
 }
   else
 {
   x = get_chain_decl (info);
+  info->static_chain_added |= 2;
 
   for (i = info->outer; i->context != target_context; i = i->outer)
{
@@ -802,10 +804,12 @@ get_frame_field (struct nesting_info *in
   /* Make sure frame_decl gets created.  */
   (void) get_frame_type (info);
   x = info->frame_decl;
+  info->static_chain_added |= 1;
 }
   else
 {
   x = get_chain_decl (info);
+  info->static_chain_added |= 2;
 
   for (i = info->outer; i->context != target_context; i = i->outer)
{
@@ -851,10 +855,12 @@ get_nonlocal_debug_decl (struct nesting_
   (void) get_frame_type (info);
   x = info->frame_decl;
   i = info;
+  info->static_chain_added |= 1;
 }
   else
 {
   x = get_chain_decl (info);
+  info->static_chain_added |= 2;
   for (i = info->outer; i->context != target_context; i = i->outer)
{
  field = get_chain_field (i);
@@ -1061,9 +1067,7 @@ static bool
 convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 {
   struct nesting_info *const info = (struct nesting_info *) wi->info;
-  /* If not optimizing, we will force the creation of the CHAIN object in
- convert_all_function_calls, so we need to take it into account here.  */
-  bool need_chain = info->outer && !optimize, need_stmts = false;
+  bool need_chain = false, need_stmts = false;
   tree clause, decl;
   int dummy;
   bitmap new_suppress;
@@ -1691,9 +1695,7 @@ static bool
 convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 {
   struct nesting_info *const info = (struct nesting_info *) wi->info;
-  /* If not optimizing, we will force the creation of the FRAME object in
- convert_all_function_calls, so we need to take it into account here.  */
-  bool need_frame = info->inner && !optimize, need_stmts = false;
+  bool need_frame = false, need_stmts = false;
   tree clause, decl;
   int dummy;
   bitmap new_suppress;
@@ -2292,17 +2294,55 @@ convert_tramp_reference_stmt (gimple_stm
 case GIMPLE_OMP_PARALLEL:
 case GIMPLE_OMP_TASK:
   {
-   tree save_local_var_chain;
+   tree save_local_var_chain = info->new_local_var_chain;
 walk_gimple_op (stmt, convert_tramp_reference_op, wi);
-   save_local_var_chain = info->new_local_var_chain;
info->new_local_var_chain = NULL;
+   char save_static_chain_added = info->static_chain_added;
+   info->static_chain_added = 0;
 walk_body (convert_tramp_reference_stmt, convert_tramp_reference_op,
   info, gimple_omp_bo

Re: [C++ Patch] PR 61491 (aka DR 1206)

2015-07-09 Thread Paolo Carlini

Hi,

On 07/09/2015 10:53 PM, Jason Merrill wrote:

On 07/09/2015 03:36 PM, Paolo Carlini wrote:

Finally, we do *not* reject, as we should, the line:

template<> enum A::E : char { echar };

Then, overall, is it Ok to simply suppress the pedwarn in C++11, and
xfail for now the error? Should I open a new, separate bug report about
the latter? (note that the issue, failing to reject an explicit
specialization after instantiation, doesn't sound new to me and seems
more general than enum-related issues...)


I don't think it's a more general issue.  I note that the 
specialization seems to append to the enumerator list somehow:


int i = A::eT + A::echar; // accepted?!?
Annoying :( Indeed, I can see that the append already happened by the 
time maybe_process_partial_specialization is called the fourth time. 
Let's see if I can make some progress on that.


Thanks,
Paolo.


Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator

2015-07-09 Thread Martin Liška

On 07/03/2015 06:18 PM, Richard Sandiford wrote:

Hi Martin,

Martin Liška  writes:

On 07/03/2015 03:07 PM, Richard Sandiford wrote:

Martin Jambor  writes:

On Fri, Jul 03, 2015 at 09:55:58AM +0100, Richard Sandiford wrote:

Trevor Saunders  writes:

On Thu, Jul 02, 2015 at 09:09:31PM +0100, Richard Sandiford wrote:

Martin Liška  writes:

diff --git a/gcc/asan.c b/gcc/asan.c
index e89817e..dabd6f1 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -362,20 +362,20 @@ struct asan_mem_ref
/* Pool allocation new operator.  */
inline void *operator new (size_t)
{
-return pool.allocate ();
+return ::new (pool.allocate ()) asan_mem_ref ();
}

/* Delete operator utilizing pool allocation.  */
inline void operator delete (void *ptr)
{
-pool.remove ((asan_mem_ref *) ptr);
+pool.remove (ptr);
}

/* Memory allocation pool.  */
-  static pool_allocator pool;
+  static pool_allocator pool;
  };


I'm probably going over old ground/wounds, sorry, but what's the benefit
of having this sort of pattern?  Why not simply have object_allocators
and make callers use pool.allocate () and pool.remove (x) (with
pool.remove
calling the destructor) instead of new and delete?  It feels wrong to me
to tie the data type to a particular allocation object like this.


Well the big question is what does allocate() do about construction?  if
it seems wierd for it to not call the ctor, but I'm not sure we can do a
good job of forwarding args to allocate() with C++98.


If you need non-default constructors then:

   new (pool) type (aaa, bbb)...;

doesn't seem too bad.  I agree object_allocator's allocate () should call
the constructor.



but then the pool allocator must not call placement new on the
allocated memory itself because that would result in double
construction.


But we're talking about two different methods.  The "normal" allocator
object_allocator ::allocate () would use placement new and return a
pointer to the new object while operator new (size_t, object_allocator  &)
wouldn't call placement new and would just return a pointer to the memory.


And using the pool allocator functions directly has the nice property
that you can tell when a delete/remove isn't necessary because the pool
itself is being cleared.


Well, all these cases involve a pool with static storage lifetime right?
so actually if you don't delete things in these pool they are
effectively leaked.


They might have a static storage lifetime now, but it doesn't seem like
a good idea to hard-bake that into the interface


Does that mean that operators new and delete are considered evil?


Not IMO.  Just that static load-time-initialized caches are not
necessarily a good thing.  That's effectively what the pool
allocator is.


(by saying that for
these types you should use new and delete, but for other pool-allocated
types you should use object_allocators).


Depending on what kind of pool allocator you use, you will be forced
to either call placement new or not, so the inconsistency will be
there anyway.


But how we handle argument-taking constructors is a problem that needs
to be solved for the pool-allocated objects that don't use a single
static type-specific pool.  And once we solve that, we get consistency
across all pools:

- if you want a new object and argumentless construction is OK,
   use "pool.allocate ()"

- if you want a new object and need to pass arguments to the constructor,
   use "new (pool) some_type (arg1, arg2, ...)"


Maybe I just have bad memories
from doing the SWITCHABLE_TARGET stuff, but there I was changing a lot
of state that was "obviously" static in the old days, but that needed
to become non-static to support vaguely-efficient switching between
different subtargets.  The same kind of thing is likely to happen again.
I assume things like the jit would prefer not to have new global state
with load-time construction.


I'm not sure I follow this branch of the discussion, the allocators of
any kind surely can dynamically allocated themselves?


Sure, but either (a) you keep the pools as a static part of the class
and some initialisation and finalisation code that has tendrils into
all such classes or (b) you move the static pool outside of the
class to some new (still global) state.  Explicit pool allocation,
like in the C days, gives you the option of putting the pool whereever
it needs to go without relying on the principle that you can get to
it from global state.

Thanks,
Richard



Ok Richard.

I've just finally understood your suggestions and I would suggest following:

+ I will add a new method to object_allocator that will return an
allocated memory (void*)
(w/o calling any construction)
+ object_allocator::allocate will call placement new with for a
parameterless ctor
+ I will remove all overwritten operators new/delete on e.g. et_forest, ...
+ For these classes, I will add void* operator new (size_t,
object_allocator &)


I was thinking we'd simply use allocate () for cases w

Re: [PR66726] Factor conversion out of COND_EXPR

2015-07-09 Thread Kugan


On 08/07/15 00:41, Jeff Law wrote:
> On 07/07/2015 06:50 AM, Kugan wrote:
>>
>> Thanks for the review. I have addressed your comments above in the
>> attached patch.
>>
>> I have one question with respect to unary operation. For generic unary
>> operation with INTEGER_CST, do we skip this or do we have to perform the
>> inverse operation so that the conversion after PHI will restore it? Is
>> there any easy way to do this safely?
> I think we'd have to invert the operation -- some of which are trivial,
> such as BIT_NOT_EXPR.
> 
> NEGATE_EXPR is trivial once you filter out the cases where inversion
> will create signed overflow (ie INT_MIN and like when arg1 is an
> INTEGER_CST).
> 
> Similarly ABS_EXPR is trivial once you filter out cases where arg1 is a
> negative INTEGER_CST.
> 
> If you want to try and handle those cases, I'm certainly comfortable
> with that as a follow-up.  Obviously we'll want to testcases for them,
> including the cases where we don't want to make the transformation for
> NEGATE_EXPR and ABS_EXPR.
> 
> There may be other special cases we need to handle for other unary
> operations.  I haven't walked through the full list.

Thanks Jeff for the review.As you said later, I will skip generic unary
in this patch and work on that as an addition on top of this.


> 
>>
>> Bootstrapped and regression tested the attached patch on
>> x86-64-none-linux-gnu with no new regressions.
>>
>> Thanks,
>> Kugan
>>
>>
>> p.txt
>>
>>
>> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
>> index 92b4ab0..1d6de9b 100644
>> --- a/gcc/tree-ssa-phiopt.c
>> +++ b/gcc/tree-ssa-phiopt.c
>> @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
>>   static unsigned int tree_ssa_phiopt_worker (bool, bool);
>>   static bool conditional_replacement (basic_block, basic_block,
>>edge, edge, gphi *, tree, tree);
>> +static bool factor_out_conditional_conversion (edge, edge, gphi *,
>> tree, tree);
>>   static int value_replacement (basic_block, basic_block,
>> edge, edge, gimple, tree, tree);
>>   static bool minmax_replacement (basic_block, basic_block,
>> @@ -335,6 +336,17 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool
>> do_hoist_loads)
>>node.  */
>> gcc_assert (arg0 != NULL && arg1 != NULL);
>>
>> +  if (factor_out_conditional_conversion (e1, e2, phi, arg0, arg1))
>> +{
>> +  /* Update arg0 and arg1.  */
>> +  phis = phi_nodes (bb2);
>> +  phi = single_non_singleton_phi_for_edges (phis, e1, e2);
>> +  gcc_assert (phi);
>> +  arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
>> +  arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
>> +  gcc_assert (arg0 != NULL && arg1 != NULL);
>> +}
>> +
> So small comment before this block of code indicating why we're
> recomputing these values.  Something like this perhaps:
> 
> /* factor_out_conditional_conversion may create a new PHI in BB2 and
>eliminate an existing PHI in BB2.  Recompute values that may be
>affected by that change.  */
> 
> 
> Or something along those lines.

Done.

> 
> 
>> /* Do the replacement of conditional if it can be done.  */
>> if (conditional_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
>>   cfgchanged = true;
>> @@ -410,6 +422,133 @@ replace_phi_edge_with_variable (basic_block
>> cond_block,
>> bb->index);
>>   }
>>
>> +/* PR66726: Factor conversion out of COND_EXPR.  If the arguments of
>> the PHI
>> +   stmt are CONVERT_STMT, factor out the conversion and perform the
>> conversion
>> +   to the result of PHI stmt.  */
>> +
>> +static bool
>> +factor_out_conditional_conversion (edge e0, edge e1, gphi *phi,
>> +   tree arg0, tree arg1)
>> +{
>> +  gimple arg0_def_stmt = NULL, arg1_def_stmt = NULL, new_stmt;
>> +  tree new_arg0 = NULL_TREE, new_arg1 = NULL_TREE;
>> +  tree temp, result;
>> +  gphi *newphi;
>> +  gimple_stmt_iterator gsi, gsi_for_def;
>> +  source_location locus = gimple_location (phi);
>> +  enum tree_code convert_code;
>> +
>> +  /* Handle only PHI statements with two arguments.  TODO: If all
>> + other arguments to PHI are INTEGER_CST, we can handle more
>> + than two arguments too.  */
>> +  if (gimple_phi_num_args (phi) != 2)
>> +return false;
> Similarly we can handle more than one SSA_NAME if their defining
> statement all have the same unary operation.  Might as well add that to
> the comment too.  While handling > 2 arguments is certainly possible, I
> really wonder how often those cases occur in practice.
> 
Comment added. I will work on the implementation  after this patch.

>> +
>> +  /* If arg0 is an SSA_NAME and the stmt which defines arg0 is
>> + a conversion.  */
>> +  arg0_def_stmt = SSA_NAME_DEF_STMT (arg0);
>> +  if (!is_gimple_assign (arg0_def_stmt)
>> +  || (!gimple_assign_cast_p (arg0_def_stmt)
>> +  && (get_gimple_rhs_class (gimple_assign_rhs_code (arg0_def_stmt

Re: [gomp] Move openacc vector& worker single handling to RTL

2015-07-09 Thread Nathan Sidwell
This is the patch I committed.  Bernd pointed out that I didn't need to be so 
coy about the branches in the middle of blocks at that point of the compilation 
anyway.  So we remove a couple of unneeded insn patterns.


nathan

2015-07-09  Nathan Sidwell  

Infrastructure:
* gimple.h (gimple_call_internal_unique_p): Declare.
* gimple.c (gimple_call_same_target_p): Add check for
gimple_call_internal_unique_p.
* internal-fn.c (gimple_call_internal_unique_p): New.
* omp-low.h (OACC_LOOP_MASK): Define here...
* omp-low.c (OACC_LOOP_MASK): ... not here.
* tree-ssa-threadedge.c (record_temporary_equivalences_from_stmts):
Add check for gimple_call_internal_unique_p.
* tree-ssa-tail-merge.c (same_succ_def::equal): Add EQ check for
the gimple statements.

Additions:
* internal-fn.def (GOACC_FORK, GOACC_JOIN): New.
* internal-fn.c (gimple_call_internal_unique_p): Add check for
IFN_GOACC_FORK, IFN_GOACC_JOIN.
(expand_GOACC_FORK, expand_GOACC_JOIN): New.
* omp-low.c (gen_oacc_fork, gen_oacc_join): New.
(expand_omp_for_static_nochunk): Add oacc loop fork & join calls.
(expand_omp_for_static_chunk): Likewise.
* config/nvptx/nvptx-protos.h (nvptx_expand_oacc_fork,
nvptx_expand_oacc_join): Declare.
* config/nvptx/nvptx.md (UNSPEC_BIT_CONV, UNSPEC_BROADCAST,
UNSPEC_BR_UNIFIED): New unspecs.
(UNSPECV_FORK, UNSPECV_FORKED, UNSPECV_JOINING, UNSPECV_JOIN): New.
(BITS, BITD): New mode iterators.
(br_true_uni, br_false_uni): New unified branches.
(nvptx_fork, nvptx_forked, nvptx_joining, nvptx_join): New insns.
(oacc_fork, oacc_join): New expand
(nvptx_broadcast): New insn.
(unpacksi2, packsi2): New insns.
(worker_load, worker_store): New insns.
(nvptx_barsync): Renamed from ...
(threadbarrier_insn): ... here.
* config/nvptx/nvptx.c: Include hash-map,h, dominance.h, cfg.h &
omp-low.h.
(worker_bcast_hwm, worker_bcast_align, worker_bcast_name,
worker_bcast_sym): New.
(nvptx_option_override): Initialize worker_bcast_sym.
(nvptx_expand_oacc_fork, nvptx_expand_oacc_join): New.
(nvptx_gen_unpack, nvptx_gen_pack): New.
(struct wcast_data_t, propagate_mask): New types.
(nvptx_gen_vcast, nvptx_gen_wcast): New.
(struct parallel): New structs.
(parallel::parallel, parallel::~parallel): Ctor & dtor.
(bb_insn_map_t): New map.
(insn_bb_t, insn_bb_vec_t): New tuple & vector of.
(nvptx_split_blocks, nvptx_discover_pre): New.
(bb_par_t, bb_par_vec_t); New tuple & vector of.
(nvptx_dump_pars,nvptx_discover_pars): New.
(nvptx_propagate): New.
(vprop_gen, nvptx_vpropagate)@ New.
(wprop_gen, nvptx_wpropagate): New.
(nvptx_wsync): New.
(nvptx_single, nvptx_skip_par): New.
(nvptx_process_pars): New.
(nvptx_neuter_pars): New.
(nvptx_reorg): Add liveness DF problem.  Call nvptx_split_blocks,
nvptx_discover_pars, nvptx_process_pars & nvptx_neuter_pars.
(nvptx_cannot_copy_insn): Check for broadcast, sync, fork & join insns.
(nvptx_file_end): Output worker broadcast array definition.

Deletions:
* builtins.c (expand_oacc_thread_barrier): Delete.
(expand_oacc_thread_broadcast): Delete.
(expand_builtin): Adjust.
* gimple.c (struct gimple_statement_omp_parallel_layout): Remove
broadcast_array member.
(gimple_omp_target_broadcast_array): Delete.
(gimple_omp_target_set_broadcast_array): Delete.
* omp-low.c (omp_region): Remove broadcast_array member.
(oacc_broadcast): Delete.
(build_oacc_threadbarrier): Delete.
(oacc_loop_needs_threadbarrier_p): Delete.
(oacc_alloc_broadcast_storage): Delete.
(find_omp_target_region): Remove call to
gimple_omp_target_broadcast_array.
(enclosing_target_region, required_predication_mask,
generate_vector_broadcast, generate_oacc_broadcast,
make_predication_test, predicate_bb, find_predicatable_bbs,
predicate_omp_regions): Delete.
(use, gen, live_in): Delete.
(populate_loop_live_in, oacc_populate_live_in_1,
oacc_populate_live_in, populate_loop_use, oacc_broadcast_1,
oacc_broadcast): Delete.
(execute_expand_omp): Remove predicate_omp_regions call.
(lower_omp_target): Remove oacc_alloc_broadcast_storage call.
Remove gimple_omp_target_set_broadcast_array call.
(make_gimple_omp_edges): Remove oacc_loop_needs_threadbarrier_p
check.
* tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Remove
BUILT_IN_GOACC_THREADBARRIER.
* omp-builtins.def (BUILT_IN_GOACC_THREAD_BROADCAST,
BUILT_IN_GOACC_THREAD_BROADCAST_LL,
BUI

Re: [PATCH 3/7] Fix trinary op

2015-07-09 Thread Mikhail Maltsev
On 08.07.2015 13:55, Ian Lance Taylor wrote:
> I don't know of anybody who actually uses the DMGL_TYPES support.  I
> don't know why anybody would.
> 
> Ian
Thanks for pointing that out. I updated the testcases, so that now they
don't depend on DMGL_TYPES being used.

> But better still is to consider the larger context.  We want the
> demangler to work the same on all hosts, if at all possible.
> d_identifier is called exactly once.  Change it to take a parameter of
> type long.  Don't worry about changing d_source_name.
Fixed.

> Then look at the fact that d_number does not check for overflow.  We
> should consider changing d_number to limit itself to 32-bit integers,
> and to return an error indication on overflow.  From a quick glance I
> don't see any need for the demangler to support numbers larger than 32
> bits.  I think it's OK if we fail to demangle symbol names that are
> more than 2 billion characters long.
OK, but I think it'll be better to fix that in a separate patch.

The attached patch includes the changes mentioned above, there is also a
small change: I moved the comment for CHECK_DEMANGLER macro to
cp-demangle.c (it already contains a comment for other similar macros)
and replaced __builtin_abort() with abort(). For some reason I thought
that it might need an additional #include, but in reality libiberty (and
the demangler too) already use abort().
The changelog is also attached. OK for trunk after regtest?

-- 
Regards,
Mikhail Maltsev
libiberty/ChangeLog:

2015-07-10  Mikhail Maltsev  

* cp-demangle.c (d_dump): Fix syntax error.
(d_identifier): Adjust type of len to match d_source_name.
(d_expression_1): Fix out-of-bounds access.  Check code variable for
NULL before dereferencing it.
(d_find_pack): Do not recurse for FIXED_TYPE, DEFAULT_ARG and NUMBER.
(d_print_comp_inner): Add NULL pointer check.
* cp-demangle.h (d_peek_next_char): Define as inline function when
CHECK_DEMANGLER is defined.
(d_advance): Likewise.
* testsuite/demangle-expected: Add new testcases.

diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 2988b6b..8254100 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -93,7 +93,11 @@
CP_DEMANGLE_DEBUG
   If defined, turns on debugging mode, which prints information on
   stdout about the mangled string.  This is not generally useful.
-*/
+
+   CHECK_DEMANGLER
+  If defined, additional sanity checks will be performed.  It will
+  cause some slowdown, but will allow to catch out-of-bound access
+  errors earlier.  This macro is intended for testing and debugging.  */
 
 #if defined (_AIX) && !defined (__GNUC__)
  #pragma alloca
@@ -419,7 +423,7 @@ static struct demangle_component *d_source_name (struct 
d_info *);
 
 static long d_number (struct d_info *);
 
-static struct demangle_component *d_identifier (struct d_info *, int);
+static struct demangle_component *d_identifier (struct d_info *, long);
 
 static struct demangle_component *d_operator_name (struct d_info *);
 
@@ -715,7 +719,7 @@ d_dump (struct demangle_component *dc, int indent)
 case DEMANGLE_COMPONENT_FIXED_TYPE:
   printf ("fixed-point type, accum? %d, sat? %d\n",
   dc->u.s_fixed.accum, dc->u.s_fixed.sat);
-  d_dump (dc->u.s_fixed.length, indent + 2)
+  d_dump (dc->u.s_fixed.length, indent + 2);
   break;
 case DEMANGLE_COMPONENT_ARGLIST:
   printf ("argument list\n");
@@ -1656,7 +1660,7 @@ d_number_component (struct d_info *di)
 /* identifier ::= <(unqualified source code identifier)>  */
 
 static struct demangle_component *
-d_identifier (struct d_info *di, int len)
+d_identifier (struct d_info *di, long len)
 {
   const char *name;
 
@@ -1677,7 +1681,7 @@ d_identifier (struct d_info *di, int len)
   /* Look for something which looks like a gcc encoding of an
  anonymous namespace, and replace it with a more user friendly
  name.  */
-  if (len >= (int) ANONYMOUS_NAMESPACE_PREFIX_LEN + 2
+  if (len >= (long) ANONYMOUS_NAMESPACE_PREFIX_LEN + 2
   && memcmp (name, ANONYMOUS_NAMESPACE_PREFIX,
 ANONYMOUS_NAMESPACE_PREFIX_LEN) == 0)
 {
@@ -3166,6 +3170,8 @@ d_expression_1 (struct d_info *di)
   struct demangle_component *type = NULL;
   if (peek == 't')
type = cplus_demangle_type (di);
+  if (!d_peek_next_char (di))
+   return NULL;
   d_advance (di, 2);
   return d_make_comp (di, DEMANGLE_COMPONENT_INITIALIZER_LIST,
  type, d_exprlist (di, 'E'));
@@ -3240,6 +3246,8 @@ d_expression_1 (struct d_info *di)
struct demangle_component *left;
struct demangle_component *right;
 
+   if (code == NULL)
+ return NULL;
if (op_is_new_cast (op))
  left = cplus_demangle_type (di);
else
@@ -3267,7 +3275,9 @@ d_expression_1 (struct d_info *di)
struct demangle_componen

Re: [PATCH] Simple optimization for MASK_STORE.

2015-07-09 Thread Jeff Law

On 06/18/2015 08:32 AM, Yuri Rumyantsev wrote:

Richard,

Here is updated patch which does not include your proposal related to
the target hook deletion.
You wrote:

I still don't understand why you need the new target hook.  If we have a masked
load/store then the mask is computed by an assignment with a VEC_COND_EXPR
(in your example) and thus a test for a zero mask is expressible as just


  > if (vect__ifc__41.17_167 == { 0, 0, 0, 0... })


or am I missing something?

Such vector compare produces vector and does not set up cc flags
required for conditional branch (GIMPLE_COND).
If we use such comparison for GIMPLE_COND we got error message, so I
used target hook which does set up cc flags aka ptest instruction and
I left this part.
I think we need to look for something better.  I really don't like the 
idea of using a target hook like this within the gimple optimizers 
unless it's absolutely necessary.


How are conditionals on vectors usually handled?  You should try to 
mimick that and report, with detail, why that's not working.


I'm also not happy with the mechanisms to determine whether or not we 
should make this transformation.  I'm willing to hash through other 
details and come back to this issue once we've got some of the other 
stuff figured out.  I guess a new flag with the target adjusting is the 
fallback if we can't come up with some reasonable way to select this on 
or off.


The reason I don't like having the target files make this kind of 
decision is it makes more gimple transformations target dependent. 
Based on the history with RTL, that ultimately leads to an unwieldy mess.


And yes, I know gimple isn't 100% target independent -- but that doesn't 
mean we should keep adding more target dependencies.  Each one we add 
needs to be well vetted.



patch.3

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 44a8624..e90de32 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -100,6 +100,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-iterator.h"
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
+#include "stringpool.h"
+#include "tree-ssanames.h"
So ideally we'll figure out why you're unable to generate a suitable 
conditional at the gimple level and the need for the x86 backend to 
generate the vector zero test will go away.  And if it does go away, we 
want those #includes to disappear.


Additionally, instead of including stringpool.h & tree-ssanames.h, 
include "ssa.h" -- as a general rule.



 static rtx legitimize_dllimport_symbol (rtx, bool);
 static rtx legitimize_pe_coff_extern_decl (rtx, bool);
@@ -41100,6 +41102,47 @@ ix86_vectorize_builtin_gather (const_tree 
mem_vectype,

   return ix86_get_builtin (code);
 }

+/* Returns true if SOURCE type is supported by builtin ptest.
+   NAME is lhs of created ptest call.  All created statements are added
+   to GS.  */
+
+static bool
+ix86_vectorize_build_zero_vector_test (tree source, tree name,

Given the stated goal of not doing this in the target files, I'm not 
going to actually look at this routine or any of the infrastructure for 
this target hook.



diff --git a/gcc/params.def b/gcc/params.def
index 3e4ba3a..9e8de11 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1069,6 +1069,12 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK,
   "Maximum number of conditional store pairs that can be sunk",
   2, 0, 0)

+/* Enable inserion test on zero mask for masked stores if non-zero.  */
s/inserion/insertion/

+DEFPARAM (PARAM_ZERO_TEST_FOR_STORE_MASK,
+ "zero-test-for-store-mask",
+ "Enable insertion of test on zero mask for masked stores",
+ 1, 0, 1)
I'm resisting the temptation to bike-shed...  I don't like the name, but 
I don't have a better one yet.  Similarly for the short description.



+/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 2 
"vect" } } */

+/* { dg-final { cleanup-tree-dump "vect" } } */
cleanup-tree-dump is no longer needed.


diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 7ba0d8f..e31479b 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2072,6 +2072,7 @@ vectorizable_mask_load_store (gimple stmt, 
gimple_stmt_iterator *gsi,

 {
   tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
   prev_stmt_info = NULL;
+  LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true;
   for (i = 0; i < ncopies; i++)
If we need to keep this field, can you initialize it just after the 
STMT_VINFO_GATHER_P test after the /** Transform **/ comment.  It seems 
kind-of buried and hard to find in this location.


I'm not really sure why Richi has objected to the field.  Yea we can 
re-analyze stuff in the vectorizer, but we'd be repeating a fair number 
of tests (presumably we'd refactor the start of 
vectorizable_mask_load_store to avoid duplication).  That seems more 
wasteful than another field.




{
  unsigned align, misalign;
diff --git a/gcc/tree-vectorizer.c b/gcc/t

Re: [PATCH] fold builtin_tolower, builtin_toupper

2015-07-09 Thread Ondřej Bílka
On Thu, Jul 09, 2015 at 03:46:08PM +0200, Richard Biener wrote:
> On Thu, 9 Jul 2015, Bernhard Reutner-Fischer wrote:
> 
> > gcc/ChangeLog
> > 
> > 2015-07-09  Bernhard Reutner-Fischer  
> > 
> > * builtins.c (fold_builtin_tolower, fold_builtin_toupper): New
> > static functions.
> > (fold_builtin_1): Handle BUILT_IN_TOLOWER, BUILT_IN_TOUPPER.
> 
> As I read it you fold tolower (X) to (X) >= target_char_set ('A')
> && (X) <= target_char_set ('Z') ? (X) - target_char_set ('A') + 
> target_char_set ('a');
> 
> I don't think this can be correct for all locales which need not
> have a lower-case character for all upper-case ones nor do
> all letters having one need to be in the range of 'A' to 'Z'.
> 
> Joseph will surely correct me if I am wrong.
> 
Thats correct as this doesn't handle toupper('č') with appropriate
single byte locale. You cannot even rely on fact that if x<128 then only
conversion is happens in 'A'..'Z' range, there are locales where that
doesn't hold and we need to check _NL_CTYPE_NONASCII_CASE. We don't
export that so you would need to check that while constructing table with 256 
entries.

Also your example is invalid as you used __builtin_tolower instead
tolower. As usual gcc builtins are slow, you will get better performance
with following.

#include 
int foo(char *c)
{
 int i;
 for(i=0;i<1000;i++)
   c[i]=tolower(c[i]);
}


As your example first problem is that it doesn't work with utf8 due
multibyte characters.

Second problem is that sse4.2 doesn't help at all as generating masks
with it is quite slow. Using just sse2 is faster here.

It could be possible to add such function to libc. For vectorization you
would need to use following after checking that _NL_CTYPE_NONASCII_CASE
didn't happen. I didn't finished or tested that, you need set up
char128, a, z to to tests 128 <= x[i], 'A' <= x[i] and x[i] <= 'Z'

void c16(char *_x, char *y)
{
__m128i x = _mm_loadu_si128(_x);

int mask = _mm_movemask_epi8(_mm_cmpgt_epi8(x, char128);
x=_mm_or_si128(x, _mm_and_si128(tolower_bit, 
_mm_and_si128 (_mm_cmpgt_epi8(a,x), _mm_cmpgt_epi8(x,z;
_mm_storeu_si128(y, x);
while (mask)
{
  int i = ffs(mask);
  y[i] = tolower(y[i]); 
  mask = mask & (mask - 1);
}
}


Re: [PATCH] S390: Support -mtune=native and -march=native.

2015-07-09 Thread DJ Delorie

> Sorry about that.  Does the attached Patch fix the problem?

Yup.  Thanks!


Re: [PATCH] Fix PR66794

2015-07-09 Thread Richard Biener
On Wed, 8 Jul 2015, H.J. Lu wrote:

> On Wed, Jul 8, 2015 at 4:46 AM, Richard Biener  wrote:
> >
> > Passes do not expect post-dominators being around and thus forget
> > to invalidate them properly.  Thus passes computing them have to
> > free them.  The patch fixes path-isolation and adds an assert so
> > this doesn't happen again.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >
> > Richard.
> >
> > 2015-07-08  Richard Biener  
> >
> > PR tree-optimization/66794
> > * passes.c (execute_function_todo): Assert that post-dominators
> > are not computed.
> > * gimple-ssa-isolate-paths.c (gimple_ssa_isolate_erroneous_paths):
> > Free post-dominators.
> >
> 
> This caused:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66807

Fixes as obvious with

2015-07-09  Richard Biener  

PR tree-optimization/66807
* tree-chkp-opt.c (chkp_opt_fini): Free post dominator info.

Index: gcc/tree-chkp-opt.c
===
--- gcc/tree-chkp-opt.c (revision 225546)
+++ gcc/tree-chkp-opt.c (working copy)
@@ -1282,6 +1282,8 @@ static void
 chkp_opt_fini (void)
 {
   chkp_fix_cfg ();
+
+  free_dominance_info (CDI_POST_DOMINATORS);
 }
 
 /* Checker optimization pass function.  */


Re: [PATCH] Limit alignment on error_mark_node variable

2015-07-09 Thread Richard Biener
On Wed, Jul 8, 2015 at 5:32 PM, H.J. Lu  wrote:
> There is no need to try different alignment on variable of
> error_mark_node.
>
> OK for trunk if there is no regression?

Can't we avoid calling align_variable on error_mark_node type decls
completely?  That is, punt earlier when we try to emit it.

> Thanks.
>
> H.J.
> --
> gcc/
>
> PR target/66810
> * varasm.c (align_variable): Don't try different alignment on
> variable of error_mark_node.
>
> gcc/testsuite/
>
> PR target/66810
> * gcc.target/i386/pr66810.c: New test.
> ---
>  gcc/testsuite/gcc.target/i386/pr66810.c | 10 ++
>  gcc/varasm.c|  3 ++-
>  2 files changed, 12 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66810.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr66810.c 
> b/gcc/testsuite/gcc.target/i386/pr66810.c
> new file mode 100644
> index 000..4778b37
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr66810.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target ia32 } } */
> +/* { dg-options "-mno-sse -mno-mmx -miamcu" } */
> +
> +int vv;
> +
> +void
> +i (void)
> +{
> +  static int a[vv]; /* { dg-error "storage size" } */
> +}
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index b69b3a3..be33cb4 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -1016,7 +1016,8 @@ align_variable (tree decl, bool dont_output_data)
>align = MAX_OFILE_ALIGNMENT;
>  }
>
> -  if (! DECL_USER_ALIGN (decl))
> +  /* Don't try different alignment for error_mark_node.  */
> +  if (! DECL_USER_ALIGN (decl) && TREE_TYPE (decl) != error_mark_node)
>  {
>  #ifdef DATA_ABI_ALIGNMENT
>unsigned int data_abi_align
> --
> 2.4.3
>


Re: darwin fix for gcc-5 (RM please)

2015-07-09 Thread Richard Biener
On Wed, Jul 8, 2015 at 6:37 PM, Mike Stump  wrote:
> I’d like to merge in the fix from https://gcc.gnu.org/PR66523 into the 
> gcc-5-branch.
>
> RM Ok?

Ok.

Richard.

> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35773:
> diff --git a/gcc/config/darwin.c b/gcc/config/darwin.c
> index 40804b8..0080299 100644
> --- a/gcc/config/darwin.c
> +++ b/gcc/config/darwin.c
> @@ -1259,6 +1259,11 @@ darwin_encode_section_info (tree decl, rtx rtl, int 
> first ATTRIBUTE_UNUSED)
>  void
>  darwin_mark_decl_preserved (const char *name)
>  {
> +  /* Actually we shouldn't mark any local symbol this way, but for now
> + this only happens with ObjC meta-data.  */
> +  if (darwin_label_is_anonymous_local_objc_name (name))
> +return;
> +
>fprintf (asm_out_file, "\t.no_dead_strip ");
>assemble_name (asm_out_file, name);
>fputc ('\n', asm_out_file);
>


Re: [gomp4] Handle Fortran deviceptr clause.

2015-07-09 Thread Thomas Schwinge
Hi Jim!

On Wed, 8 Jul 2015 13:00:16 -0500, James Norris  
wrote:
> This patch adds handling of the deviceptr clause when
> used within a Fortran program.

Please motivate such non-obvious code changes by a test case.  At least
to me, it's not at all obvious what's going on here...

Is that the occasion where you asked me about how to add a test case to
the libgomp testsuite, that'd consist of both a C and a Fortran part, so
you'd need to run both compilers?  (Unfortunately, I have not yet thought
about a solution for this.)  But even then, please at least post such
test cases with patch submissions.

> Committed to gomp-4_0-branch

> + * oacc-parallel.c (GOACC_parallel GOACC_data_start): Handle Fortran
> + deviceptr clause.

> --- a/libgomp/oacc-parallel.c
> +++ b/libgomp/oacc-parallel.c
> @@ -211,6 +211,21 @@ GOACC_parallel (int device, void (*fn) (void *),
>thr = goacc_thread ();
>acc_dev = thr->dev;
>  
> +  for (i = 0; i < mapnum; i++)
> +{
> +  unsigned short kind1 = kinds[i] & 0xff;
> +  unsigned short kind2 = kinds[i+1] & 0xff;
> +
> +  if ((kind1 == GOMP_MAP_FORCE_DEVICEPTR && kind2 == GOMP_MAP_POINTER)
> +&& (sizes[i + 1] == 0)
> +&& (hostaddrs[i] == *(void **)hostaddrs[i + 1]))
> + {
> +   kinds[i+1] = kinds[i];
> +   sizes[i+1] = sizeof (void *);
> +   hostaddrs[i] = NULL;
> + }
> +}

Ugh.  That loop should be bounded by mapnum - 1 to avoid out-of-bounds
array accesses.  And, such "voodoo" code constructs do need a comment,
please.  Why does this processing need to happen at run-time, in libgomp?
Should something else be done during OMP lowering, for example?

> @@ -263,8 +278,13 @@ GOACC_parallel (int device, void (*fn) (void *),
>  
>devaddrs = gomp_alloca (sizeof (void *) * mapnum);
>for (i = 0; i < mapnum; i++)
> -devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
> - + tgt->list[i]->tgt_offset);
> +{
> +  if (tgt->list[i] != NULL)
> + devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
> + + tgt->list[i]->tgt_offset);
> +  else
> + devaddrs[i] = NULL;
> +}
>  
>acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes, 
> kinds,
> num_gangs, num_workers, vector_length, async,

> @@ -305,6 +326,21 @@ GOACC_data_start (int device, size_t mapnum,
>struct goacc_thread *thr = goacc_thread ();
>struct gomp_device_descr *acc_dev = thr->dev;
>  
> +  for (i = 0; i < mapnum; i++)
> +{
> +  unsigned short kind1 = kinds[i] & 0xff;
> +  unsigned short kind2 = kinds[i+1] & 0xff;
> +
> +  if ((kind1 == GOMP_MAP_FORCE_DEVICEPTR && kind2 == GOMP_MAP_POINTER)
> +&& (sizes[i + 1] == 0)
> +&& (hostaddrs[i] == *(void **)hostaddrs[i + 1]))
> + {
> +   kinds[i+1] = kinds[i];
> +   sizes[i+1] = sizeof (void *);
> +   hostaddrs[i] = NULL;
> + }
> +}

The same code being repeated here for the OpenACC data construct --
should this be moved to a common place?


Grüße,
 Thomas


signature.asc
Description: PGP signature


[gomp4.1] Support C++ "this" in OpenMP directives

2015-07-09 Thread Jakub Jelinek
On Wed, Jun 10, 2015 at 11:38:25AM +, Joseph Myers wrote:
> This patch, for gomp-4_0-branch, adds support for C++ "this" in
> OpenACC directives.  (This patch does not do anything to handle OpenMP
> differently from OpenACC; that - bug 66053 - will need to be resolved
> for mainline, either deciding these cases should be accepted for
> OpenMP or making the parsing only accept them in OpenACC directives
> and not OpenMP ones.)

So, OpenMP 4.1 is going to accept this only in linear/uniform/aligned
clauses and only in #pragma omp declare simd construct.

This patch implements this, but as there is no oacc argument to
finish_omp_clauses on the trunk/gomp-4_1-branch, I haven't used that
to allow it in OpenACC directives, so that will need to be resolved
when the branches are merged to trunk.

Note I'm afraid your gomp-4_0-branch patch will not do the right thing
if you use this outside of non-static member functions (which is why I've
added the finish_this_expr for diagnostics).

Anyway, committed to gomp4.1 branch.

2015-07-09  Jakub Jelinek  

* parser.c (cp_parser_omp_var_list_no_open): Parse this.
* cp-tree.h (finish_omp_declare_simd_methods): New prototype.
* semantics.c (handle_omp_array_sections_1): Disallow this based
array sections for OpenMP.
(finish_omp_declare_simd_methods): New function.
(finish_omp_clauses): Don't attempt to adjust linear step of
this if it points to TYPE_BEING_DEFINED.  Disallow this in
all clauses expecting variable lists, except for declare simd
linear/uniform/aligned clauses.
(finish_struct_1): Call finish_omp_declare_simd_methods.

* g++.dg/vect/simd-clone-2.cc: New test.
* g++.dg/vect/simd-clone-2.h: New file.
* g++.dg/vect/simd-clone-3.cc: New test.
* g++.dg/vect/simd-clone-4.cc: New test.
* g++.dg/vect/simd-clone-4.h: New file.
* g++.dg/vect/simd-clone-5.cc: New test.
* g++.dg/gomp/this-1.C: New test.
* g++.dg/gomp/this-2.C: New test.

--- gcc/cp/parser.c.jj  2015-07-06 13:42:22.0 +0200
+++ gcc/cp/parser.c 2015-07-08 11:33:17.255688547 +0200
@@ -27881,15 +27881,26 @@ cp_parser_omp_var_list_no_open (cp_parse
   tree name, decl;
 
   token = cp_lexer_peek_token (parser->lexer);
-  name = cp_parser_id_expression (parser, /*template_p=*/false,
- /*check_dependency_p=*/true,
- /*template_p=*/NULL,
- /*declarator_p=*/false,
- /*optional_p=*/false);
-  if (name == error_mark_node)
-   goto skip_comma;
+  if (current_class_ptr && cp_parser_is_keyword (token, RID_THIS))
+   {
+ decl = finish_this_expr ();
+ if (TREE_CODE (decl) == NON_LVALUE_EXPR
+ || CONVERT_EXPR_P (decl))
+   decl = TREE_OPERAND (decl, 0);
+ cp_lexer_consume_token (parser->lexer);
+   }
+  else
+   {
+ name = cp_parser_id_expression (parser, /*template_p=*/false,
+ /*check_dependency_p=*/true,
+ /*template_p=*/NULL,
+ /*declarator_p=*/false,
+ /*optional_p=*/false);
+ if (name == error_mark_node)
+   goto skip_comma;
 
-  decl = cp_parser_lookup_name_simple (parser, name, token->location);
+ decl = cp_parser_lookup_name_simple (parser, name, token->location);
+   }
   if (decl == error_mark_node)
cp_parser_name_lookup_error (parser, name, decl, NLE_NULL,
 token->location);
--- gcc/cp/cp-tree.h.jj 2015-07-01 12:50:49.0 +0200
+++ gcc/cp/cp-tree.h2015-07-08 17:27:02.787472523 +0200
@@ -5998,6 +5998,7 @@ extern void note_decl_for_pch (tree);
 extern tree omp_reduction_id   (enum tree_code, tree, tree);
 extern tree cp_remove_omp_priv_cleanup_stmt(tree *, int *, void *);
 extern void cp_check_omp_declare_reduction (tree);
+extern void finish_omp_declare_simd_methods(tree);
 extern tree finish_omp_clauses (tree, bool, bool = false);
 extern tree push_omp_privatization_clauses (bool);
 extern void pop_omp_privatization_clauses  (tree);
--- gcc/cp/semantics.c.jj   2015-07-06 19:20:51.0 +0200
+++ gcc/cp/semantics.c  2015-07-08 19:26:01.837813436 +0200
@@ -4390,6 +4390,15 @@ handle_omp_array_sections_1 (tree c, tre
  omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
  return error_mark_node;
}
+  else if (TREE_CODE (t) == PARM_DECL
+  && DECL_ARTIFICIAL (t)
+  && DECL_NAME (t) == this_identifier)
+   {
+ error_at (OMP_CLAUSE_LOCATION (c),
+   "% allowed in OpenMP only in %"
+   " clauses");
+ retu

Re: [gomp4] Handle deviceptr from an outer directive

2015-07-09 Thread Thomas Schwinge
Hi Jim!

On Tue, 7 Jul 2015 10:19:39 -0500, James Norris  
wrote:
> This patch fixes an issue where the deviceptr clause in an outer
> directive was being ignored during implicit variable definition
> on a nested directive.

> Committed to gomp-4_0-branch.

> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c

I'm sorry, have not yet tried very hard; but I can't claim to understand
the logic here -- why is the OpenACC deviceptr clause so special?  :-|

> @@ -116,6 +116,9 @@ enum gimplify_omp_var_data
>/* Gang-local OpenACC variable.  */
>GOVD_GANGLOCAL = (1 << 16),
>  
> +  /* OpenACC deviceptr clause.  */
> +  GOVD_USE_DEVPTR = (1 << 17),
> +
>GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
>  | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
>  | GOVD_LOCAL)
> @@ -6274,7 +6277,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
>   }
> break;
>   }
> +
> flags = GOVD_MAP | GOVD_EXPLICIT;
> +   if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
> + flags |= GOVD_USE_DEVPTR;
> goto do_add;
>  
>   case OMP_CLAUSE_DEPEND:
> @@ -6662,6 +6668,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void 
> *data)
>  : (flags & GOVD_FORCE_MAP
> ? GOMP_MAP_FORCE_TOFROM
> : GOMP_MAP_TOFROM));
> +
>if (DECL_SIZE (decl)
> && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
>   {
> @@ -6687,7 +6694,17 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void 
> *data)
> OMP_CLAUSE_CHAIN (clause) = nc;
>   }
>else
> - OMP_CLAUSE_SIZE (clause) = DECL_SIZE_UNIT (decl);
> + {
> +   if (gimplify_omp_ctxp->outer_context)
> + {
> +   struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp->outer_context;
> +   splay_tree_node on
> + = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
> +   if (on && (on->value & GOVD_USE_DEVPTR))
> + OMP_CLAUSE_SET_MAP_KIND (clause, GOMP_MAP_FORCE_PRESENT);
> + }
> +   OMP_CLAUSE_SIZE (clause) = DECL_SIZE_UNIT (decl);
> + }
>  }
>if (code == OMP_CLAUSE_FIRSTPRIVATE && (flags & GOVD_LASTPRIVATE) != 0)
>  {

The patch that you committed (r225518) also includes a test case
(thanks!) as follows:

--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/deviceptr-4.c
@@ -0,0 +1,12 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+void
+subr (int *a)
+{
+#pragma acc data deviceptr (a)
+#pragma acc parallel
+  a[0] += 1.0;
+}
+
+/* { dg-final { scan-tree-dump-times "#pragma omp target 
oacc_parallel.*map\\(force_present:a \\\[len: 8\\\]\\)" 1 "gimple" } } */
+/* { dg-final { cleanup-tree-dump "gimple" } } */

That len: 8 is obviously valid only for 64-bit configurations, so will
cause a FAIL on anything else.


Grüße,
 Thomas


signature.asc
Description: PGP signature


Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)

2015-07-09 Thread Richard Biener
On Wed, 8 Jul 2015, Marek Polacek wrote:

> On Fri, Jul 03, 2015 at 03:41:29PM +0200, Richard Biener wrote:
> > On Fri, 3 Jul 2015, Marek Polacek wrote:
> > 
> > > This patch implements a new pass, called laddress, which deals with
> > > lowering ADDR_EXPR assignments.  Such lowering ought to help the
> > > vectorizer, but it also could expose more CSE opportunities, maybe
> > > help reassoc, etc.  It's only active when optimize != 0.
> > > 
> > > So e.g.
> > >   _1 = (sizetype) i_9;
> > >   _7 = _1 * 4;
> > >   _4 = &b + _7;
> > > instead of
> > >   _4 = &b[i_9];
> > > 
> > > This triggered 14105 times during the regtest and 6392 times during
> > > the bootstrap.
> > > 
> > > The fallout (at least on x86_64) is surprisingly small, i.e. none, just
> > > gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is due
> > > to a bug in the vectorizer.  Jakub has a patch and knows the details.
> > > As the test shows, we're now able to vectorize ADDR_EXPR of non-invariants
> > > (that was the motivation of this pass).
> > > 
> > > This doesn't introduce any kind of verification nor PROP_laddress.
> > > Don't know if we want that, but hopefully it can be done as a follow-up
> > > if we do.
> > 
> > Yes.  At the moment nothing requires lowered address form so this is
> > merely an optimization (and not a bug for some later pass to
> > re-introduce un-lowered non-invariant addresses).  I can imagine
> > that for example IVOPTs could be simplified if we didn't have this
> > kind of addresses in the IL.
> > 
> > > Do we want to move some optimizations into this new pass, e.g.
> > > from fwprop?
> > 
> > I think we might want to re-try forwprop_into_addr_expr before lowering
> > the address.  Well, but that's maybe just over-cautionous.
> > 
> > > Thoughts?
> > 
> > Please move the pass before crited, crited and pre are supposed to
> > go together.
> 
> Done.
>  
> > Otherwise looks ok to me.
> 
> I renamed the file to gimple-laddress.c then and adjusted the timevar.
> Another change is that for x86_64 we don't need -mavx at all, so I dropped
> that.  The test is now restricted to x86_64/i?86; on aarch64/ppc64 we aren't
> able to vectorize all the functions.
> 
> Bootstrapped/regtested on x86_64-linux + ppc64-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2015-07-08  Marek Polacek  
> 
>   PR tree-optimization/66718
>   * Makefile.in (OBJS): Add gimple-laddress.o. 
>   * passes.def: Schedule pass_laddress.
>   * timevar.def (DEFTIMEVAR): Add TV_GIMPLE_LADDRESS.
>   * tree-pass.h (make_pass_laddress): Declare.
>   * gimple-laddress.c: New file.
> 
>   * gcc.dg/vect/vect-126.c: New test.
> 
> diff --git gcc/Makefile.in gcc/Makefile.in
> index 89eda96..1817025 100644
> --- gcc/Makefile.in
> +++ gcc/Makefile.in
> @@ -1255,6 +1255,7 @@ OBJS = \
>   gimple-expr.o \
>   gimple-iterator.o \
>   gimple-fold.o \
> + gimple-laddress.o \
>   gimple-low.o \
>   gimple-match.o \
>   generic-match.o \
> diff --git gcc/gimple-laddress.c gcc/gimple-laddress.c
> index e69de29..c8036b9 100644
> --- gcc/gimple-laddress.c
> +++ gcc/gimple-laddress.c
> @@ -0,0 +1,137 @@
> +/* Lower and optimize address expressions.
> +   Copyright (C) 2015 Free Software Foundation, Inc.
> +   Contributed by Marek Polacek 
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "alias.h"
> +#include "predict.h"
> +#include "tm.h"
> +#include "function.h"
> +#include "dominance.h"
> +#include "cfg.h"
> +#include "basic-block.h"
> +#include "tree-ssa-alias.h"
> +#include "symtab.h"
> +#include "tree.h"
> +#include "stringpool.h"
> +#include "tree-ssanames.h"
> +#include "fold-const.h"
> +#include "gimple-expr.h"
> +#include "gimple.h"
> +#include "gimplify.h"
> +#include "gimple-iterator.h"
> +#include "gimplify-me.h"
> +#include "tree-pass.h"
> +
> +
> +namespace {
> +
> +const pass_data pass_data_laddress =
> +{
> +  GIMPLE_PASS, /* type */
> +  "laddress", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_GIMPLE_LADDRESS, /* tv_id */
> +  ( PROP_cfg | PROP_ssa ), /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_laddress : public gimple_opt_pass
> +{
> +public:
>

Re: [PR25530] Convert (unsigned t / 2) * 2 into (unsigned t & ~1)

2015-07-09 Thread Richard Biener
On Thu, Jul 9, 2015 at 8:35 AM, Paolo Bonzini  wrote:
>
>
> On 07/07/2015 11:08, Richard Biener wrote:
>> Also I am not sure ceil_div and floor_div can be handled this way.
>> (5 /[ceil] 2) * 2 == 6 but you compute it as 4.  So I am only convinced
>> trunc_div works this way.
>
> Of course also floor_div for unsigned arguments.
>
> For signed arguments, ceil_div works if the operand is known-negative
> and floor_div if known-positive.
>
> Perhaps you could optimize these cases to trunc_div first (or maybe it's
> already done...)?

fold-const.c does this already, yes.

Richard.

> Paolo


Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)

2015-07-09 Thread Marek Polacek
On Thu, Jul 09, 2015 at 10:53:30AM +0200, Richard Biener wrote:
> > I renamed the file to gimple-laddress.c then and adjusted the timevar.
> > Another change is that for x86_64 we don't need -mavx at all, so I dropped
> > that.  The test is now restricted to x86_64/i?86; on aarch64/ppc64 we aren't
> > able to vectorize all the functions.
> > 
> > Bootstrapped/regtested on x86_64-linux + ppc64-linux, ok for trunk?
> 
> Ok.

Thanks, committed.

Jakub, you can proceed with your third patch now ;).

Marek


Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)

2015-07-09 Thread Jakub Jelinek
On Thu, Jul 09, 2015 at 11:04:38AM +0200, Marek Polacek wrote:
> On Thu, Jul 09, 2015 at 10:53:30AM +0200, Richard Biener wrote:
> > > I renamed the file to gimple-laddress.c then and adjusted the timevar.
> > > Another change is that for x86_64 we don't need -mavx at all, so I dropped
> > > that.  The test is now restricted to x86_64/i?86; on aarch64/ppc64 we 
> > > aren't
> > > able to vectorize all the functions.
> > > 
> > > Bootstrapped/regtested on x86_64-linux + ppc64-linux, ok for trunk?
> > 
> > Ok.
> 
> Thanks, committed.
> 
> Jakub, you can proceed with your third patch now ;).

That one has not been yet reviewed.
Richard, ping on: http://gcc.gnu.org/ml/gcc-patches/2015-07/msg00251.html

Jakub


Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)

2015-07-09 Thread Richard Biener
On Sat, 4 Jul 2015, Jakub Jelinek wrote:

> On Fri, Jul 03, 2015 at 04:06:26PM +0200, Jakub Jelinek wrote:
> > In the pr59984.c testcase, with Marek's patch and this patch, one loop in
> > test is already vectorized (the ICE was on the other one), I'll work on
> > recognizing multiples of GOMP_SIMD_LANE () as linear next, so that we
> > vectorize also the loop with bar.  Without Marek's patch we weren't
> 
> And here is a patch to vectorize everything in pr59984.c.
> For the purpose of elemental functions, addresses of variables in
> simd magic arrays (e.g. private, linear, reduction etc.) are linear,
> but they aren't linear in the whole loop, just are linear within the
> vectorization factor.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> This one really depends on Marek's patch, doesn't make sense to commit
> before it goes in.
> 
> 2015-07-04  Jakub Jelinek  
> 
>   PR tree-optimization/66718
>   * tree-vect-stmts.c (struct simd_call_arg_info): Add simd_lane_linear
>   field.
>   (vectorizable_simd_clone_call): Support using linear arguments for
>   addresses of arrays elements indexed by GOMP_SIMD_LANE result.
> 
> --- gcc/tree-vect-stmts.c.jj  2015-07-03 14:06:28.0 +0200
> +++ gcc/tree-vect-stmts.c 2015-07-03 20:43:42.796573076 +0200
> @@ -2618,6 +2618,7 @@ struct simd_call_arg_info
>enum vect_def_type dt;
>HOST_WIDE_INT linear_step;
>unsigned int align;
> +  bool simd_lane_linear;
>  };
>  
>  /* Function vectorizable_simd_clone_call.
> @@ -2702,6 +2703,7 @@ vectorizable_simd_clone_call (gimple stm
>thisarginfo.linear_step = 0;
>thisarginfo.align = 0;
>thisarginfo.op = NULL_TREE;
> +  thisarginfo.simd_lane_linear = false;
>  
>op = gimple_call_arg (stmt, i);
>if (!vect_is_simple_use_1 (op, stmt, loop_vinfo, bb_vinfo,
> @@ -2724,21 +2726,24 @@ vectorizable_simd_clone_call (gimple stm
>  
>/* For linear arguments, the analyze phase should have saved
>the base and step in STMT_VINFO_SIMD_CLONE_INFO.  */
> -  if (i * 2 + 3 <= STMT_VINFO_SIMD_CLONE_INFO (stmt_info).length ()
> -   && STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 2 + 2])
> +  if (i * 3 + 4 <= STMT_VINFO_SIMD_CLONE_INFO (stmt_info).length ()
> +   && STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 2])
>   {
> gcc_assert (vec_stmt);
> thisarginfo.linear_step
> - = tree_to_shwi (STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 2 + 2]);
> + = tree_to_shwi (STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 2]);
> thisarginfo.op
> - = STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 2 + 1];
> + = STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 1];
> +   thisarginfo.simd_lane_linear
> + = (STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 3]
> +== boolean_true_node);
> /* If loop has been peeled for alignment, we need to adjust it.  */
> tree n1 = LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo);
> tree n2 = LOOP_VINFO_NITERS (loop_vinfo);
> -   if (n1 != n2)
> +   if (n1 != n2 && !thisarginfo.simd_lane_linear)
>   {
> tree bias = fold_build2 (MINUS_EXPR, TREE_TYPE (n1), n1, n2);
> -   tree step = STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 2 + 2];
> +   tree step = STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 2];
> tree opt = TREE_TYPE (thisarginfo.op);
> bias = fold_convert (TREE_TYPE (step), bias);
> bias = fold_build2 (MULT_EXPR, TREE_TYPE (step), bias, step);
> @@ -2764,6 +2769,93 @@ vectorizable_simd_clone_call (gimple stm
>   || thisarginfo.dt == vect_external_def)
>  && POINTER_TYPE_P (TREE_TYPE (op)))
>   thisarginfo.align = get_pointer_alignment (op) / BITS_PER_UNIT;
> +  /* Addresses of array elements indexed by GOMP_SIMD_LANE are
> +  linear too.  */
> +  if (POINTER_TYPE_P (TREE_TYPE (op))
> +   && !thisarginfo.linear_step
> +   && !vec_stmt
> +   && thisarginfo.dt != vect_constant_def
> +   && thisarginfo.dt != vect_external_def
> +   && loop_vinfo
> +   && !slp_node
> +   && TREE_CODE (op) == SSA_NAME)
> + {
> +   def_stmt = SSA_NAME_DEF_STMT (op);
> +   if (is_gimple_assign (def_stmt)
> +   && gimple_assign_rhs_code (def_stmt) == POINTER_PLUS_EXPR
> +   && is_gimple_min_invariant (gimple_assign_rhs1 (def_stmt)))
> + {
> +   tree base = gimple_assign_rhs1 (def_stmt);
> +   HOST_WIDE_INT linear_step = 0;
> +   tree v = gimple_assign_rhs2 (def_stmt);
> +   while (v && TREE_CODE (v) == SSA_NAME)

Hmm, this looks like it could be split out to a function.

> + {
> +   tree t;
> +   def_stmt = SSA_NAME_DEF_STMT (v);
> +   if (is_gimple_assign (def_stmt))
> + switch (gimple_assign_rhs_code (def_stmt))
> +   {
> + 

Re: [PING][PATCH, 1/2] Merge rewrite_virtuals_into_loop_closed_ssa from gomp4 branch

2015-07-09 Thread Tom de Vries

On 09/07/15 05:33, Jeff Law wrote:

On 07/07/2015 09:58 AM, Tom de Vries wrote:
[Big snip]



0001-Add-rewrite_virtuals_into_loop_closed_ssa.patch


Add rewrite_virtuals_into_loop_closed_ssa

2015-07-07  Tom de Vries

* tree-cfg.c (get_virtual_phi): New function.
* tree-cfg.h (get_virtual_phi): Declare.
* tree-ssa-loop-manip.c (replace_uses_in_dominated_bbs)
(rewrite_virtuals_into_loop_closed_ssa): New function.
* tree-ssa-loop-manip.h (rewrite_virtuals_into_loop_closed_ssa):
Declare.
* tree-parloops.c (replace_uses_in_bbs_by): Remove.
(transform_to_exit_first_loop_alt): Use
rewrite_virtuals_into_loop_closed_ssa.



So how is the testcase in 56113 affected by this change (compile-time
and memory usage?)  I didn't see any discussion of that in the thread
from June.



Hi Jeff,

rewrite_virtuals_into_loop_closed_ssa is only active for loops that are 
transformed by parloops (which is off by default).


The example in PR56113 at n == 1000 only contains one loop, with 
!single_dom_exit, so if parloops is switched on, it doesn't transform 
the loop.


Thanks,
- Tom



Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)

2015-07-09 Thread Jakub Jelinek
On Thu, Jul 09, 2015 at 11:14:01AM +0200, Richard Biener wrote:
> > @@ -2764,6 +2769,93 @@ vectorizable_simd_clone_call (gimple stm
> > || thisarginfo.dt == vect_external_def)
> >&& POINTER_TYPE_P (TREE_TYPE (op)))
> > thisarginfo.align = get_pointer_alignment (op) / BITS_PER_UNIT;
> > +  /* Addresses of array elements indexed by GOMP_SIMD_LANE are
> > +linear too.  */
> > +  if (POINTER_TYPE_P (TREE_TYPE (op))
> > + && !thisarginfo.linear_step
> > + && !vec_stmt
> > + && thisarginfo.dt != vect_constant_def
> > + && thisarginfo.dt != vect_external_def
> > + && loop_vinfo
> > + && !slp_node
> > + && TREE_CODE (op) == SSA_NAME)
> > +   {
> > + def_stmt = SSA_NAME_DEF_STMT (op);
> > + if (is_gimple_assign (def_stmt)
> > + && gimple_assign_rhs_code (def_stmt) == POINTER_PLUS_EXPR
> > + && is_gimple_min_invariant (gimple_assign_rhs1 (def_stmt)))
> > +   {
> > + tree base = gimple_assign_rhs1 (def_stmt);
> > + HOST_WIDE_INT linear_step = 0;
> > + tree v = gimple_assign_rhs2 (def_stmt);
> > + while (v && TREE_CODE (v) == SSA_NAME)
> 
> Hmm, this looks like it could be split out to a function.

Ok, will try that.

> > +   base = fold_build2 (POINTER_PLUS_EXPR,
> > +   TREE_TYPE (base), base, t);
> > +   v = gimple_assign_rhs1 (def_stmt);
> > +   continue;
> 
> what about MINUS_EXPR?  I suppose you only handle stuff that's
> produced by get_inner_reference here?

This is meant to be used with the "omp simd array" vars, and what
is produced by Marek's pass from those, so something that originally
in the code are simple scalar or aggregate
vars, I'm only handling + with a constant bias, wouldn't subtraction
be folded into addition of negative constant anyway?
This code isn't for correctness, but optimization, if it doesn't detect
something is linear, even when it is, it can simply not be vectorized or
vectorized less efficiently through some other simd clone.
> 
> > + case MULT_EXPR:
> > +   t = gimple_assign_rhs2 (def_stmt);
> > +   if (linear_step
> > +   || !tree_fits_shwi_p (t)
> > +   || integer_zerop (t))
> > + {
> 
> So no VLAs... (I understand this code is for correctness?)

VLAs aren't added as "omp simd array" ever, I punt on them early
(set safelen to 1).

Jakub


[C++ Patch, obvious?] Rename warn_args_num

2015-07-09 Thread Paolo Carlini

Hi,

given that the current warn_args_num only outputs errors, never 
warnings, I think we really want to rename it. Unless it's the wrong 
time for such changes...?!?


Thanks,
Paolo.

///
2015-07-09  Paolo Carlini  

* typeck.c (warn_args_num): Rename to error_args_num.
(convert_arguments): Adjust calls.
Index: typeck.c
===
--- typeck.c(revision 225556)
+++ typeck.c(working copy)
@@ -62,7 +62,7 @@ static void casts_away_constness_r (tree *, tree *
 static bool casts_away_constness (tree, tree, tsubst_flags_t);
 static bool maybe_warn_about_returning_address_of_local (tree);
 static tree lookup_destructor (tree, tree, tree, tsubst_flags_t);
-static void warn_args_num (location_t, tree, bool);
+static void error_args_num (location_t, tree, bool);
 static int convert_arguments (tree, vec **, tree, int,
   tsubst_flags_t);
 
@@ -3559,10 +3559,10 @@ cp_build_function_call_vec (tree function, vec
{
   if (complain & tf_error)
 {
- warn_args_num (input_location, fndecl, /*too_many_p=*/true);
+ error_args_num (input_location, fndecl, /*too_many_p=*/true);
   return i;
 }
   else
@@ -3749,7 +3749,7 @@ convert_arguments (tree typelist, vec
   else
{
   if (complain & tf_error)
-   warn_args_num (input_location, fndecl, /*too_many_p=*/false);
+   error_args_num (input_location, fndecl, /*too_many_p=*/false);
  return -1;
}
 }


Re: move a * (1 << b) -> a << b pattern from fold-const.c to match.pd

2015-07-09 Thread Marek Polacek
On Tue, Jul 07, 2015 at 07:47:50AM +0200, Marc Glisse wrote:
> On Tue, 7 Jul 2015, Prathamesh Kulkarni wrote:
> 
> >+/* a * (1 << b) -> a << b */
> >+(simplify
> >+  (mult:c @a (lshift integer_onep @b))
> >+  (if (!FLOAT_TYPE_P (type))
> >+(lshift @a @b)))

Just a nit: the last line is wrongly formatted.

> The test FLOAT_TYPE_P seems unnecessary, 'type' is (up to a useless
> conversion) the result of a shift, so integer, fixed-point or vector. Its
> lhs is integer_onep, which rules out fixed-point.

Right.

> (I think it is the first pattern using @letter and not @number)

Yea, I think we should be consistent and use @0 and @1 here.

Marek


Re: [PATCH 15/16][fold-const.c] Fix bigendian HFmode in native_interpret_real

2015-07-09 Thread Alan Lawrence

Jeff Law wrote:

On 07/08/2015 03:43 AM, Richard Biener wrote:

On Wed, Jul 8, 2015 at 12:07 AM, Jeff Law  wrote:

On 07/07/2015 06:37 AM, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01346.html. Fixes
FAIL of advsimd-intrinsics vcreate.c on aarch64_be-none-elf from
previous patch.

15_native_interpret_real.patch


commit e2e7ca148960a82fc88128820f17e7cbd14173cb
Author: Alan Lawrence
Date:   Thu Apr 9 10:54:40 2015 +0100

  Fix native_interpret_real for HFmode floats on Bigendian with
UNITS_PER_WORD>=4

  (with missing space)

OK with ChangeLog in proper form.

Err - but now offset can become negative?  Shouldn't it rather error out
before as it requires at least 4 bytes for big-endian?

I managed to convince myself the value wouldn't be negative when reviewing.


That said - the whole thing looks it doesn't expect GET_MODE_SIZE < 4
and your "fix" is just very obfuscated (if it really is a fix).
While I couldn't convince myself the function as a whole was prepared 
for smaller objects, I don't think Alan's patch made things worse.  One 
could argue the whole bloody thing ought to be rewritten though.


I'd also managed to convince myself the other instances of "3" weren't 
problematical.


jeff



In response to Richard's comments, may I propose the attached patch instead?

I used the term "long" because of the earlier comment:
  /* There are always 32 bits in each long, no matter the size of
 the hosts long.  We handle floating point representations with
 up to 192 bits.  */
with which I really don't think I want to mess at this point ;). However, I'd be 
happy to change my use of "long" to "32 bits", "4 bytes", or "group of 4" 
instead if one of those was preferable!


Bootstrapped + check-gcc on x86_64 (no change); cross-tested on 
aarch64_be-none-elf (where it fixes the advsimd-intrinsics float16 test)


gcc/ChangeLog (as before):

* fold-const.c (native_interpret_real): Fix HFmode for bigendian where
UNITS_PER_WORD >= 4.
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index e61d9463462746f45c96a0e7a154fb45ed026f43..518780e6cd2724002f0c7a805aa7742b6374600c 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -7592,7 +7592,6 @@ native_interpret_real (tree type, const unsigned char *ptr, int len)
 {
   machine_mode mode = TYPE_MODE (type);
   int total_bytes = GET_MODE_SIZE (mode);
-  int byte, offset, word, words, bitpos;
   unsigned char value;
   /* There are always 32 bits in each long, no matter the size of
  the hosts long.  We handle floating point representations with
@@ -7603,16 +7602,18 @@ native_interpret_real (tree type, const unsigned char *ptr, int len)
   total_bytes = GET_MODE_SIZE (TYPE_MODE (type));
   if (total_bytes > len || total_bytes > 24)
 return NULL_TREE;
-  words = (32 / BITS_PER_UNIT) / UNITS_PER_WORD;
+  int words = (32 / BITS_PER_UNIT) / UNITS_PER_WORD;
 
   memset (tmp, 0, sizeof (tmp));
-  for (bitpos = 0; bitpos < total_bytes * BITS_PER_UNIT;
+  for (int bitpos = 0; bitpos < total_bytes * BITS_PER_UNIT;
bitpos += BITS_PER_UNIT)
 {
-  byte = (bitpos / BITS_PER_UNIT) & 3;
+  /* Both OFFSET and BYTE index within a long;
+	 bitpos indexes the whole float.  */
+  int offset, byte = (bitpos / BITS_PER_UNIT) & 3;
   if (UNITS_PER_WORD < 4)
 	{
-	  word = byte / UNITS_PER_WORD;
+	  int word = byte / UNITS_PER_WORD;
 	  if (WORDS_BIG_ENDIAN)
 	word = (words - 1) - word;
 	  offset = word * UNITS_PER_WORD;
@@ -7622,7 +7623,16 @@ native_interpret_real (tree type, const unsigned char *ptr, int len)
 	offset += byte % UNITS_PER_WORD;
 	}
   else
-	offset = BYTES_BIG_ENDIAN ? 3 - byte : byte;
+	{
+	  offset = byte;
+	  if (BYTES_BIG_ENDIAN)
+	{
+	  /* Reverse bytes within each long, or within the entire float
+		 if it's smaller than a long (for HFmode).  */
+	  offset = MIN (3, total_bytes - 1) - offset;
+	  gcc_assert (offset >= 0);
+	}
+	}
   value = ptr[offset + ((bitpos / BITS_PER_UNIT) & ~3)];
 
   tmp[bitpos / 32] |= (unsigned long)value << (bitpos & 31);


[PATCH GCC]Udate best_cost for start cand if it has lower overall cost in iv set narrowing

2015-07-09 Thread Bin Cheng
Hi,
When I going through the code, I spot this minor issue.  When
start_cand/orig_cand/third_cand have overall cost in order like "start_cand
< third_cand < orig_cand", GCC chooses the third_cand instead of start_cand
because we haven't set best_cost for start_cand.  This is an obvious fix to
it.

So is it OK?


2015-07-08  Bin Cheng  

* tree-ssa-loop-ivopts.c (iv_ca_narrow): Update best_cost
if start candidate has lower cost.
Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c  (revision 225531)
+++ gcc/tree-ssa-loop-ivopts.c  (working copy)
@@ -6064,6 +6066,13 @@ iv_ca_narrow (struct ivopts_data *data, struct iv_
   best_cost = iv_ca_cost (ivs);
   /* Start narrowing with START.  */
   new_cp = get_use_iv_cost (data, use, start);
+  if (new_cp != NULL)
+   {
+ iv_ca_set_cp (data, ivs, use, new_cp);
+ acost = iv_ca_cost (ivs);
+ if (compare_costs (acost, best_cost) < 0)
+   best_cost = acost;
+   }
 
   if (data->consider_all_candidates)
{


[PATCH GCC]Improve auto-increment addressing mode support in IVO by refactoring add candiate logic.

2015-07-09 Thread Bin Cheng
Hi,
This patch refactors codes adding iv candidates in IVO.  It renames
functions using straightforward names, it also factors function call to
add_autoinc_candidates from add_candidate to add_iv_candidate_for_use.
Before this patch, we tried to add autoinc candidates for every call to
add_candidate.  This has two issues: A) wasting compilation time.  B) adding
useless auto-inc candidates for iv's which have ZERO base.  These autoinc
candidates are useless because targets generally only support auto-increment
addressing mode with base register pointing to memory object, also because
IVO has its prerequisite conditions on autoinc candidates, these cands are
actually ignored later.

I collected instrumental data, and < 85% candidates are added now when
compiling spec2k on Cortex-a15/thumb.

Also this patch could benefit performance for targets supporting autoinc
addressing mode, because with fewer candidates, IVO algorithm might work
better.
I collected spec2k perf data on Cortex-a15.  It shows several cases in
int/fp suites are improved.  Overall, both spec2k geo-mean of int/fp are
both improved by ~0.5%.  And no regression.

Though I haven't turned on autoinc support in IVO for aarch64, I would
expect this patch will pave the way for that.

So is it OK?

Thanks,
bin

2015-07-08  Bin Cheng  

* tree-ssa-loop-ivopts.c (add_candidate): Remove call to
add_autoinc_candidates.
(add_iv_candidate_for_biv): Rename to add_iv_candidate_for_biv.
(add_iv_candidate_for_biv): Rename from add_iv_candidate_for_biv.
(add_old_ivs_candidates): Rename to add_iv_candidate_for_bivs.
(add_iv_candidate_for_bivs): Rename from add_old_ivs_candidates.
Call new function.
(add_iv_value_candidates): Rename to add_iv_candidate_for_use.
(add_iv_candidate_for_use): Rename from add_iv_value_candidates.
Remove parameter struct iv*.  Call add_autoinc_candidates here.
(add_derived_ivs_candidates): Rename to add_iv_candidate_for_uses.
(add_iv_candidate_for_uses): Rename from add_derived_ivs_candidates.
Call new function.
(find_iv_candidates): Call new functions.

Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c  (revision 225531)
+++ gcc/tree-ssa-loop-ivopts.c  (working copy)
@@ -2787,7 +2787,8 @@ add_autoinc_candidates (struct ivopts_data *data,
 
 /* Adds a candidate BASE + STEP * i.  Important field is set to IMPORTANT and
position to POS.  If USE is not NULL, the candidate is set as related to
-   it.  The candidate computation is scheduled on all available positions.  */
+   it.  The candidate computation is scheduled before exit condition and at
+   the end of loop.  */
 
 static void
 add_candidate (struct ivopts_data *data,
@@ -2800,9 +2801,6 @@ add_candidate (struct ivopts_data *data,
   if (ip_end_pos (data->current_loop)
   && allow_ip_end_pos_p (data->current_loop))
 add_candidate_1 (data, base, step, important, IP_END, use, NULL);
-
-  if (use != NULL && use->type == USE_ADDRESS)
-add_autoinc_candidates (data, base, step, important, use);
 }
 
 /* Adds standard iv candidates.  */
@@ -2831,7 +2829,7 @@ add_standard_iv_candidates (struct ivopts_data *da
 /* Adds candidates bases on the old induction variable IV.  */
 
 static void
-add_old_iv_candidates (struct ivopts_data *data, struct iv *iv)
+add_iv_candidate_for_biv (struct ivopts_data *data, struct iv *iv)
 {
   gimple phi;
   tree def;
@@ -2871,7 +2869,7 @@ static void
 /* Adds candidates based on the old induction variables.  */
 
 static void
-add_old_ivs_candidates (struct ivopts_data *data)
+add_iv_candidate_for_bivs (struct ivopts_data *data)
 {
   unsigned i;
   struct iv *iv;
@@ -2881,19 +2879,19 @@ static void
 {
   iv = ver_info (data, i)->iv;
   if (iv && iv->biv_p && !integer_zerop (iv->step))
-   add_old_iv_candidates (data, iv);
+   add_iv_candidate_for_biv (data, iv);
 }
 }
 
-/* Adds candidates based on the value of the induction variable IV and USE.  */
+/* Adds candidates based on the value of USE's iv.  */
 
 static void
-add_iv_value_candidates (struct ivopts_data *data,
-struct iv *iv, struct iv_use *use)
+add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use)
 {
   unsigned HOST_WIDE_INT offset;
   tree base;
   tree basetype;
+  struct iv *iv = use->iv;
 
   add_candidate (data, iv->base, iv->step, false, use);
 
@@ -2903,21 +2901,25 @@ static void
   basetype = TREE_TYPE (iv->base);
   if (POINTER_TYPE_P (basetype))
 basetype = sizetype;
-  add_candidate (data, build_int_cst (basetype, 0),
-iv->step, true, use);
+  add_candidate (data, build_int_cst (basetype, 0), iv->step, true, use);
 
   /* Third, try removing the constant offset.  Make sure to even
  add a candidate for &a[0] vs. (T *)&a.  */
   base = strip_offset (iv->base, &offset);
-  if (offset
- 

Re: [PATCH AArch64]Handle wrong cost for addition of minus immediate in aarch64_rtx_costs.

2015-07-09 Thread Bin.Cheng
Ping.

On Fri, Jun 26, 2015 at 4:47 PM, Bin Cheng  wrote:
> Hi,
> The canonical form of subtract of immediate is (add op0 minus_imm), which is
> supported with addsi3_aarch64 pattern on aarch64.  Unfortunately wrong cost
> (8 rather than 4) is computed by aarch64_rtx_cost because it doesn't honor
> the fact that it actually is a sub instruction.  This patch fixes it, is
> this OK?
>
> Thanks,
> bin
>
> 2015-06-25  Bin Cheng  
>
> * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle addition of
> minus immediate.


Re: [PATCH 15/16][fold-const.c] Fix bigendian HFmode in native_interpret_real

2015-07-09 Thread Richard Biener
On Thu, Jul 9, 2015 at 11:34 AM, Alan Lawrence  wrote:
> Jeff Law wrote:
>>
>> On 07/08/2015 03:43 AM, Richard Biener wrote:
>>>
>>> On Wed, Jul 8, 2015 at 12:07 AM, Jeff Law  wrote:

 On 07/07/2015 06:37 AM, Alan Lawrence wrote:
>
> As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01346.html. Fixes
> FAIL of advsimd-intrinsics vcreate.c on aarch64_be-none-elf from
> previous patch.
>
> 15_native_interpret_real.patch
>
>
> commit e2e7ca148960a82fc88128820f17e7cbd14173cb
> Author: Alan Lawrence
> Date:   Thu Apr 9 10:54:40 2015 +0100
>
>   Fix native_interpret_real for HFmode floats on Bigendian with
> UNITS_PER_WORD>=4
>
>   (with missing space)

 OK with ChangeLog in proper form.
>>>
>>> Err - but now offset can become negative?  Shouldn't it rather error out
>>> before as it requires at least 4 bytes for big-endian?
>>
>> I managed to convince myself the value wouldn't be negative when
>> reviewing.
>>
>>> That said - the whole thing looks it doesn't expect GET_MODE_SIZE < 4
>>> and your "fix" is just very obfuscated (if it really is a fix).
>>
>> While I couldn't convince myself the function as a whole was prepared for
>> smaller objects, I don't think Alan's patch made things worse.  One could
>> argue the whole bloody thing ought to be rewritten though.
>>
>> I'd also managed to convince myself the other instances of "3" weren't
>> problematical.
>>
>> jeff
>>
>
> In response to Richard's comments, may I propose the attached patch instead?
>
> I used the term "long" because of the earlier comment:
>   /* There are always 32 bits in each long, no matter the size of
>  the hosts long.  We handle floating point representations with
>  up to 192 bits.  */
> with which I really don't think I want to mess at this point ;). However,
> I'd be happy to change my use of "long" to "32 bits", "4 bytes", or "group
> of 4" instead if one of those was preferable!
>
> Bootstrapped + check-gcc on x86_64 (no change); cross-tested on
> aarch64_be-none-elf (where it fixes the advsimd-intrinsics float16 test)
>
> gcc/ChangeLog (as before):
>
> * fold-const.c (native_interpret_real): Fix HFmode for bigendian
> where
> UNITS_PER_WORD >= 4.

Looks good to me.  I think you can remove the assert, indeed offset
cannot become negative (but the compiler can't see that).

I wonder why wi::from_buffer doesn't have the same issue though
for HImode ints.  It's structured differently, without magic '4's as well.

Thanks,
Richard.


Re: [PATCH GCC]Udate best_cost for start cand if it has lower overall cost in iv set narrowing

2015-07-09 Thread Richard Biener
On Thu, Jul 9, 2015 at 11:37 AM, Bin Cheng  wrote:
> Hi,
> When I going through the code, I spot this minor issue.  When
> start_cand/orig_cand/third_cand have overall cost in order like "start_cand
> < third_cand < orig_cand", GCC chooses the third_cand instead of start_cand
> because we haven't set best_cost for start_cand.  This is an obvious fix to
> it.
>
> So is it OK?

Ok.  I wonder if you have a testcase which this improves?

Richard.

>
> 2015-07-08  Bin Cheng  
>
> * tree-ssa-loop-ivopts.c (iv_ca_narrow): Update best_cost
> if start candidate has lower cost.


Re: [PATCH] Limit alignment on error_mark_node variable

2015-07-09 Thread H.J. Lu
On Thu, Jul 09, 2015 at 10:16:38AM +0200, Richard Biener wrote:
> On Wed, Jul 8, 2015 at 5:32 PM, H.J. Lu  wrote:
> > There is no need to try different alignment on variable of
> > error_mark_node.
> >
> > OK for trunk if there is no regression?
> 
> Can't we avoid calling align_variable on error_mark_node type decls
> completely?  That is, punt earlier when we try to emit it.
> 

How about this?  OK for trunk?

H.J.
---
There is no need to analyze error_mark_node type decls.

gcc/

PR target/66810
* varpool.cvarpool.c (varpool_node::analyze): Skip
error_mark_node type decls.

gcc/testsuite/

PR target/66810
* gcc.target/i386/pr66810.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr66810.c | 10 ++
 gcc/varpool.c   | 29 +
 2 files changed, 27 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66810.c

diff --git a/gcc/testsuite/gcc.target/i386/pr66810.c 
b/gcc/testsuite/gcc.target/i386/pr66810.c
new file mode 100644
index 000..4778b37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66810.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-mno-sse -mno-mmx -miamcu" } */
+
+int vv;
+
+void
+i (void)
+{
+  static int a[vv]; /* { dg-error "storage size" } */
+}
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 10fa93c..f7c4d46 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -514,20 +514,25 @@ varpool_node::get_availability (void)
 void
 varpool_node::analyze (void)
 {
-  /* When reading back varpool at LTO time, we re-construct the queue in order
- to have "needed" list right by inserting all needed nodes into varpool.
- We however don't want to re-analyze already analyzed nodes.  */
-  if (!analyzed)
+  /* Skip error_mark_node type decls.  */
+  if (TREE_TYPE (decl) != error_mark_node)
 {
-  gcc_assert (!in_lto_p || symtab->function_flags_ready);
-  /* Compute the alignment early so function body expanders are
-already informed about increased alignment.  */
-  align_variable (decl, 0);
+  /* When reading back varpool at LTO time, we re-construct the
+queue in order to have "needed" list right by inserting all
+needed nodes into varpool.  We however don't want to re-analyze
+already analyzed nodes.  */
+  if (!analyzed)
+   {
+ gcc_assert (!in_lto_p || symtab->function_flags_ready);
+ /* Compute the alignment early so function body expanders are
+already informed about increased alignment.  */
+ align_variable (decl, 0);
+   }
+  if (alias)
+   resolve_alias (varpool_node::get (alias_target));
+  else if (DECL_INITIAL (decl))
+   record_references_in_initializer (decl, analyzed);
 }
-  if (alias)
-resolve_alias (varpool_node::get (alias_target));
-  else if (DECL_INITIAL (decl))
-record_references_in_initializer (decl, analyzed);
   analyzed = true;
 }
 
-- 
2.4.3



Re: [PATCH GCC]Improve auto-increment addressing mode support in IVO by refactoring add candiate logic.

2015-07-09 Thread Richard Biener
On Thu, Jul 9, 2015 at 11:40 AM, Bin Cheng  wrote:
> Hi,
> This patch refactors codes adding iv candidates in IVO.  It renames
> functions using straightforward names, it also factors function call to
> add_autoinc_candidates from add_candidate to add_iv_candidate_for_use.
> Before this patch, we tried to add autoinc candidates for every call to
> add_candidate.  This has two issues: A) wasting compilation time.  B) adding
> useless auto-inc candidates for iv's which have ZERO base.  These autoinc
> candidates are useless because targets generally only support auto-increment
> addressing mode with base register pointing to memory object, also because
> IVO has its prerequisite conditions on autoinc candidates, these cands are
> actually ignored later.
>
> I collected instrumental data, and < 85% candidates are added now when
> compiling spec2k on Cortex-a15/thumb.
>
> Also this patch could benefit performance for targets supporting autoinc
> addressing mode, because with fewer candidates, IVO algorithm might work
> better.
> I collected spec2k perf data on Cortex-a15.  It shows several cases in
> int/fp suites are improved.  Overall, both spec2k geo-mean of int/fp are
> both improved by ~0.5%.  And no regression.
>
> Though I haven't turned on autoinc support in IVO for aarch64, I would
> expect this patch will pave the way for that.
>
> So is it OK?

Looks good to me.

Thanks,
Richard.

> Thanks,
> bin
>
> 2015-07-08  Bin Cheng  
>
> * tree-ssa-loop-ivopts.c (add_candidate): Remove call to
> add_autoinc_candidates.
> (add_iv_candidate_for_biv): Rename to add_iv_candidate_for_biv.
> (add_iv_candidate_for_biv): Rename from add_iv_candidate_for_biv.
> (add_old_ivs_candidates): Rename to add_iv_candidate_for_bivs.
> (add_iv_candidate_for_bivs): Rename from add_old_ivs_candidates.
> Call new function.
> (add_iv_value_candidates): Rename to add_iv_candidate_for_use.
> (add_iv_candidate_for_use): Rename from add_iv_value_candidates.
> Remove parameter struct iv*.  Call add_autoinc_candidates here.
> (add_derived_ivs_candidates): Rename to add_iv_candidate_for_uses.
> (add_iv_candidate_for_uses): Rename from add_derived_ivs_candidates.
> Call new function.
> (find_iv_candidates): Call new functions.
>


Re: [PATCH] Limit alignment on error_mark_node variable

2015-07-09 Thread Richard Biener
On Thu, Jul 9, 2015 at 11:52 AM, H.J. Lu  wrote:
> On Thu, Jul 09, 2015 at 10:16:38AM +0200, Richard Biener wrote:
>> On Wed, Jul 8, 2015 at 5:32 PM, H.J. Lu  wrote:
>> > There is no need to try different alignment on variable of
>> > error_mark_node.
>> >
>> > OK for trunk if there is no regression?
>>
>> Can't we avoid calling align_variable on error_mark_node type decls
>> completely?  That is, punt earlier when we try to emit it.
>>
>
> How about this?  OK for trunk?

Heh, you now get the obvious question why we can't simply avoid
adding the varpool node in the first place ;)

Richard.

> H.J.
> ---
> There is no need to analyze error_mark_node type decls.
>
> gcc/
>
> PR target/66810
> * varpool.cvarpool.c (varpool_node::analyze): Skip
> error_mark_node type decls.
>
> gcc/testsuite/
>
> PR target/66810
> * gcc.target/i386/pr66810.c: New test.
> ---
>  gcc/testsuite/gcc.target/i386/pr66810.c | 10 ++
>  gcc/varpool.c   | 29 +
>  2 files changed, 27 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66810.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr66810.c 
> b/gcc/testsuite/gcc.target/i386/pr66810.c
> new file mode 100644
> index 000..4778b37
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr66810.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target ia32 } } */
> +/* { dg-options "-mno-sse -mno-mmx -miamcu" } */
> +
> +int vv;
> +
> +void
> +i (void)
> +{
> +  static int a[vv]; /* { dg-error "storage size" } */
> +}
> diff --git a/gcc/varpool.c b/gcc/varpool.c
> index 10fa93c..f7c4d46 100644
> --- a/gcc/varpool.c
> +++ b/gcc/varpool.c
> @@ -514,20 +514,25 @@ varpool_node::get_availability (void)
>  void
>  varpool_node::analyze (void)
>  {
> -  /* When reading back varpool at LTO time, we re-construct the queue in 
> order
> - to have "needed" list right by inserting all needed nodes into varpool.
> - We however don't want to re-analyze already analyzed nodes.  */
> -  if (!analyzed)
> +  /* Skip error_mark_node type decls.  */
> +  if (TREE_TYPE (decl) != error_mark_node)
>  {
> -  gcc_assert (!in_lto_p || symtab->function_flags_ready);
> -  /* Compute the alignment early so function body expanders are
> -already informed about increased alignment.  */
> -  align_variable (decl, 0);
> +  /* When reading back varpool at LTO time, we re-construct the
> +queue in order to have "needed" list right by inserting all
> +needed nodes into varpool.  We however don't want to re-analyze
> +already analyzed nodes.  */
> +  if (!analyzed)
> +   {
> + gcc_assert (!in_lto_p || symtab->function_flags_ready);
> + /* Compute the alignment early so function body expanders are
> +already informed about increased alignment.  */
> + align_variable (decl, 0);
> +   }
> +  if (alias)
> +   resolve_alias (varpool_node::get (alias_target));
> +  else if (DECL_INITIAL (decl))
> +   record_references_in_initializer (decl, analyzed);
>  }
> -  if (alias)
> -resolve_alias (varpool_node::get (alias_target));
> -  else if (DECL_INITIAL (decl))
> -record_references_in_initializer (decl, analyzed);
>analyzed = true;
>  }
>
> --
> 2.4.3
>


Re: [PATCH 15/16][fold-const.c] Fix bigendian HFmode in native_interpret_real

2015-07-09 Thread Alan Lawrence

Richard Biener wrote:


I wonder why wi::from_buffer doesn't have the same issue though
for HImode ints.  It's structured differently, without magic '4's as well.


I don't claim to understand the rest of wi::from_buffer and why it is different. 
However, wrt. HImode, I think the key line is:


 offset = BYTES_BIG_ENDIAN ? (buffer_len - 1) - byte : byte;

HTH,
Alan



Re: [PATCH 2/4] Add liboffloadmic

2015-07-09 Thread Thomas Schwinge
Hi Ilya!

On Tue, 21 Oct 2014 21:20:34 +0400, Ilya Verbin  wrote:
> This patch contains liboffloadmic library.
> 
> It is used by ICC for offloading.  The sources are imported from upstream
> ( https://www.openmprtl.org/sites/default/files/liboffload_oss.tgz )
> Configure and makefiles are new.
> 
> Also liboffloadmic/runtime/emulator directory is new.  [...]

I noticed that -- at least with current versions of GCC -- there are
several compiler diagnostics displayed during the build.  It would be
nice to get these addressed -- as applicable, presumably in the Intel
upstream version, and then a new import be done into GCC?  For example, I
noticed the following changes in my build logs (not a complete list):

{+[...]/source-gcc/liboffloadmic/runtime/emulator/coi_device.cpp:112:28: 
warning: invalid suffix on literal; C++11 requires a space between literal and 
string macro [-Wliteral-suffix]+}
{+   sprintf (pipe_host_path, "%s"PIPE_HOST_PATH, mic_dir);+}
{+^+}
{+[...]/source-gcc/liboffloadmic/runtime/emulator/coi_device.cpp:113:30: 
warning: invalid suffix on literal; C++11 requires a space between literal and 
string macro [-Wliteral-suffix]+}
{+   sprintf (pipe_target_path, "%s"PIPE_TARGET_PATH, mic_dir);+}
{+  ^+}

{+[...]/source-gcc/liboffloadmic/runtime/emulator/coi_host.cpp:892:24: 
warning: invalid suffix on literal; C++11 requires a space between literal and 
string macro [-Wliteral-suffix]+}
{+   sprintf (pipes_path, "%s"PIPES_PATH, eng->dir);+}
{+^+}
{+[...]/source-gcc/liboffloadmic/runtime/emulator/coi_host.cpp:903:28: 
warning: invalid suffix on literal; C++11 requires a space between literal and 
string macro [-Wliteral-suffix]+}
{+   sprintf (pipe_host_path, "%s"PIPE_HOST_PATH, eng->dir);+}
{+^+}
{+[...]/source-gcc/liboffloadmic/runtime/emulator/coi_host.cpp:904:30: 
warning: invalid suffix on literal; C++11 requires a space between literal and 
string macro [-Wliteral-suffix]+}
{+   sprintf (pipe_target_path, "%s"PIPE_TARGET_PATH, eng->dir);+}
{+  ^+}

[...]/source-gcc/liboffloadmic/runtime/offload_host.cpp:107:30: warning: 
[-deprecated conversion from-]{+ISO C++ forbids converting a+} string constant 
to 'char*' [-Wwrite-strings]
 static char *timer_envname = "H_TIME";
  ^

[...]/source-gcc/liboffloadmic/runtime/offload_myo_host.cpp: In function 
'void __intel_cilk_for_32_offload(int, void (*)(void*, void*), int, void*, 
void*, unsigned int, unsigned int)':
[...]/source-gcc/liboffloadmic/runtime/offload_myo_host.cpp:762:55: 
warning: [-deprecated conversion from-]{+ISO C++ forbids converting a+} string 
constant to 'char*' [-Wwrite-strings]
args, target_number)
   ^
[...]/source-gcc/liboffloadmic/runtime/offload_myo_host.cpp: In function 
'void __intel_cilk_for_64_offload(int, void (*)(void*, void*), int, void*, 
void*, uint64_t, uint64_t)':
[...]/source-gcc/liboffloadmic/runtime/offload_myo_host.cpp:815:49: 
warning: [-deprecated conversion from-]{+ISO C++ forbids converting a+} string 
constant to 'char*' [-Wwrite-strings]
target_number)
 ^

[...]/source-gcc/liboffloadmic/runtime/offload_orsl.cpp:39:33: warning: 
[-deprecated conversion from-]{+ISO C++ forbids converting a+} string constant 
to 'ORSLTag {aka char*}' [-Wwrite-strings]
 static const ORSLTag   my_tag = "Offload";
 ^


Grüße,
 Thomas


signature.asc
Description: PGP signature


Re: [PATCH GCC]Udate best_cost for start cand if it has lower overall cost in iv set narrowing

2015-07-09 Thread Bin.Cheng
On Thu, Jul 9, 2015 at 5:49 PM, Richard Biener
 wrote:
> On Thu, Jul 9, 2015 at 11:37 AM, Bin Cheng  wrote:
>> Hi,
>> When I going through the code, I spot this minor issue.  When
>> start_cand/orig_cand/third_cand have overall cost in order like "start_cand
>> < third_cand < orig_cand", GCC chooses the third_cand instead of start_cand
>> because we haven't set best_cost for start_cand.  This is an obvious fix to
>> it.
>>
>> So is it OK?
>
> Ok.  I wonder if you have a testcase which this improves?
Thanks for reviewing.
No, unfortunately.  I ran into a case with another patch.  Even for
that case, the start_cand has same overall cost with third_cand.  But
I believe the idea is true.  Considering if we didn't handle
start_cand specially (well, we need to handle it specially) in
narrowing, we need to check it for possible lower cost in following
loop anyway.

Thanks,
bin
>
> Richard.
>
>>
>> 2015-07-08  Bin Cheng  
>>
>> * tree-ssa-loop-ivopts.c (iv_ca_narrow): Update best_cost
>> if start candidate has lower cost.


Re: move a * (1 << b) -> a << b pattern from fold-const.c to match.pd

2015-07-09 Thread Richard Biener
On Thu, 9 Jul 2015, Marek Polacek wrote:

> On Tue, Jul 07, 2015 at 07:47:50AM +0200, Marc Glisse wrote:
> > On Tue, 7 Jul 2015, Prathamesh Kulkarni wrote:
> > 
> > >+/* a * (1 << b) -> a << b */
> > >+(simplify
> > >+  (mult:c @a (lshift integer_onep @b))
> > >+  (if (!FLOAT_TYPE_P (type))
> > >+(lshift @a @b)))
> 
> Just a nit: the last line is wrongly formatted.
> 
> > The test FLOAT_TYPE_P seems unnecessary, 'type' is (up to a useless
> > conversion) the result of a shift, so integer, fixed-point or vector. Its
> > lhs is integer_onep, which rules out fixed-point.
> 
> Right.
> 
> > (I think it is the first pattern using @letter and not @number)
> 
> Yea, I think we should be consistent and use @0 and @1 here.

I've added support for non-digit names to allow more descriptive
patterns.  Like when we have

/* Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1))
(X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1))
   if the new mask might be further optimized.  */
(for shift (lshift rshift)
 (simplify
  (bit_and (convert?@4 (shift@5 (convert1?@3 @0) INTEGER_CST@1)) 
INTEGER_CST@2)

using @andop2 instead of @2 and @shift instead of @5 (etc.) might
make the following code easier to follow.

That everything uses digits right now is historic mostly (and
my habit of using digits just because I got used to it).

But yes, with @a, @b vs. @0, @1 we should standardize on something
(digits).

Richard.


[RFC, Fortran, (pr66775)] Allocatable function result

2015-07-09 Thread Andre Vehreschild
Hi all,

I need your help on how to interpret the standard(s) or how to implement
handling an allocatable function's result, when that result is not allocated by
the function. Imagine the simple (albeit artificial) case:

integer function read_input()
  ! Do whatever is needed to read an int.
  read_input = ...
end function

integer function getNext()
  allocatable :: getNext
  if (more_input_available ()) getNext = read_input()
end function

where the function getNext () returns an (automatically) allocated result when
more_input_available() returns .true.. Otherwise getNext () returns an
unallocated object, i.e., the result's allocation status is .false.. I don't
want to argue about this design's quality (considering it poor myself). I
suppose that this code is legal, right?

Unfortunately gfortran can not handle it currently. The issue here is shown in
the pseudo code of:

integer, allocatable :: res
res = getNext ()

The pseudo code does this:

integer(kind=4) * D.3425;

if (res != 0B) goto L.6;
res = (integer(kind=4) *) __builtin_malloc (4);
L.6:;
D.3425 = getnext ();
*res = *D.3425; // Oooops! D.3425 is NULL, when more_input_available ()
// is .false.

That is there is a classic null-pointer dereference.

I am curious why the memory allocated (or not) by the function is not reused for
the allocatable object on the lhs? I propose to generally handle an assignment
of a scalar allocatable(!) function result to a scalar allocatable(!) variable
like this (shortened a bit for brevity):

TYPE * res, D.3415, D.3417;

D.3415 = get_next ();
// Swap new result and old contents of res.
D.3417 = D.3415;
D.3415 = res;
res = D.3417;
// Deallocate old content of res to prevent memory leaks.
if (D.3415 != 0B)  __builtin_free (D.3415);

Note there is no initial malloc of res! Would this do any harm, besides
preventing memory loss, saving an allocation and be cleaner to read on the
pseudo code side. Does it violate the standard somehow? Any ideas on
alternatives to prevent the null-pointer dereference in the initial code?

Suggestions, issues, objections???

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: move a * (1 << b) -> a << b pattern from fold-const.c to match.pd

2015-07-09 Thread Marek Polacek
On Thu, Jul 09, 2015 at 12:02:19PM +0200, Richard Biener wrote:
> I've added support for non-digit names to allow more descriptive
> patterns.  Like when we have
> 
> /* Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1))
> (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1))
>if the new mask might be further optimized.  */
> (for shift (lshift rshift)
>  (simplify
>   (bit_and (convert?@4 (shift@5 (convert1?@3 @0) INTEGER_CST@1)) 
> INTEGER_CST@2)
> 
> using @andop2 instead of @2 and @shift instead of @5 (etc.) might
> make the following code easier to follow.

All right, that makes sense.
 
> That everything uses digits right now is historic mostly (and
> my habit of using digits just because I got used to it).
> 
> But yes, with @a, @b vs. @0, @1 we should standardize on something
> (digits).
> 
> Richard.

Marek


Re: [RFC] two-phase marking in gt_cleare_cache

2015-07-09 Thread Tom de Vries

On 07/07/15 16:00, Michael Matz wrote:

Hi,

On Mon, 6 Jul 2015, Richard Biener wrote:


By doing so, we make the behaviour of gt_cleare_cache independent of the
order in which the entries are visited, turning:
- hard-to-trigger bugs which trigger for one visiting order but not for
   another, into
- more easily triggered bugs which trigger for any visiting order.

Any comments?


I think it is only half-way correct in your proposed change.  You only
fix the issue for hashes of the same kind.  To truly fix the issue you'd
have to change generated code for gt_clear_caches () and provide
a clearing-only implementation (or pass a operation mode bool to
the core worker in hash-table.h).


Hmm, and don't we rather want to first mark and _then_ clear?  Because
if entry B in the hash is live and would keep A live then A _is_ kept in
the end but you'll remove it from the hash, possibly no longer using a
still live copy.


I don't think such use has ever worked with the caching hash tables.  Even
the old (before c++) scheme didn't iterate, i.e. if a cache-hash entry A
became life from outside, but it itself kept an entry B live in the hash
table (with no other references to B) then this never worked (or only by
luck), because the slot was always cleared if !ggc_marked_p, so if B was
visited before A it was removed from the hash-table (and in particular B's
gt_ggc_mx routine was never called, so whatever B needed wasn't even
marked).

Given this I think the call to gt_ggc_mx is superfluous because it
wouldn't work relyably for multi-step dependencies anyway.  Hence a
situation that works with that call in place, and breaking without it is
actually a bug waiting to be uncovered.



Attached patch tries to get multi-step dependencies right, without using 
iteration-till-fixed-point.


During the marking phase, we do recursive marking of the cache entries, 
while skipping the marking of:

- the cache entry itself, and
- the key component of the cache entry.
This marking is done for all non-empty cache entries independent of the 
current value of keep_cache_entry. So we make a conservative choice 
here, and by doing so avoid having to iterate-till-fixed-point.


During the cache-clear phase, for each cache entry we either 
non-recursively mark it live, or remove it.


AFAIU, this scheme reliably handles multi-step dependencies and does not 
increase runtime.


Passes a minimal c build, and it fixes the ICE of PR66714 (although 
there still might be a root cause to address, I'm not sure).


Thanks,
- Tom

Use conservative non-iterative strategy for caches

2015-07-09  Tom de Vries  

	PR libgomp/66714
	* hash-table.h (gt_cleare_cache): Don't recurse cache entries, just
	mark.
	* trans-mem.c (struct tm_wrapper_hasher::ggc_mx (tree_map *&m)): New
	function.
	* tree.c (struct tree_vec_map_cache_hasher::ggc_mx (tree_vec_map *&m):
	Same.
	* tree.h
	(struct tree_decl_map_cache_hasher::ggc_mx (tree_decl_map *&m)): Same.
	* ubsan.c
	(struct tree_type_map_cache_hasher::ggc_mx (tree_type_map *&m)): Same.
	* varasm.c (struct tm_clone_hasher::ggc_mx (tree_map *&m)): Same.

	* testsuite/libgomp.c/pr66714.c: New test.
---
 gcc/hash-table.h  | 28 +++-
 gcc/trans-mem.c   |  4 
 gcc/tree.c|  7 +++
 gcc/tree.h|  6 ++
 gcc/ubsan.c   |  4 
 gcc/varasm.c  |  4 
 libgomp/testsuite/libgomp.c/pr66714.c | 17 +
 7 files changed, 69 insertions(+), 1 deletion(-)
 create mode 100644 libgomp/testsuite/libgomp.c/pr66714.c

diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index 12e0c96..ce899bd 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -1046,6 +1046,32 @@ gt_cleare_cache (hash_table *h)
   if (!h)
 return;
 
+  /* Say we have a cache entry E with key 'from' and non-key 'to'.
+
+ The marking of non-key 'to' should be done in ggc_mx () during the marking
+ phase.  We mark the non-key 'to' conservatively, that is, regardless of
+ whether the key 'from' is live or not.
+
+ The marking of key 'from' has already been done or not during the marking
+ phase, and determines whether we keep entry E live during the clear-cache
+ phase.
+ If key 'from' is alive, we mark entry E as such.
+ If key 'from' is not alive:
+ - we remove the entry E from the cache, and
+ - entry E will be ggc-freed, and
+ - key 'from' will be ggc-freed.
+ - non-key 'to' will not be freed, since we conservatively marked it.  But
+   next ggc invocation, entry E will be gone and no longer cause 'to' to be
+   marked.  So 'to' may be gcc-freed the next ggc invocation.
+
+ Background: A more optimal marking strategy would be to mark the non-key
+ 'to' only if key 'from' is live.  But in order to get to the transitive
+ closure of that marking, we'd need an iterate-till-fixed-point loop around
+ the trav

[PATCH] PR target/66819: Allow indirect sibcall with register arguments

2015-07-09 Thread H.J. Lu
Indirect sibcall with register arguments is OK when there is register
available for argument passing.

OK for trunk if there is no regression?


H.J.
---
gcc/

PR target/66819
* config/i386/i386.c (ix86_function_ok_for_sibcall): Allow
indirect sibcall with register arguments if register available
for argument passing.
(init_cumulative_args): Set cfun->machine->arg_reg_available_p
to cum->nregs != 0.
(function_arg_advance_32): Set cfun->machine->arg_reg_available_p
to 0 when setting cum->nregs = 0.
* config/i386/i386.h (machine_function): Add arg_reg_available_p.

gcc/testsuite/

PR target/66819
* gcc.target/i386/pr66819-1.c: New test.
* gcc.target/i386/pr66819-2.c: Likewise.
* gcc.target/i386/pr66819-3.c: Likewise.
* gcc.target/i386/pr66819-4.c: Likewise.
* gcc.target/i386/pr66819-5.c: Likewise.
---
 gcc/config/i386/i386.c| 15 +--
 gcc/config/i386/i386.h|  3 +++
 gcc/testsuite/gcc.target/i386/pr66819-1.c |  8 
 gcc/testsuite/gcc.target/i386/pr66819-2.c |  8 
 gcc/testsuite/gcc.target/i386/pr66819-3.c | 10 ++
 gcc/testsuite/gcc.target/i386/pr66819-4.c | 12 
 gcc/testsuite/gcc.target/i386/pr66819-5.c | 10 ++
 7 files changed, 60 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-5.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 54ee6f3..85e59a8 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5628,12 +5628,12 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
   if (!decl
  || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
{
- if (ix86_function_regparm (type, NULL) >= 3)
-   {
- /* ??? Need to count the actual number of registers to be used,
-not the possible number of registers.  Fix later.  */
- return false;
-   }
+ /* FIXME: The symbol indirect call doesn't need a
+call-clobbered register.  But we don't know if
+this is a symbol indirect call or not  here.  */
+ if (ix86_function_regparm (type, NULL) >= 3
+ && !cfun->machine->arg_reg_available_p)
+   return false;
}
 }
 
@@ -6567,6 +6567,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument 
info to initialize */
? X86_64_REGPARM_MAX
: X86_64_MS_REGPARM_MAX);
 }
+  cfun->machine->arg_reg_available_p = cum->nregs != 0;
   if (TARGET_SSE)
 {
   cum->sse_nregs = SSE_REGPARM_MAX;
@@ -6636,6 +6637,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument 
info to initialize */
  else
cum->nregs = ix86_function_regparm (fntype, fndecl);
}
+  cfun->machine->arg_reg_available_p = cum->nregs != 0;
 
   /* Set up the number of SSE registers used for passing SFmode
 and DFmode arguments.  Warn for mismatching ABI.  */
@@ -7584,6 +7586,7 @@ pass_in_reg:
{
  cum->nregs = 0;
  cum->regno = 0;
+ cfun->machine->arg_reg_available_p = 0;
}
   break;
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 74334ff..0b6e304 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2479,6 +2479,9 @@ struct GTY(()) machine_function {
   /* If true, it is safe to not save/restore DRAP register.  */
   BOOL_BITFIELD no_drap_save_restore : 1;
 
+  /* If true, there is register available for argument passing.  */
+  BOOL_BITFIELD arg_reg_available_p : 1;
+
   /* During prologue/epilogue generation, the current frame state.
  Otherwise, the frame state at the end of the prologue.  */
   struct machine_frame_state fs;
diff --git a/gcc/testsuite/gcc.target/i386/pr66819-1.c 
b/gcc/testsuite/gcc.target/i386/pr66819-1.c
new file mode 100644
index 000..7c8a1ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66819-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -mregparm=3" } */
+/* { dg-final { scan-assembler-not "call" } } */
+
+void foo(void (*bar)(void))
+{
+  bar();
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr66819-2.c 
b/gcc/testsuite/gcc.target/i386/pr66819-2.c
new file mode 100644
index 000..9de4f97
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66819-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-fPIC -O2 -mregparm=3" } */
+/* { dg-final { scan-assembler-not "call" } } */
+
+void foo(void (*bar)(void))
+{
+  bar();
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr66819-3.c 
b/gcc/testsuite/gcc.target/i386/pr66819

Re: [PATCH] Teach genmatch.c to generate single-use restrictions from flags

2015-07-09 Thread Richard Biener
On Wed, 8 Jul 2015, Richard Biener wrote:

> 
> This introduces a :s flag to match expressions which enforces
> the expression to have a single-use if(!) the simplified
> expression is larger than one statement.
> 
> Thus with that we for example allow
> 
>   tem = a + 1;
>   x = tem - 3;
>   foo (tem);
> 
> to simplify to
> 
>   tem = a + 1;
>   x = a - 2;
>   foo (tem);
> 
> and more importantly not require "hacks" like
> 
>   /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)).  */
>   (simplify
> !   (pointer_plus (pointer_plus@2 @0 @1) @3)
> !   (if (single_use (@2)
> !|| (TREE_CODE (@1) == INTEGER_CST && TREE_CODE (@3) == 
> INTEGER_CST))
> !(pointer_plus @0 (plus @1 @3
> 
> to allow the simplification if (plus @1 @3) will simplify further.
> 
> The trick is that generated code is changed like
> 
>  tree captures[4] ATTRIBUTE_UNUSED = {};
>  captures[0] = op0;
>  captures[1] = o20;
>  captures[2] = o21;
>  captures[3] = op1;
> -/* #line 659 "/space/rguenther/tramp3d/trunk/gcc/match.pd" */
> -if (single_use (captures[0]) || (TREE_CODE (captures[2]) == INTEGER_CST 
> && TREE
> _CODE (captures[3]) == INTEGER_CST))
> -{
> +gimple_seq *lseq = seq;
> +if (!single_use (captures[0])) lseq = NULL;
> ...
> 
> so in particular it disables all single-use restrictions if seq is NULL
> anyway (that is enough of a restriction already).
> 
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.

The following is what I will apply soon.  I have changed code-generation
to use

  gimple_seq *lseq = seq;
  if (lseq
  && (!single_use (captures[0])
  || !single_use (captures[2])))
lseq = NULL;

to short-cut the single_use checks completely if they would have no
effect.  I've also cleaned up genmatch by adding a kind-of 
copy-constructor to expr (verified that doesn't change generated code).

I've also changed all but one pattern to the new style.  The single
non-changed one is

/* Transform comparisons of the form X - Y CMP 0 to X CMP Y.
...
(for cmp (eq ne)
 (simplify
  (cmp (minus@2 @0 @1) integer_zerop)
  (if (single_use (@2))
   (cmp @0 @1

which is the one regressing DOM.  The issue with conditionals
if not appearing as the RHS of an assignment is that their
value is not available, so it isn't just a no-op change
regarding to surrounding code.

I've once attempted to "fix" that by forcing predicate computation
out of GIMPLE_CONDs but never finished that (doing this for
[VEC_]COND_EXPRs on RHS of assignments is still on my list of
things to do though).

Richard.

2015-07-09  Richard Biener  

* genmatch.c (struct expr): Add force_single_use flag.
(expr::expr): Add copy constructor.
(capture_info::walk_match): Gather force_single_use captures.
(expr::gen_transform): Use possibly NULLified sequence.
(dt_simplify::gen): Apply single-use restrictions by NULLifying
seq if any constrained expr is not single-use.
(parser::parse_expr): Refactor to allow multiple flags.  Handle
's' flag to force an expression have a single-use if the pattern
simplifies to more than one statement.
* match.pd: Convert most single_use conditionals to :s flags.

Index: gcc/genmatch.c
===
*** gcc/genmatch.c  (revision 225600)
--- gcc/genmatch.c  (working copy)
*** struct expr : public operand
*** 491,497 
expr (id_base *operation_, bool is_commutative_ = false)
  : operand (OP_EXPR), operation (operation_),
ops (vNULL), expr_type (NULL), is_commutative (is_commutative_),
!   is_generic (false) {}
void append_op (operand *op) { ops.safe_push (op); }
/* The operator and its operands.  */
id_base *operation;
--- 491,501 
expr (id_base *operation_, bool is_commutative_ = false)
  : operand (OP_EXPR), operation (operation_),
ops (vNULL), expr_type (NULL), is_commutative (is_commutative_),
!   is_generic (false), force_single_use (false) {}
!   expr (expr *e)
! : operand (OP_EXPR), operation (e->operation),
!   ops (vNULL), expr_type (e->expr_type), is_commutative 
(e->is_commutative),
!   is_generic (e->is_generic), force_single_use (e->force_single_use) {}
void append_op (operand *op) { ops.safe_push (op); }
/* The operator and its operands.  */
id_base *operation;
*** struct expr : public operand
*** 503,508 
--- 507,515 
bool is_commutative;
/* Whether the expression is expected to be in GENERIC form.  */
bool is_generic;
+   /* Whether pushing any stmt to the sequence should be conditional
+  on this expression having a single-use.  */
+   bool force_single_use;
virtual void gen_transform (FILE *f, const char *, bool, int,
  const char *, capture_info *,
  dt_operand ** = 0, bool = true);
*** commutate (operand *op)
*** 747,753 
  
for (unsigned i =

Re: [PING][PATCH, 1/2] Merge rewrite_virtuals_into_loop_closed_ssa from gomp4 branch

2015-07-09 Thread Tom de Vries

On 07/07/15 17:58, Tom de Vries wrote:

If you can
handle one exit edge I also can't see the difficulty in handling
all exit edges.



Agreed, that doesn't look to complicated. I could call
rewrite_virtuals_into_loop_closed_ssa for all loops in
rewrite_virtuals_into_loop_closed_ssa, to get non-single_dom_exit loops
exercising the code, and fix what breaks.


Hmm, I just realised, it's more complicated than I thought.

In loops with single_dom_exit, the exit dominates the uses outside the 
loop, so I can replace the uses of the def with the uses of the exit phi 
result.


If !single_dom_exit, the exit(s) may not dominate all uses, and I need 
to insert non-loop-exit phi nodes to deal with that.


Thanks,
- Tom


Re: [PATCH] PR target/66819: Allow indirect sibcall with register arguments

2015-07-09 Thread Uros Bizjak
On Thu, Jul 9, 2015 at 12:54 PM, H.J. Lu  wrote:
> Indirect sibcall with register arguments is OK when there is register
> available for argument passing.
>
> OK for trunk if there is no regression?
>
>
> H.J.
> ---
> gcc/
>
> PR target/66819
> * config/i386/i386.c (ix86_function_ok_for_sibcall): Allow
> indirect sibcall with register arguments if register available
> for argument passing.
> (init_cumulative_args): Set cfun->machine->arg_reg_available_p
> to cum->nregs != 0.
> (function_arg_advance_32): Set cfun->machine->arg_reg_available_p
> to 0 when setting cum->nregs = 0.
> * config/i386/i386.h (machine_function): Add arg_reg_available_p.
>
> gcc/testsuite/
>
> PR target/66819
> * gcc.target/i386/pr66819-1.c: New test.
> * gcc.target/i386/pr66819-2.c: Likewise.
> * gcc.target/i386/pr66819-3.c: Likewise.
> * gcc.target/i386/pr66819-4.c: Likewise.
> * gcc.target/i386/pr66819-5.c: Likewise.
> ---
>  gcc/config/i386/i386.c| 15 +--
>  gcc/config/i386/i386.h|  3 +++
>  gcc/testsuite/gcc.target/i386/pr66819-1.c |  8 
>  gcc/testsuite/gcc.target/i386/pr66819-2.c |  8 
>  gcc/testsuite/gcc.target/i386/pr66819-3.c | 10 ++
>  gcc/testsuite/gcc.target/i386/pr66819-4.c | 12 
>  gcc/testsuite/gcc.target/i386/pr66819-5.c | 10 ++
>  7 files changed, 60 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-5.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 54ee6f3..85e59a8 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5628,12 +5628,12 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>if (!decl
>   || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
> {
> - if (ix86_function_regparm (type, NULL) >= 3)
> -   {
> - /* ??? Need to count the actual number of registers to be used,
> -not the possible number of registers.  Fix later.  */
> - return false;
> -   }
> + /* FIXME: The symbol indirect call doesn't need a
> +call-clobbered register.  But we don't know if
> +this is a symbol indirect call or not  here.  */
> + if (ix86_function_regparm (type, NULL) >= 3
> + && !cfun->machine->arg_reg_available_p)
> +   return false;
> }
>  }

Why can't we directly look at nregs != 0 in the above code? AFAICS,
nregs accurately tracks number of available argument registers.

Uros.

> @@ -6567,6 +6567,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* 
> Argument info to initialize */
> ? X86_64_REGPARM_MAX
> : X86_64_MS_REGPARM_MAX);
>  }
> +  cfun->machine->arg_reg_available_p = cum->nregs != 0;
>if (TARGET_SSE)
>  {
>cum->sse_nregs = SSE_REGPARM_MAX;
> @@ -6636,6 +6637,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* 
> Argument info to initialize */
>   else
> cum->nregs = ix86_function_regparm (fntype, fndecl);
> }
> +  cfun->machine->arg_reg_available_p = cum->nregs != 0;
>
>/* Set up the number of SSE registers used for passing SFmode
>  and DFmode arguments.  Warn for mismatching ABI.  */
> @@ -7584,6 +7586,7 @@ pass_in_reg:
> {
>   cum->nregs = 0;
>   cum->regno = 0;
> + cfun->machine->arg_reg_available_p = 0;
> }
>break;
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 74334ff..0b6e304 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2479,6 +2479,9 @@ struct GTY(()) machine_function {
>/* If true, it is safe to not save/restore DRAP register.  */
>BOOL_BITFIELD no_drap_save_restore : 1;
>
> +  /* If true, there is register available for argument passing.  */
> +  BOOL_BITFIELD arg_reg_available_p : 1;
> +
>/* During prologue/epilogue generation, the current frame state.
>   Otherwise, the frame state at the end of the prologue.  */
>struct machine_frame_state fs;
> diff --git a/gcc/testsuite/gcc.target/i386/pr66819-1.c 
> b/gcc/testsuite/gcc.target/i386/pr66819-1.c
> new file mode 100644
> index 000..7c8a1ab
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr66819-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile { target ia32 } } */
> +/* { dg-options "-O2 -mregparm=3" } */
> +/* { dg-final { scan-assembler-not "call" } } */
> +
> +void foo(void (*bar)(void))
> +{
> +  bar();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr66819-2.c 
> b/gcc/testsuite/gcc.target

Re: [PING][PATCH, 1/2] Merge rewrite_virtuals_into_loop_closed_ssa from gomp4 branch

2015-07-09 Thread Richard Biener
On Thu, 9 Jul 2015, Tom de Vries wrote:

> On 07/07/15 17:58, Tom de Vries wrote:
> > > If you can
> > > handle one exit edge I also can't see the difficulty in handling
> > > all exit edges.
> > > 
> > 
> > Agreed, that doesn't look to complicated. I could call
> > rewrite_virtuals_into_loop_closed_ssa for all loops in
> > rewrite_virtuals_into_loop_closed_ssa, to get non-single_dom_exit loops
> > exercising the code, and fix what breaks.
> 
> Hmm, I just realised, it's more complicated than I thought.
> 
> In loops with single_dom_exit, the exit dominates the uses outside the loop,
> so I can replace the uses of the def with the uses of the exit phi result.
> 
> If !single_dom_exit, the exit(s) may not dominate all uses, and I need to
> insert non-loop-exit phi nodes to deal with that.

Yes.  This is why I originally suggested to amend the regular
loop-close-SSA rewriting code.

Richard.


[PATCH 0/6] {function,edge}_summary for IPA passes

2015-07-09 Thread mliska
Following series of patches continues where I stopped at the end of
previous stage 1. It ports places in IPA passes to function_summary container
and I introduce a new edge_summary, which is essentially very similar to
the aforementioned container.

Patches were tested together on x86_64-linux-gnu and boostrap on both
x86_64-linux-gnu and ppc64le-linux-gnu machines.

Ready for trunk?

Thanks,
Martin

mliska (6):
  hash_set: add iterator and remove method.
  Introduce new edge_summary class and replace ipa_edge_args_sum.
  IPA inline: port inline_edge_summary to a new infrastructure.
  Port ipa-cp to use cgraph_edge summary.
  Port IPA reference to function_summary infrastructure.
  Migrate ipa-pure-const to function_summary.

 gcc/cgraph.c  |   2 +
 gcc/cgraph.h  |   5 +-
 gcc/hash-set.h|  39 +
 gcc/ipa-cp.c  | 198 ---
 gcc/ipa-inline-analysis.c | 107 +---
 gcc/ipa-inline.c  |  18 ++--
 gcc/ipa-inline.h  |  28 +--
 gcc/ipa-profile.c |   4 +-
 gcc/ipa-prop.c|  78 -
 gcc/ipa-prop.h|  44 ++
 gcc/ipa-pure-const.c  | 177 +++
 gcc/ipa-reference.c   | 203 ++--
 gcc/ipa-split.c   |   3 +-
 gcc/ipa.c |   2 +-
 gcc/symbol-summary.h  | 208 +-
 15 files changed, 645 insertions(+), 471 deletions(-)

-- 
2.4.5



[PATCH 1/6] hash_set: add iterator and remove method.

2015-07-09 Thread mliska
gcc/ChangeLog:

2015-07-03  Martin Liska  

* hash-set.h (remove): New function.
(iterator): New iteration class for hash_set.
---
 gcc/hash-set.h | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/gcc/hash-set.h b/gcc/hash-set.h
index 2fb6cae..e85af2a 100644
--- a/gcc/hash-set.h
+++ b/gcc/hash-set.h
@@ -59,6 +59,11 @@ public:
   return !Traits::is_empty (e);
 }
 
+  void remove (const Key &k)
+{
+  m_table.remove_elt_with_hash (k, Traits::hash (k));
+}
+
   /* Call the call back on each pair of key and value with the passed in
  arg.  */
 
@@ -74,6 +79,40 @@ public:
 
   size_t elements () const { return m_table.elements (); }
 
+  class iterator
+  {
+  public:
+explicit iterator (const typename hash_table::iterator &iter) :
+  m_iter (iter) {}
+
+iterator &operator++ ()
+  {
+   ++m_iter;
+   return *this;
+  }
+
+Key
+operator* ()
+  {
+   return *m_iter;
+  }
+
+bool
+operator != (const iterator &other) const
+  {
+   return m_iter != other.m_iter;
+  }
+
+  private:
+typename hash_table::iterator m_iter;
+  };
+
+  /* Standard iterator retrieval methods.  */
+
+  iterator begin () const { return iterator (m_table.begin ()); }
+  iterator end () const { return iterator (m_table.end ()); }
+
+
 private:
 
   template friend void gt_ggc_mx (hash_set *);
-- 
2.4.5




  1   2   >