Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-03 Thread Jakub Jelinek
Hi!

On Thu, Jun 02, 2016 at 06:28:21PM -0600, Martin Sebor wrote:

First of all, can you please respond to the mail I've sent about
NULL argument issues (and proposal for __builtin_*_overflow_p)?

This patch as well as the nonnull attribute patch then depends on that
decision...

> + {
> +   tree type = TREE_TYPE (TREE_TYPE (t));
> +   tree vflow = arith_overflowed_p (opcode, type, arg0, arg1)
> +		? integer_one_node : integer_zero_node;

This looks incorrect: the return type is TREE_TYPE (t), some complex integer
type, therefore vflow needs to be
  tree vflow = build_int_cst (TREE_TYPE (TREE_TYPE (t)),
			      arith_overflowed_p (opcode, type, arg0, arg1)
			      ? 1 : 0);
no?

Jakub
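
For reference, a minimal sketch of the overflow-checking builtins under
discussion; __builtin_add_overflow is the existing form, and the _p variant
(taking no result pointer) is the proposal referenced above.  handle_overflow
is a hypothetical handler:

  int r;
  /* True when the infinite-precision sum does not fit in r.  */
  if (__builtin_add_overflow (__INT_MAX__, 1, &r))
    handle_overflow ();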


Re: [PR tree-optimization/71328] Fix off-by-one error in CFG/SSA updates for backward threading

2016-06-03 Thread Jakub Jelinek
On Thu, Jun 02, 2016 at 11:24:49PM -0600, Jeff Law wrote:
> commit 96a568909e429b0f24d61c8a2f3dd3c183d720d7
> Author: law 
> Date:   Fri Jun 3 05:20:16 2016 +
> 
>   PR tree-optimization/71328
>   * tree-ssa-threadupdate.c (duplicate_thread_path): Fix off-by-one
>   error when checking for a jump back onto the copied path.  */

The C comment termination in the ChangeLog entry is weird.

Jakub


Re: [PATCH, OpenACC] Make reduction arguments addressable

2016-06-03 Thread Chung-Lin Tang
On 2016/6/2 10:00 PM, Jakub Jelinek wrote:
> Wouldn't it be better to pass either a bool openacc_async flag, or
> whole clauses, down to gfc_trans_omp_reduction_list and handle it there
> instead of walking the list after the fact?

You mean this style? (patch attached)
Tested again with no regressions.

Thanks,
Chung-Lin

* trans-openmp.c (gfc_trans_omp_reduction_list): Add mark_addressable
bool parameter, set reduction clause DECLs as addressable when true.
(gfc_trans_omp_clauses): Pass clauses->async to
gfc_trans_omp_reduction_list, add comment describing OpenACC situation.

Index: trans-openmp.c
===
--- trans-openmp.c  (revision 236845)
+++ trans-openmp.c  (working copy)
@@ -1646,7 +1646,7 @@ gfc_trans_omp_array_reduction_or_udr (tree c, gfc_
 
 static tree
 gfc_trans_omp_reduction_list (gfc_omp_namelist *namelist, tree list,
- locus where)
+ locus where, bool mark_addressable)
 {
   for (; namelist != NULL; namelist = namelist->next)
 if (namelist->sym->attr.referenced)
@@ -1657,6 +1657,8 @@ gfc_trans_omp_reduction_list (gfc_omp_namelist *na
tree node = build_omp_clause (where.lb->location,
  OMP_CLAUSE_REDUCTION);
OMP_CLAUSE_DECL (node) = t;
+   if (mark_addressable)
+ TREE_ADDRESSABLE (t) = 1;
switch (namelist->u.reduction_op)
  {
  case OMP_REDUCTION_PLUS:
@@ -1747,7 +1749,10 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp
   switch (list)
{
case OMP_LIST_REDUCTION:
- omp_clauses = gfc_trans_omp_reduction_list (n, omp_clauses, where);
+ /* An OpenACC async clause indicates the need to set reduction
+    arguments addressable, to allow asynchronous copy-out.  */
+ omp_clauses = gfc_trans_omp_reduction_list (n, omp_clauses, where,
+ clauses->async);
  break;
case OMP_LIST_PRIVATE:
  clause_code = OMP_CLAUSE_PRIVATE;
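
For illustration, a minimal OpenACC C sketch of the situation being handled:
an async reduction whose result must remain addressable for the asynchronous
copy-out (function and parameter names are hypothetical):

  float
  async_sum (int n, const float *restrict a)
  {
    float sum = 0.f;
  #pragma acc parallel loop reduction(+:sum) async(1)
    for (int i = 0; i < n; i++)
      sum += a[i];     /* sum must stay addressable for the async copy-out */
  #pragma acc wait(1)
    return sum;
  }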


Re: [PATCH, OpenACC] Make reduction arguments addressable

2016-06-03 Thread Jakub Jelinek
On Fri, Jun 03, 2016 at 03:13:40PM +0800, Chung-Lin Tang wrote:
> On 2016/6/2 10:00 PM, Jakub Jelinek wrote:
> > Wouldn't it be better to pass either a bool openacc_async flag, or
> > whole clauses, down to gfc_trans_omp_reduction_list and handle it there
> > instead of walking the list after the fact?
> 
> You mean this style? (patch attached)
> Tested again with no regressions.
> 
> Thanks,
> Chung-Lin
> 
> * trans-openmp.c (gfc_trans_omp_reduction_list): Add mark_addressable
> bool parameter, set reduction clause DECLs as addressable when true.
> (gfc_trans_omp_clauses): Pass clauses->async to
> gfc_trans_omp_reduction_list, add comment describing OpenACC situation.

Yep, thanks (and the C/C++ patch is ok too).

> Index: trans-openmp.c
> ===
> --- trans-openmp.c(revision 236845)
> +++ trans-openmp.c(working copy)
> @@ -1646,7 +1646,7 @@ gfc_trans_omp_array_reduction_or_udr (tree c, gfc_
>  
>  static tree
>  gfc_trans_omp_reduction_list (gfc_omp_namelist *namelist, tree list,
> -   locus where)
> +   locus where, bool mark_addressable)
>  {
>for (; namelist != NULL; namelist = namelist->next)
>  if (namelist->sym->attr.referenced)
> @@ -1657,6 +1657,8 @@ gfc_trans_omp_reduction_list (gfc_omp_namelist *na
>   tree node = build_omp_clause (where.lb->location,
> OMP_CLAUSE_REDUCTION);
>   OMP_CLAUSE_DECL (node) = t;
> + if (mark_addressable)
> +   TREE_ADDRESSABLE (t) = 1;
>   switch (namelist->u.reduction_op)
> {
> case OMP_REDUCTION_PLUS:
> @@ -1747,7 +1749,10 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp
>switch (list)
>   {
>   case OMP_LIST_REDUCTION:
> -   omp_clauses = gfc_trans_omp_reduction_list (n, omp_clauses, where);
> +   /* An OpenACC async clause indicates the need to set reduction
> +  arguments addressable, to allow asynchronous copy-out.  */
> +   omp_clauses = gfc_trans_omp_reduction_list (n, omp_clauses, where,
> +   clauses->async);
> break;
>   case OMP_LIST_PRIVATE:
> clause_code = OMP_CLAUSE_PRIVATE;


Jakub


Re: [PATCH] Fix cgraph edge redirection with non-POD lhs (PR middle-end/71387)

2016-06-03 Thread Richard Biener
On Thu, 2 Jun 2016, Jakub Jelinek wrote:

> Hi!
> 
> Apparently my r236430 change (trunk) and r236431 (6.x) broke the following
> testcase.  In the later, similar change to gimple-fold.c in r236506
> I've added code to tweak gimple_call_fntype if we newly have one of the
> void something (void) __attribute__((noreturn)) functions like
> __builtin_unreachable or __cxa_pure_virtual, and to drop the lhs, even if it
> was non-POD before, when the new fntype has void return type;
> apparently we need to do the same in cgraph.c as well.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/6.2?

Ok.

Thanks,
Richard.

> 2016-06-02  Jakub Jelinek  
> 
>   PR middle-end/71387
>   * cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): If redirecting
>   to noreturn e->callee->decl that has void return type and void
>   arguments, adjust gimple_call_fntype and remove lhs even if it had
>   previously addressable type.
> 
>   * g++.dg/opt/pr71387.C: New test.
> 
> --- gcc/cgraph.c.jj   2016-05-26 10:37:54.0 +0200
> +++ gcc/cgraph.c  2016-06-02 17:17:58.963052785 +0200
> @@ -1512,8 +1512,20 @@ cgraph_edge::redirect_call_stmt_to_calle
>update_stmt_fn (DECL_STRUCT_FUNCTION (e->caller->decl), new_stmt);
>  }
>  
> +  /* If changing the call to __cxa_pure_virtual or similar noreturn function,
> + adjust gimple_call_fntype too.  */
> +  if (gimple_call_noreturn_p (new_stmt)
> +  && VOID_TYPE_P (TREE_TYPE (TREE_TYPE (e->callee->decl)))
> +  && TYPE_ARG_TYPES (TREE_TYPE (e->callee->decl))
> +  && (TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (e->callee->decl)))
> +   == void_type_node))
> +gimple_call_set_fntype (new_stmt, TREE_TYPE (e->callee->decl));
> +
>/* If the call becomes noreturn, remove the LHS if possible.  */
> -  if (gimple_call_noreturn_p (new_stmt) && should_remove_lhs_p (lhs))
> +  if (lhs
> +  && gimple_call_noreturn_p (new_stmt)
> +  && (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (new_stmt)))
> +   || should_remove_lhs_p (lhs)))
>  {
>if (TREE_CODE (lhs) == SSA_NAME)
>   {
> --- gcc/testsuite/g++.dg/opt/pr71387.C.jj	2016-06-02 17:37:59.868769557 +0200
> +++ gcc/testsuite/g++.dg/opt/pr71387.C	2016-06-02 17:23:51.0 +0200
> @@ -0,0 +1,52 @@
> +// PR middle-end/71387
> +// { dg-do compile }
> +// { dg-options "-Og" }
> +
> +struct A
> +{
> +  A ();
> +  inline A (const A &);
> +};
> +
> +struct B
> +{
> +  explicit B (unsigned long) : b(0), c(1) {}
> +  A a;
> +  unsigned long b;
> +  int c;
> +};
> +
> +struct C {};
> +
> +struct D
> +{
> +  explicit D (const C *) {}
> +};
> +
> +struct E : public D
> +{
> +  E (const C *x) : D(x) {}
> +  virtual A foo () const = 0;
> +  virtual A bar () const = 0;
> +};
> +
> +struct F : public B
> +{
> +  inline void baz ();
> +  F (const E *);
> +  const E *f;
> +};
> +
> +inline void
> +F::baz ()
> +{
> +  if (b == 0)
> +a = f->bar ();
> +  else
> +a = f->foo ();
> +}
> +
> +F::F (const E *) : B(4)
> +{
> +  baz ();
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Re: move increase_alignment from simple to regular ipa pass

2016-06-03 Thread Richard Biener
On Thu, 2 Jun 2016, Jan Hubicka wrote:

> > On Thu, 2 Jun 2016, David Edelsohn wrote:
> > 
> > > > Richard Biener wrote:
> > > 
> > > >> "This would mean the pass should get its own non-Optimization flag
> > > >> initialized by targets where section anchors are usually used"
> > > >> IIUC should we add a new option -fno-increase_alignment and gate the
> > > >> pass on it?  Um, sorry, I didn't understand why targets
> > > >> with section anchors (arm, aarch64, ppc) should initialize this option?
> > > >
> > > > Currently the pass is only run for targets with section anchors (and
> > > > there by default if they are enabled by default).  So it makes sense to
> > > > run on those by default and the pass is not necessary on targets w/o
> > > > section anchors as the vectorizer can easily adjust alignment itself on
> > > > those.
> > > 
> > > PPC does not always enable section anchors -- it depends on the code
> > > model.  Shouldn't this be tied to use of section anchors?
> > 
> > It effectively is with the patch by walking all functions to see if they
> > have section anchors enabled.  That is unnecessary work for targets that
> 
> fsection-anchors
> Common Report Var(flag_section_anchors)
> Access data in the same section from shared anchor points.

Funny.  I see the following on trunk:

fsection-anchors
Common Report Var(flag_section_anchors) Optimization
Access data in the same section from shared anchor points.

> flag_section_anchors is not declared as Optimization, so it can't be function
> specific right now. It probably should because it is an optimization.  This
> makes me wonder what happens when one function has anchors enabled and another
> doesn't?  Probably anchoring or not anchoring the var will then depend on what
> function comes first in the compilation order and then we will need to make
> backend grok the case where a static var is anchored but flag_section_anchors
> is off.

This is because we represent the anchor with DECL_RTL, right?  Maybe
DECL_RTL of globals needs to be re-computed for each function...
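
For illustration, the per-function scenario in question (a hypothetical
sketch, assuming -fsection-anchors is made a per-function Optimization
option):

  static int shared_var;

  __attribute__((optimize("section-anchors")))
  int f (void) { return shared_var; }   /* would anchor shared_var */

  __attribute__((optimize("no-section-anchors")))
  int g (void) { return shared_var; }   /* would not */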

> I dunno what is the desired behaviour for LTOing together different code
> models.

Good question.  There's always the choice to remove 'Optimization' and
enforce same setting for all TUs we LTO in lto-wrapper.

Richard.


Re: move increase_alignment from simple to regular ipa pass

2016-06-03 Thread Jan Hubicka
> > fsection-anchors
> > Common Report Var(flag_section_anchors)
> > Access data in the same section from shared anchor points.
> 
> Funny.  I see the following on trunk:
> 
> fsection-anchors
> Common Report Var(flag_section_anchors) Optimization
> Access data in the same section from shared anchor points.

Aha, my local change from last year is still in my tree.  Sorry.
Yep, having it as Optimization makes sense, but we need to be sure it works as
intended.
> 
> > flag_section_anchors is not declared as Optimization, so it can't be function
> > specific right now. It probably should because it is an optimization.  This
> > makes me wonder what happens when one function has anchors enabled and another
> > doesn't?  Probably anchoring or not anchoring the var will then depend on what
> > function comes first in the compilation order and then we will need to make
> > backend grok the case where a static var is anchored but flag_section_anchors
> > is off.
> 
> This is because we represent the anchor with DECL_RTL, right?  Maybe
> DECL_RTL of globals needs to be re-computed for each function...

I would rather anchor a variable if it is used by at least one function that
is compiled with anchors.  Accessing anchors is IMO no slower than accessing
symbols.  But I am not that familiar with this code...
> 
> > I dunno what is the desired behaviour for LTOing together different code
> > models.
> 
> Good question.  There's always the choice to remove 'Optimization' and
> enforce same setting for all TUs we LTO in lto-wrapper.

Yep. Not sure what is better - I did not really think of targets that use both
models.

Honza
> 
> Richard.


Re: [PATCH v1] Support for SPARC M7 and VIS 4.0

2016-06-03 Thread Eric Botcazou
> This patch adds support for -mcpu=niagara7, corresponding to the SPARC
> M7 CPU as documented in the Oracle SPARC Architecture 2015 and the M7
> Processor Supplement.  The patch also includes intrinsics support for
> all the VIS 4.0 instructions.
> 
> This patch has been tested on sparc64-*-linux-gnu, sparcv9-*-linux-gnu
> and sparc-sun-solaris2.11 targets.
> 
> gcc/ChangeLog:
> 
>   * config/sparc/sparc.md (cpu): Add niagara7 cpu type.
>   Include the M7 SPARC DFA scheduler.
>   New attribute v3pipe.
>   Annotate insns with v3pipe where appropriate.
>   Define cpu_feature vis4.
>   Add lzd instruction type and set it on clzdi_sp64 and clzsi_sp64.
>   Add (V8QI "8") to vbits.
>   Add insns {add,sub}v8qi3.
>   Add insns ss{add,sub}v8qi3.
>   Add insns us{add,sub}{v8qi,v4hi}3.
>   Add insns {min,max}{v8qi,v4hi,v2si}3.
>   Add insns {minu,maxu}{v8qi,v4hi,v2si}3.
>   Add insns fpcmp{le,gt,ule,ugt}{8,16,32}_vis.
>   * config/sparc/niagara4.md: Add a comment explaining the
>   discrepancy between the documented latency numbers and the
>   implemented ones.
>   * config/sparc/niagara7.md: New file.
>   * configure.ac (HAVE_AS_SPARC5_VIS4): Define if the assembler
>   supports SPARC5 and VIS 4.0 instructions.
>   * configure: Regenerate.
>   * config.in: Likewise.
>   * config.gcc: niagara7 is a supported cpu in sparc*-*-* targets.
>   * config/sparc/sol2.h (ASM_CPU32_DEFAULT_SPEC): Set for
>   TARGET_CPU_niagara7.
>   (ASM_CPU64_DEFAULT_SPEC): Likewise.
>   (CPP_CPU_SPEC): Handle niagara7.
>   (ASM_CPU_SPEC): Likewise.
>   * config/sparc/sparc-opts.h (processor_type): Add
>   PROCESSOR_NIAGARA7.
>   (mvis4): New option.
>   * config/sparc/sparc.h (TARGET_CPU_niagara7): Define.
>   (AS_NIAGARA7_FLAG): Define.
>   (ASM_CPU64_DEFAULT_SPEC): Set for niagara7.
>   (CPP_CPU64_DEFAULT_SPEC): Likewise.
>   (CPP_CPU_SPEC): Handle niagara7.
>   (ASM_CPU_SPEC): Likewise.
>   * config/sparc/sparc.c (niagara7_costs): Define.
>   (sparc_option_override): Handle niagara7 and adjust cache-related
>   parameters with better values for niagara cpus.  Also support VIS4.
>   (sparc32_initialize_trampoline): Likewise.
>   (sparc_use_sched_lookahead): Likewise.
>   (sparc_issue_rate): Likewise.
>   (sparc_register_move_cost): Likewise.
>   (dump_target_flag_bits): Support VIS4.
>   (sparc_vis_init_builtins): Likewise.
>   (sparc_builtins): Likewise.
>   * config/sparc/sparc-c.c (sparc_target_macros): Define __VIS__ for
>   VIS 4.0.
>   * config/sparc/driver-sparc.c (cpu_names): Add SPARC-M7 and
>   UltraSparc M7.
>   * config/sparc/sparc.opt (sparc_processor_type): New value
>   niagara7.
>   * config/sparc/visintrin.h (__attribute__): Prototypes for the
>   VIS4 builtins.
>   * doc/invoke.texi (SPARC Options): Document -mcpu=niagara7 and
>   -mvis4.
>   * doc/extend.texi (SPARC VIS Built-in Functions): Document the
>   VIS4 builtins.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/sparc/vis4misc.c: New file.
>   * gcc.target/sparc/fpcmp.c: Likewise.
>   * gcc.target/sparc/fpcmpu.c: Likewise.

OK for mainline, thanks.  As mentioned yesterday, I think that we should also 
put it on the 6 branch, but I can do the backport myself.

-- 
Eric Botcazou
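
For reference, user code can detect the new ISA level via the __VIS__ macro
that the patch defines (a hedged sketch; the 0x400 value follows the existing
__VIS__ numbering convention):

  #if defined(__VIS__) && __VIS__ >= 0x400
  #include <visintrin.h>   /* prototypes for the VIS4 builtins documented above */
  #endif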


[PATCH][wwwdocs][obvious] Fix typo in -finline-matmul-limit

2016-06-03 Thread Kyrill Tkachov

Committed as obvious.

Thanks,
Kyrill
Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.82
diff -U 3 -r1.82 changes.html
--- htdocs/gcc-6/changes.html	31 May 2016 08:01:35 -	1.82
+++ htdocs/gcc-6/changes.html	2 Jun 2016 10:21:18 -
@@ -346,7 +346,7 @@
   cases if front-end optimization is active.  The maximum size for
   inlining can be set to n with the
   -finline-matmul-limit=n option and turned off
-  with -finline-matmul-llimit=0.
+  with -finline-matmul-limit=0.
 The -Wconversion-extra option will warn about
   REAL constants which have excess precision for
   their kind.


[PATCH][ARM] Fix gcc.target/arm/builtin-bswap16-1.c

2016-06-03 Thread Kyrill Tkachov

Hi all,

The test gcc.target/arm/builtin-bswap16-1.c refuses to compile when testing a
toolchain configured with --with-mode=thumb --with-float=hard and an
architecture that supports Thumb2.  This is because the test explicitly sets
the -march option to armv6 and we get an error complaining about Thumb1 being
used with the hard-float ABI.

The proposed solution in this patch is to bump the architecture to armv6t2 so
that it uses Thumb2 when -mthumb is used.

But we don't want to lose Thumb1 test coverage.  So this patch moves the
actual C code into a separate .x file and includes it in two different tests,
one testing Thumb1 and the other Thumb2.

The new test passes and builtin-bswap16-1.c also now passes rather than
complaining about the float ABI.

Ok for trunk?

Thanks,
Kyrill

2016-06-03  Kyrylo Tkachov  

* gcc.target/arm/builtin-bswap16-1.c: Add -mfloat-abi=soft
and -mthumb to dg-options.  Include builtin-bswap16.x.
* gcc.target/arm/builtin-bswap16: New file.
* gcc.target/arm/builtin-bswap16-2.c: New test.
diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c b/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c
index 6920f004eab42443441227029c579aeb2bb981ee..7c3f8370e132b4c41ad7b3dac973b552c4ddbfe1 100644
--- a/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c
+++ b/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c
@@ -1,15 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mthumb -mfloat-abi=soft" } */
 /* { dg-require-effective-target arm_arch_v6_ok } */
 /* { dg-add-options arm_arch_v6 } */
-/* { dg-final { scan-assembler-not "orr\[ \t\]" } } */
 
-unsigned short swapu16_1 (unsigned short x)
-{
-  return (x << 8) | (x >> 8);
-}
+/* Test Thumb1 code generation when -mthumb is used.  */
+
+#include "builtin-bswap16.x"
 
-unsigned short swapu16_2 (unsigned short x)
-{
-  return (x >> 8) | (x << 8);
-}
+/* { dg-final { scan-assembler-not "orr\[ \t\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap16-2.c b/gcc/testsuite/gcc.target/arm/builtin-bswap16-2.c
new file mode 100644
index ..a4927e3ab0ced7a272e5acb4a5c2bcb1b2badafc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/builtin-bswap16-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_arch_v6t2_ok } */
+/* { dg-add-options arm_arch_v6t2 } */
+
+/* Test Thumb2 code generation when -mthumb is used.  */
+
+#include "builtin-bswap16.x"
+
+/* { dg-final { scan-assembler-not "orr\[ \t\]" } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap16.x b/gcc/testsuite/gcc.target/arm/builtin-bswap16.x
new file mode 100644
index ..1e7f41edf013e353944f0a4879a1248c8a8b2f11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/builtin-bswap16.x
@@ -0,0 +1,9 @@
+unsigned short swapu16_1 (unsigned short x)
+{
+  return (x << 8) | (x >> 8);
+}
+
+unsigned short swapu16_2 (unsigned short x)
+{
+  return (x >> 8) | (x << 8);
+}


Re: [PATCH][combine] PR middle-end/71074 Check that const_op is >= 0 before potentially shifting in simplify_comparison

2016-06-03 Thread Kyrill Tkachov


On 13/05/16 14:26, Bernd Schmidt wrote:

On 05/13/2016 03:22 PM, Kyrill Tkachov wrote:

/* We only want to handle integral modes.  This catches VOIDmode,
   CCmode, and the floating-point modes.  An exception is that we
@@ -11649,7 +11649,8 @@ simplify_comparison (enum rtx_code code,
/* Try to simplify the compare to constant, possibly changing the
   comparison op, and/or changing op1 to zero.  */
code = simplify_compare_const (code, mode, op0, &op1);
-  const_op = INTVAL (op1);
+  HOST_WIDE_INT const_op = INTVAL (op1);
+  unsigned HOST_WIDE_INT uconst_op = (unsigned HOST_WIDE_INT) const_op;

Can this be just "unsigned HOST_WIDE_INT uconst_op = UINTVAL (op1);" ?


Either should work.


+  unsigned HOST_WIDE_INT low_mask
+= (((unsigned HOST_WIDE_INT) 1 << INTVAL (amount)) - 1);
unsigned HOST_WIDE_INT low_bits
-= (nonzero_bits (XEXP (op0, 0), mode)
-   & (((unsigned HOST_WIDE_INT) 1
-   << INTVAL (XEXP (op0, 1))) - 1));
+= (nonzero_bits (XEXP (op0, 0), mode) & low_mask);
if (low_bits == 0 || !equality_comparison_p)
  {

(unsigned HOST_WIDE_INT) 1 can be replaced with HOST_WIDE_INT_1U.


Ah, I suspected there was something like this, but none of the surrounding
code was using it.  Newly changed code should probably use that; we could
probably improve things further by using it more consistently in this
function, but let's do that in another patch.



Bernd


Hi Bernd,

Here is the patch with the changes discussed.
Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu and 
x86_64-linux.
Is this ok?

Thanks,
Kyrill

2016-06-03  Bernd Schmidt  
Kyrylo Tkachov  

PR middle-end/71074
* combine.c (simplify_comparison): Factor out XEXP (op, 1) and
UINTVAL (op1).  Avoid left shift of negative value.

2016-06-03  Kyrylo Tkachov  

PR middle-end/71074
* gcc.c-torture/compile/pr71074.c: New test.

diff --git a/gcc/combine.c b/gcc/combine.c
index 0343c3af0ff53199422111c2e40a1afa13ce4e91..752393b7e0e002c0c8152aaa410716f8f3970f82 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11630,13 +11630,13 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 
   while (CONST_INT_P (op1))
 {
+  rtx amount;
   machine_mode mode = GET_MODE (op0);
   unsigned int mode_width = GET_MODE_PRECISION (mode);
   unsigned HOST_WIDE_INT mask = GET_MODE_MASK (mode);
   int equality_comparison_p;
   int sign_bit_comparison_p;
   int unsigned_comparison_p;
-  HOST_WIDE_INT const_op;
 
   /* We only want to handle integral modes.  This catches VOIDmode,
 	 CCmode, and the floating-point modes.  An exception is that we
@@ -11651,7 +11651,8 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
   /* Try to simplify the compare to constant, possibly changing the
 	 comparison op, and/or changing op1 to zero.  */
   code = simplify_compare_const (code, mode, op0, &op1);
-  const_op = INTVAL (op1);
+  HOST_WIDE_INT const_op = INTVAL (op1);
+  unsigned HOST_WIDE_INT uconst_op = UINTVAL (op1);
 
   /* Compute some predicates to simplify code below.  */
 
@@ -11901,7 +11902,7 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 	  if (GET_MODE_CLASS (mode) == MODE_INT
 	  && (unsigned_comparison_p || equality_comparison_p)
 	  && HWI_COMPUTABLE_MODE_P (mode)
-	  && (unsigned HOST_WIDE_INT) const_op <= GET_MODE_MASK (mode)
+	  && uconst_op <= GET_MODE_MASK (mode)
 	  && const_op >= 0
 	  && have_insn_for (COMPARE, mode))
 	{
@@ -12206,28 +12207,28 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 	  break;
 
 	case ASHIFT:
+	  amount = XEXP (op0, 1);
 	  /* If we have (compare (ashift FOO N) (const_int C)) and
 	 the high order N bits of FOO (N+1 if an inequality comparison)
 	 are known to be zero, we can do this by comparing FOO with C
 	 shifted right N bits so long as the low-order N bits of C are
 	 zero.  */
-	  if (CONST_INT_P (XEXP (op0, 1))
-	  && INTVAL (XEXP (op0, 1)) >= 0
-	  && ((INTVAL (XEXP (op0, 1)) + ! equality_comparison_p)
+	  if (CONST_INT_P (amount)
+	  && INTVAL (amount) >= 0
+	  && ((INTVAL (amount) + ! equality_comparison_p)
 		  < HOST_BITS_PER_WIDE_INT)
-	  && (((unsigned HOST_WIDE_INT) const_op
-		   & (((unsigned HOST_WIDE_INT) 1 << INTVAL (XEXP (op0, 1)))
-		  - 1)) == 0)
+	  && ((uconst_op
+		   & ((HOST_WIDE_INT_1U << INTVAL (amount)) - 1)) == 0)
 	  && mode_width <= HOST_BITS_PER_WIDE_INT
 	  && (nonzero_bits (XEXP (op0, 0), mode)
-		  & ~(mask >> (INTVAL (XEXP (op0, 1))
+		  & ~(mask >> (INTVAL (amount)
 			   + ! equality_comparison_p))) == 0)
 	{
 	  /* We must perform a logical shift, not an arithmetic one,
 		 as we want the top N bits of C to be zero.  */
 	  unsigned HOST_WIDE_INT temp = const_op & GET_MODE_MASK (mode);
 
-

[AArch64] Give some new costs for Cortex-A57 floating-point operations

2016-06-03 Thread James Greenhalgh

Hi,

This patch rebases the floating-point cost table for Cortex-A57 to be
relative to the cost of a floating-point move.  This is in response to
feedback from Richard Sandiford [2] on Ramana's patch to calls.c [1] from
2014:

  I think this is really a bug in the backend.  The backend is assigning a
  cost of COSTS_N_INSNS (3) to a floating-point constant not because the
  constant itself is expensive -- it's actually as cheap as a register
  in this context -- but because the backend considers floating-point
  moves to be 3 times more expensive than cheap integer moves.

The argument is that a move in mode X should be treated with cost
COSTS_N_INSNS (1), and other instructions should have a cost relative to
that move. For example, in this patch we say that instructions building a
floating-point constant are the same cost as a floating-point register to
register move. Fixing this fixes the issue Ramana was seeing, in a way
consistent with what other back-ends do.
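
For context, rtx costs are expressed through the COSTS_N_INSNS macro, so
COSTS_N_INSNS (1) is by definition the cost of one cheap instruction such as
a register-to-register move (a sketch of the rtl.h definition):

  /* From gcc/rtl.h: N "cheap" instructions, 4 cost units each.  */
  #define COSTS_N_INSNS(N) ((N) * 4)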

This patch gives a small improvement to Spec2000FP on a Cortex-A57
platform.

Bootstrapped on aarch64-none-linux-gnu with no issues.

OK?

Thanks,
James

---
2016-06-03  James Greenhalgh  

* config/arm/aarch-cost-tables.h (cortexa57_extra_costs): Make FP
costs relative to the cost of a register move.

[1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00136.html
[2] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00391.html

diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h
index c971b30..5f42253 100644
--- a/gcc/config/arm/aarch-cost-tables.h
+++ b/gcc/config/arm/aarch-cost-tables.h
@@ -294,35 +294,35 @@ const struct cpu_cost_table cortexa57_extra_costs =
   {
 /* FP SFmode */
 {
-  COSTS_N_INSNS (17),  /* div.  */
-  COSTS_N_INSNS (5),   /* mult.  */
-  COSTS_N_INSNS (9),   /* mult_addsub. */
-  COSTS_N_INSNS (9),   /* fma.  */
-  COSTS_N_INSNS (4),   /* addsub.  */
-  COSTS_N_INSNS (2),   /* fpconst. */
-  COSTS_N_INSNS (2),   /* neg.  */
-  COSTS_N_INSNS (2),   /* compare.  */
-  COSTS_N_INSNS (4),   /* widen.  */
-  COSTS_N_INSNS (4),   /* narrow.  */
-  COSTS_N_INSNS (4),   /* toint.  */
-  COSTS_N_INSNS (4),   /* fromint.  */
-  COSTS_N_INSNS (4)/* roundint.  */
+  COSTS_N_INSNS (6),  /* div.  */
+  COSTS_N_INSNS (1),   /* mult.  */
+  COSTS_N_INSNS (2),   /* mult_addsub.  */
+  COSTS_N_INSNS (2),   /* fma.  */
+  COSTS_N_INSNS (1),   /* addsub.  */
+  0,		   /* fpconst.  */
+  0,		   /* neg.  */
+  0,		   /* compare.  */
+  COSTS_N_INSNS (1),   /* widen.  */
+  COSTS_N_INSNS (1),   /* narrow.  */
+  COSTS_N_INSNS (1),   /* toint.  */
+  COSTS_N_INSNS (1),   /* fromint.  */
+  COSTS_N_INSNS (1)/* roundint.  */
 },
 /* FP DFmode */
 {
-  COSTS_N_INSNS (31),  /* div.  */
-  COSTS_N_INSNS (5),   /* mult.  */
-  COSTS_N_INSNS (9),   /* mult_addsub.  */
-  COSTS_N_INSNS (9),   /* fma.  */
-  COSTS_N_INSNS (4),   /* addsub.  */
-  COSTS_N_INSNS (2),   /* fpconst.  */
-  COSTS_N_INSNS (2),   /* neg.  */
-  COSTS_N_INSNS (2),   /* compare.  */
-  COSTS_N_INSNS (4),   /* widen.  */
-  COSTS_N_INSNS (4),   /* narrow.  */
-  COSTS_N_INSNS (4),   /* toint.  */
-  COSTS_N_INSNS (4),   /* fromint.  */
-  COSTS_N_INSNS (4)/* roundint.  */
+  COSTS_N_INSNS (11),  /* div.  */
+  COSTS_N_INSNS (1),   /* mult.  */
+  COSTS_N_INSNS (2),   /* mult_addsub.  */
+  COSTS_N_INSNS (2),   /* fma.  */
+  COSTS_N_INSNS (1),   /* addsub.  */
+  0,		   /* fpconst.  */
+  0,		   /* neg.  */
+  0,		   /* compare.  */
+  COSTS_N_INSNS (1),   /* widen.  */
+  COSTS_N_INSNS (1),   /* narrow.  */
+  COSTS_N_INSNS (1),   /* toint.  */
+  COSTS_N_INSNS (1),   /* fromint.  */
+  COSTS_N_INSNS (1)/* roundint.  */
 }
   },
   /* Vector */


[Committed] Adding myself to MAINTAINERS

2016-06-03 Thread Jose E. Marchesi

Index: MAINTAINERS
===
--- MAINTAINERS (revision 237053)
+++ MAINTAINERS (working copy)
@@ -493,6 +493,7 @@
 Ziga Mahkovec  
 David Malcolm  
 Mikhail Maltsev
+Jose E. Marchesi   
 Patrick Marlier

 Simon Martin   
 Ranjit Mathew  
Index: ChangeLog
===
--- ChangeLog   (revision 237053)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2016-06-03  Jose E. Marchesi  
+
+   * MAINTAINERS (Write After Approval): Add myself. 
+
 2016-05-28  Alan Modra  
 
* Makefile.tpl (configure): Depend on m4 files included.


[PATCH][ARM][obvious] Fix typos in *thumb1_mulsi3 comment

2016-06-03 Thread Kyrill Tkachov

Hi all,

This patch just fixes a couple of typos (s/can fails/can fail/, s/with the
Thumb/on Thumb/) and makes the commenting style consistent on this pattern
in thumb1.md.
Tested on arm-none-eabi.

Committing as obvious.

Thanks,
Kyrill

2016-06-03  Kyrylo Tkachov  

* config/arm/thumb1.md (*thumb1_mulsi3): Fix typos in comment.
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index c5b59bd3e1577a904a93bb8bdf7d486b086fb848..035641b3335797cb843a5741d7372492d5a91cfc 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -142,11 +142,11 @@ (define_insn "thumb1_subsi3_insn"
(set_attr "type" "alus_sreg")]
 )
 
-; Unfortunately with the Thumb the '&'/'0' trick can fails when operands
-; 1 and 2; are the same, because reload will make operand 0 match
-; operand 1 without realizing that this conflicts with operand 2.  We fix
-; this by adding another alternative to match this case, and then `reload'
-; it ourselves.  This alternative must come first.
+;; Unfortunately on Thumb the '&'/'0' trick can fail when operands
+;; 1 and 2 are the same, because reload will make operand 0 match
+;; operand 1 without realizing that this conflicts with operand 2.  We fix
+;; this by adding another alternative to match this case, and then `reload'
+;; it ourselves.  This alternative must come first.
 (define_insn "*thumb_mulsi3"
   [(set (match_operand:SI  0 "register_operand" "=&l,&l,&l")
 	(mult:SI (match_operand:SI 1 "register_operand" "%l,*h,0")


[C++ Patch] PR 70202 ("ICE on invalid code in build_simple_base_path, at cp/class.c:579")

2016-06-03 Thread Paolo Carlini

Hi,

in this error recovery issue, after a sensible error message about
duplicate base type and a redundant one about an incomplete type (the
type with the erroneous base type) we finally crash much later in
build_simple_base_path.  Yesterday I noticed that elsewhere we don't
check the return value of xref_basetypes (which only returns false after
a hard error) and error recovery seems better if we don't zero the type
in such cases (we don't end up with this weird half-erroneous,
half-incomplete type declaration which we have to tell from the good
ones downstream; we have instead a usable reference for the informs
about duplicated definitions (see yesterday's patch), etc.).  I also
found the commit which introduced the zeroing, r189582, back in 2012,
and the testcase which was crashing back then is now correctly handled
anyway.  Thus, all in all, I'm proposing the below for trunk.  I guess we
can quickly back it out if we realize that error recovery gets much
worse in other as yet unknown situations.


Tested x86_64-linux.

Thanks,
Paolo.

/
/cp
2016-06-03  Paolo Carlini  

PR c++/70202
* parser.c (cp_parser_class_head): When xref_basetypes fails and
emits an error do not zero the type.

/testsuite
2016-06-03  Paolo Carlini  

PR c++/70202
* g++.dg/inherit/crash5.C: New.
* g++.dg/inherit/virtual1.C: Adjust.
Index: cp/parser.c
===
--- cp/parser.c (revision 237046)
+++ cp/parser.c (working copy)
@@ -22050,9 +22050,8 @@ cp_parser_class_head (cp_parser* parser,
 
   /* If we're really defining a class, process the base classes.
  If they're invalid, fail.  */
-  if (type && cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE)
-  && !xref_basetypes (type, bases))
-type = NULL_TREE;
+  if (type && cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+xref_basetypes (type, bases);
 
  done:
   /* Leave the scope given by the nested-name-specifier.  We will
Index: testsuite/g++.dg/inherit/crash5.C
===
--- testsuite/g++.dg/inherit/crash5.C   (revision 0)
+++ testsuite/g++.dg/inherit/crash5.C   (working copy)
@@ -0,0 +1,10 @@
+// PR c++/70202
+
+class A
+{
+  virtual void foo () { }
+};
+class B : public A, A { };  // { dg-error "duplicate base type" }
+
+B b1, &b2 = b1;
+A a = b2;
Index: testsuite/g++.dg/inherit/virtual1.C
===
--- testsuite/g++.dg/inherit/virtual1.C (revision 237045)
+++ testsuite/g++.dg/inherit/virtual1.C (working copy)
@@ -5,8 +5,8 @@ struct A
 virtual ~A() {}
 };
 
-struct B : A, virtual A {}; // { dg-error "duplicate base|forward declaration" }
+struct B : A, virtual A {}; // { dg-error "duplicate base" }
 
-struct C : A, B {}; // { dg-error "duplicate base|invalid use" }
+struct C : A, B {}; // { dg-error "duplicate base" }
 
-C c;// { dg-error "aggregate" }
+C c;


Re: [AArch64 1/2] Refactor aarch64_operands_ok_for_ldpstp, aarch64_operands_adjust_ok_for_ldpstp

2016-06-03 Thread James Greenhalgh
*ping* https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01193.html

Thanks,
James

On Tue, May 17, 2016 at 10:22:30AM +0100, James Greenhalgh wrote:
> 
> Hi,
> 
> These two functions are very similar and suffer from code duplication.
> With a little bit of work we can reduce the strain on the reader by
> refactoring the functions.
> 
> Essentially, we're going to remove the explicit references to reg_1,
> reg_2, reg_3, reg_4 and keep these things in arrays instead, at which
> point it becomes clear that these functions are very similar and can be
> pulled together.
> 
> OK?
> 
> Bootstrapped and tested for aarch64-none-linux-gnu with no issues.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> 2016-05-17  James Greenhalgh  
> 
>   * config/aarch64/aarch64.c
>   (aarch64_extract_ldpstp_operands): New.
>   (aarch64_ldpstp_ops_same_reg_class_p): Likewise.
>   (aarch64_ldpstp_load_regs_clobber_base_p): Likewise.
>   (aarch64_ldpstp_offsets_consecutive_p): Likewise.
>   (aarch64_operands_ok_for_ldpstp_1): Likewise.
>   (aarch64_operands_ok_for_ldpstp): Refactor to
>   aarch64_operands_ok_for_ldpstp_1.
>   (aarch64_operands_adjust_ok_for_ldpstp): Likewise.
> 
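
For illustration, the array-based shape of the refactoring described above
(a sketch with hypothetical variable names; the companion 2/2 patch shows the
same idiom):

  rtx reg[4], mem[4];
  /* Gather the eight operands into parallel arrays instead of
     reg_1 .. reg_4 / mem_1 .. mem_4.  */
  for (int i = 0; i < 4; i++)
    {
      reg[i] = operands[2 * i];
      mem[i] = operands[2 * i + 1];
    }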



Re: [Patch AArch64 2/2] Some more cleanup of ldp/stp generation

2016-06-03 Thread James Greenhalgh
*ping* https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01192.html

Thanks,
James

On Tue, May 17, 2016 at 10:22:31AM +0100, James Greenhalgh wrote:
> 
> This is another refactoring patch to clean up more of the ldp/stp handling
> code and avoid duplicating quite as much code.
> 
> Much like the other refactoring patch, this reduces the use of reg_1, reg_2,
> etc. leading to a cleaner implementation.
> 
> Bootstrapped on AArch64 with no issues.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> 2016-05-17  James Greenhalgh  
> 
>   * config/aarch64/aarch64.c (aarch64_gen_adjusted_ldpstp): Refactor.
> 

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 434c154..01bbe81 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13549,26 +13549,18 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
>    enum machine_mode mode, RTX_CODE code)
>  {
>rtx base, offset, t1, t2;
> -  rtx mem_1, mem_2, mem_3, mem_4;
> +  rtx mem[4];
>HOST_WIDE_INT off_val, abs_off, adj_off, new_off, stp_off_limit, msize;
>  
> -  if (load)
> -{
> -  mem_1 = operands[1];
> -  mem_2 = operands[3];
> -  mem_3 = operands[5];
> -  mem_4 = operands[7];
> -}
> -  else
> -{
> -  mem_1 = operands[0];
> -  mem_2 = operands[2];
> -  mem_3 = operands[4];
> -  mem_4 = operands[6];
> -  gcc_assert (code == UNKNOWN);
> -}
> +  unsigned op_offset = load ? 1 : 0;
> +
> +  for (int i = 0; i < 4; i++)
> +mem[i] = operands[(2 * i) + op_offset];
>  
> -  extract_base_offset_in_addr (mem_1, &base, &offset);
> +  if (!load)
> +gcc_assert (code == UNKNOWN);
> +
> +  extract_base_offset_in_addr (mem[0], &base, &offset);
>gcc_assert (base != NULL_RTX && offset != NULL_RTX);
>  
>/* Adjust offset thus it can fit in ldp/stp instruction.  */
> @@ -13597,59 +13589,32 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
>  }
>  
>/* Create new memory references.  */
> -  mem_1 = change_address (mem_1, VOIDmode,
> -   plus_constant (DImode, operands[8], new_off));
> +  mem[0] = change_address (mem[0], VOIDmode,
> +   plus_constant (Pmode, operands[8], new_off));
>  
>/* Check if the adjusted address is OK for ldp/stp.  */
> -  if (!aarch64_mem_pair_operand (mem_1, mode))
> +  if (!aarch64_mem_pair_operand (mem[0], mode))
>  return false;
>  
>msize = GET_MODE_SIZE (mode);
> -  mem_2 = change_address (mem_2, VOIDmode,
> -   plus_constant (DImode,
> -  operands[8],
> -  new_off + msize));
> -  mem_3 = change_address (mem_3, VOIDmode,
> -   plus_constant (DImode,
> -  operands[8],
> -  new_off + msize * 2));
> -  mem_4 = change_address (mem_4, VOIDmode,
> -   plus_constant (DImode,
> -  operands[8],
> -  new_off + msize * 3));
> -
> -  if (code == ZERO_EXTEND)
> -{
> -  mem_1 = gen_rtx_ZERO_EXTEND (DImode, mem_1);
> -  mem_2 = gen_rtx_ZERO_EXTEND (DImode, mem_2);
> -  mem_3 = gen_rtx_ZERO_EXTEND (DImode, mem_3);
> -  mem_4 = gen_rtx_ZERO_EXTEND (DImode, mem_4);
> -}
> -  else if (code == SIGN_EXTEND)
> -{
> -  mem_1 = gen_rtx_SIGN_EXTEND (DImode, mem_1);
> -  mem_2 = gen_rtx_SIGN_EXTEND (DImode, mem_2);
> -  mem_3 = gen_rtx_SIGN_EXTEND (DImode, mem_3);
> -  mem_4 = gen_rtx_SIGN_EXTEND (DImode, mem_4);
> -}
>  
> -  if (load)
> -{
> -  operands[1] = mem_1;
> -  operands[3] = mem_2;
> -  operands[5] = mem_3;
> -  operands[7] = mem_4;
> -}
> -  else
> -{
> -  operands[0] = mem_1;
> -  operands[2] = mem_2;
> -  operands[4] = mem_3;
> -  operands[6] = mem_4;
> -}
> +  for (int i = 1; i < 4; i++)
> +mem[i] = change_address (mem[i], VOIDmode,
> +  plus_constant (Pmode,
> + operands[8],
> + new_off + (msize * i)));
> +
> +  for (int i = 0; i < 4; i++)
> +if (code == ZERO_EXTEND)
> +  mem[i] = gen_rtx_ZERO_EXTEND (Pmode, mem[i]);
> +else if (code == SIGN_EXTEND)
> +  mem[i] = gen_rtx_SIGN_EXTEND (Pmode, mem[i]);
> +
> +  for (int i = 0; i < 4; i++)
> +operands[(2 * i) + op_offset] = mem[i];
>  
>/* Emit adjusting instruction.  */
> -  emit_insn (gen_rtx_SET (operands[8], plus_constant (DImode, base, adj_off)));
> +  emit_insn (gen_rtx_SET (operands[8], plus_constant (Pmode, base, adj_off)));
>/* Emit ldp/stp instructions.  */
>t1 = gen_rtx_SET (operands[0], operands[1]);
>t2 = gen_rtx_SET (operands[2], operands[3]);



[Patch AArch64] Refactor and clean up some of the sched_fusion handling

2016-06-03 Thread James Greenhalgh

Hi,

I had a little trouble understanding the sched_fusion code, so I refactored
it and added some comments. I think this is a bit easier to read, but
I'm equally happy dropping the patch.

Bootstrapped on aarch64-none-linux-gnu.

OK?

Thanks,
James

---
2016-06-03  James Greenhalgh  

* config/aarch64/aarch64.c
(aarch64_mode_valid_for_sched_fusion_p): Make easier on the eye.
(sched_fusion_type): Rename to...
(aarch64_sched_fusion_type): ...this.
(fusion_load_store): Rename to...
(aarch64_fusion_load_store): ...this. Simplify some logic.
(aarch64_sched_fusion_priority): Rename some variables.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ad07fe1..9435ca1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3836,8 +3836,10 @@ offset_12bit_unsigned_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
 static bool
 aarch64_mode_valid_for_sched_fusion_p (machine_mode mode)
 {
-  return mode == SImode || mode == DImode
-	 || mode == SFmode || mode == DFmode
+  return mode == SImode
+	 || mode == DImode
+	 || mode == SFmode
+	 || mode == DFmode
 	 || (aarch64_vector_mode_supported_p (mode)
 	 && GET_MODE_SIZE (mode) == 8);
 }
@@ -13193,7 +13195,7 @@ extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset)
 }
 
 /* Types for scheduling fusion.  */
-enum sched_fusion_type
+enum aarch64_sched_fusion_type
 {
   SCHED_FUSION_NONE = 0,
   SCHED_FUSION_LD_SIGN_EXTEND,
@@ -13207,13 +13209,14 @@ enum sched_fusion_type
extract the two parts and set to BASE and OFFSET.  Return scheduling
fusion type this INSN is.  */
 
-static enum sched_fusion_type
-fusion_load_store (rtx_insn *insn, rtx *base, rtx *offset)
+static enum aarch64_sched_fusion_type
+aarch64_fusion_load_store (rtx_insn *insn, rtx *base, rtx *offset)
 {
   rtx x, dest, src;
-  enum sched_fusion_type fusion = SCHED_FUSION_LD;
+  enum aarch64_sched_fusion_type fusion = SCHED_FUSION_LD;
 
   gcc_assert (INSN_P (insn));
+
   x = PATTERN (insn);
   if (GET_CODE (x) != SET)
 return SCHED_FUSION_NONE;
@@ -13226,24 +13229,22 @@ fusion_load_store (rtx_insn *insn, rtx *base, rtx *offset)
   if (!aarch64_mode_valid_for_sched_fusion_p (dest_mode))
 return SCHED_FUSION_NONE;
 
-  if (GET_CODE (src) == SIGN_EXTEND)
-{
-  fusion = SCHED_FUSION_LD_SIGN_EXTEND;
-  src = XEXP (src, 0);
-  if (GET_CODE (src) != MEM || GET_MODE (src) != SImode)
-	return SCHED_FUSION_NONE;
-}
-  else if (GET_CODE (src) == ZERO_EXTEND)
+  if (GET_CODE (src) == SIGN_EXTEND
+  || GET_CODE (src) == ZERO_EXTEND)
 {
-  fusion = SCHED_FUSION_LD_ZERO_EXTEND;
+  fusion = GET_CODE (src) == SIGN_EXTEND
+	   ? SCHED_FUSION_LD_SIGN_EXTEND
+	   : SCHED_FUSION_LD_ZERO_EXTEND;
   src = XEXP (src, 0);
-  if (GET_CODE (src) != MEM || GET_MODE (src) != SImode)
+  if (GET_CODE (src) != MEM
+	  || GET_MODE (src) != SImode)
 	return SCHED_FUSION_NONE;
 }
 
   if (GET_CODE (src) == MEM && REG_P (dest))
 extract_base_offset_in_addr (src, base, offset);
-  else if (GET_CODE (dest) == MEM && (REG_P (src) || src == const0_rtx))
+  else if (GET_CODE (dest) == MEM
+	   && (REG_P (src) || src == const0_rtx))
 {
   fusion = SCHED_FUSION_ST;
   extract_base_offset_in_addr (dest, base, offset);
@@ -13270,36 +13271,35 @@ static void
 aarch64_sched_fusion_priority (rtx_insn *insn, int max_pri,
 			   int *fusion_pri, int *pri)
 {
-  int tmp, off_val;
+  int off_val;
   rtx base, offset;
-  enum sched_fusion_type fusion;
+  enum aarch64_sched_fusion_type fusion;
+
+  /* The highest priority we want to return is (MAX_PRI - 1).  */
+  max_pri--;
 
   gcc_assert (INSN_P (insn));
 
-  tmp = max_pri - 1;
-  fusion = fusion_load_store (insn, &base, &offset);
+  fusion = aarch64_fusion_load_store (insn, &base, &offset);
+
   if (fusion == SCHED_FUSION_NONE)
 {
-  *pri = tmp;
-  *fusion_pri = tmp;
+  *pri = max_pri;
+  *fusion_pri = max_pri;
   return;
 }
 
-  /* Set FUSION_PRI according to fusion type and base register.  */
-  *fusion_pri = tmp - fusion * FIRST_PSEUDO_REGISTER - REGNO (base);
-
-  /* Calculate PRI.  */
-  tmp /= 2;
+  /* Set FUSION_PRI according to fusion type and base register.
+ This gives a two level sort on fusion priorities, first by type of
+ fusion, then by REGNO (base) within that band.  */
+  *fusion_pri = max_pri - (fusion * FIRST_PSEUDO_REGISTER) - REGNO (base);
 
   /* INSN with smaller offset goes first.  */
   off_val = (int)(INTVAL (offset));
   if (off_val >= 0)
-tmp -= (off_val & 0xf);
+*pri = (max_pri / 2) - (off_val & 0xf);
   else
-tmp += ((- off_val) & 0xf);
-
-  *pri = tmp;
-  return;
+*pri = (max_pri / 2) + ((- off_val) & 0xf);
 }
 
 /* Given OPERANDS of consecutive load/store, check if we can merge


Re: [PATCH][combine] PR middle-end/71074 Check that const_op is >= 0 before potentially shifting in simplify_comparison

2016-06-03 Thread Bernd Schmidt

On 06/03/2016 10:35 AM, Kyrill Tkachov wrote:


Here is the patch with the changes discussed.
Bootstrapped and tested on arm-none-linux-gnueabihf,
aarch64-none-linux-gnu and x86_64-linux.
Is this ok?


I think so, but since this is one of those situations where I'm 
essentially approving my own patch, let's wait a few days in case anyone 
has objections. Ok to commit afterwards.



 PR middle-end/71074
 * gcc.c-torture/compile/pr71074.c: New test.


Not strictly required in the testsuite I think, but it would be nice if 
the testcase was formatted better.



Bernd



Re: [AArch64 1/2] Refactor aarch64_operands_ok_for_ldpstp, aarch64_operands_adjust_ok_for_ldpstp

2016-06-03 Thread Kyrill Tkachov


On 17/05/16 10:22, James Greenhalgh wrote:

Hi,

These two functions are very similar and suffer from code duplication.
With a little bit of work we can reduce the strain on the reader by
refactoring the functions.

Essentially, we're going to remove the explicit references to reg_1,
reg_2, reg_3, reg_4 and keep these things in arrays instead, at which
point it becomes clear that these functions are very similar and can be
pulled together.

OK?

Bootstrapped and tested for aarch64-none-linux-gnu with no issues.

OK?

Thanks,
James

---
2016-05-17  James Greenhalgh  

* config/aarch64/aarch64.c
(aarch64_extract_ldpstp_operands): New.
(aarch64_ldpstp_ops_same_reg_class_p): Likewise.
(aarch64_ldpstp_load_regs_clobber_base_p): Likewise.
(aarch64_ldpstp_offsets_consecutive_p): Likewise.
(aarch64_operands_ok_for_ldpstp_1): Likewise.
(aarch64_operands_ok_for_ldpstp): Refactor to
aarch64_operands_ok_for_ldpstp_1.
(aarch64_operands_adjust_ok_for_ldpstp): Likewise.



FWIW I looked at this when it was first sent out and it looked ok to me
(but I cannot approve).

Same with patch 2/2.

Kyrill


Re: [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models

2016-06-03 Thread Bernd Schmidt

On 06/02/2016 06:53 PM, James Greenhalgh wrote:

As I iterated through versions of this patch set, I realised that all we
really wanted for ifcvt was a way to estimate the cost of a branch in units
that were comparable to the cost of instructions. The trouble with BRANCH_COST
wasn't that it was returning a magic number, it was just that it was returning
a magic number which had inconsistent meanings in the compiler. Otherwise,
BRANCH_COST was a straightforward, low-complexity target hook.


[...]


Having worked through the patch set, I'd say it is probably a small
improvement over what we currently do, but I'm not very happy with it. I'm
posting it for comment so we can discuss any directions for costs that I
haven't thought about or prototyped. I'm also happy to drop the costs
rewrite if this seems like complexity for no benefit.

Any thoughts?


I think it all looks fairly reasonable, and on the whole lower 
complexity is likely a better approach. A few comments on individual 
patches:



+unsigned int
+default_rtx_branch_cost (bool speed_p,
+bool predictable_p)


No need to wrap the line.


+noce_estimate_conversion_profitable_p (struct noce_if_info *if_info,
+  unsigned int ninsns)
+{
+  return (if_info->rtx_edge_cost >= ninsns * COSTS_N_INSNS (1));


Please no parens around return. There are several examples across the 
series.


NINSNS is the number of simple instructions we're going to add, right? 
How about the instructions we're going to remove, shouldn't these be 
counted too? I think that kind of thing was implicit in the old tests vs 
branch_cost.



   if_info.branch_cost = BRANCH_COST (optimize_bb_for_speed_p (test_bb),
 predictable_edge_p (then_edge));
+  if_info.rtx_edge_cost
+= targetm.rtx_branch_cost (optimize_bb_for_speed_p (test_bb),
+  predictable_edge_p (then_edge));


This I have the most problems with, mostly as an issue with naming. 
Calling it an edge_cost implies that it depends on whether the branch is 
taken or not, which I believe is not the case. Maybe the interface ought 
to be able to provide taken/not-taken information, although I can't 
off-hand think of a way to make use of such information.


Here, I'd rather call the field branch_cost, but there's already one 
with that name. Are there still places that use the old one after your 
patch series?


Hmm, I guess information about whether the branch is likely taken/not 
taken/unpredictable would be of use to add the instructions behind it 
into the cost of the existing code.



+/* Return TRUE if CODE is an RTX comparison operator.  */
+
+static bool
+noce_code_is_comparison_p (rtx_code code)


Isn't there some way to do this based on GET_RTX_CLASS?
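
For illustration, such a check could be written with the rtl operator classes
(a sketch, assuming the two comparison classes are what is wanted here):

  static bool
  noce_code_is_comparison_p (rtx_code code)
  {
    /* RTX_COMPARE covers the non-symmetric comparisons such as LT/GE;
       RTX_COMM_COMPARE covers the commutative ones such as EQ/NE.  */
    return GET_RTX_CLASS (code) == RTX_COMPARE
	   || GET_RTX_CLASS (code) == RTX_COMM_COMPARE;
  }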

In the noce_cmove_arith patch, is it possible to just construct the 
actual sequence we want to use and test its cost (much like the 
combiner's approach), rather than building up a random one for 
estimation? Seems like bailing out early based on a cost estimate is no 
longer profitable for compile-time if getting the estimate is as much 
work as doing the conversion in the first place.



I've bootstrapped and tested the patch set on x86_64 and aarch64, but
they probably need some further polishing if we were to decide this was a
useful direction.


Also, I'd like some information on what this does to code generation on 
a few different targets.



Bernd


Re: [PATCH] Warn about return with a void expression with -Wreturn-type.

2016-06-03 Thread Bernd Schmidt

On 06/02/2016 12:03 PM, Marcin Baczyński wrote:

2016-06-02 4:51 GMT+02:00 Martin Sebor :

As a disclaimer, someone else endowed with those special powers
will need to approve your final patch.  If you don't get a timely
approval please ping the patch weekly.



   * doc/invoke.texi (-Wreturn-type): Mention not warning on return with
a void expression in a void function.


If Martin is happy with this then OK.


Bernd
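
For reference, the construct the documentation change describes, a return
with a void expression in a void function, which -Wreturn-type does not warn
about (valid C++; a minimal sketch):

  void g (void);
  void f (void)
  {
    return g ();   /* void expression returned from a void function */
  }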



Re: [PATCH] Warn about return with a void expression with -Wreturn-type.

2016-06-03 Thread Marcin Baczyński
2016-06-03 11:36 GMT+02:00 Bernd Schmidt :
> On 06/02/2016 12:03 PM, Marcin Baczyński wrote:
>>
>> 2016-06-02 4:51 GMT+02:00 Martin Sebor :
>>>
>>> As a disclaimer, someone else endowed with those special powers
>>> will need to approve your final patch.  If you don't get a timely
>>> approval please ping the patch weekly.
>
>
>>* doc/invoke.texi (-Wreturn-type): Mention not warning on return with
>> a void expression in a void function.
>
>
> If Martin is happy with this then OK.

Thanks!
Could someone with repository write access commit the patch, please?
>
>
> Bernd
>


Re: [PATCH, rs6000] Fix PR70957 (skip vsx-elemrev-[24].c tests for a downlevel assembler)

2016-06-03 Thread Segher Boessenkool
On Thu, Jun 02, 2016 at 03:50:21PM -0500, Bill Schmidt wrote:
> The only way I know to make the test predictable is to use a run-time test
> to check whether P9 vector instructions will execute.  Thus this solution.
> I've verified we no longer have test failures on machines with a downlevel
> assembler, and the tests run correctly on machines with an up-to-date
> assembler.  Is this ok for trunk and 6.2?

This is fine.  Okay for both.  Thanks,


Segher
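
For reference, a run-time guard of the kind described could check the
hardware capability bits (a hedged sketch using glibc's getauxval; the
committed tests use their own helper):

  #include <sys/auxv.h>

  /* Nonzero when the hardware implements ISA 3.0 (POWER9).  */
  static int
  p9_vector_hw_available (void)
  {
    return (getauxval (AT_HWCAP2) & PPC_FEATURE2_ARCH_3_00) != 0;
  }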


Re: [RFC: Patch 1/6] New target hook: rtx_branch_cost

2016-06-03 Thread Richard Biener
On Thu, Jun 2, 2016 at 6:53 PM, James Greenhalgh
 wrote:
>
> Hi,
>
> This patch introduces a new target hook, to be used like BRANCH_COST but
> with a guaranteed unit of measurement. We want this to break away from
> the current ambiguous uses of BRANCH_COST.
>
> BRANCH_COST is used in ifcvt.c in two types of comparisons. One against
> instruction counts - where it is used as the limit on the number of new
> instructions we are permitted to generate. The other (after multiplying
> by COSTS_N_INSNS (1)) directly against RTX costs.
>
> Of these, a comparison against RTX costs is the more easily understood
> metric across the compiler, and the one I've pulled out to the new hook.
> To keep things consistent for targets which don't migrate, this new hook
> has a default value of BRANCH_COST * COSTS_N_INSNS (1).
>
> OK?

How does the caller compute "predictable"?  There are some archs where
information on whether this is a forward or backward jump is more
useful, I guess.  Also, at least for !speed_p, the distance of the branch is
important, given that not all targets support arbitrary branch offsets.

I remember that at the last Cauldron we discussed to change things to
compare costs of sequences of instructions rather than giving targets no
context with just asking for single (sub-)insn rtx costs.

That said, the patch is certainly an improvement.

Thanks,
Richard.

> Thanks,
> James
>
> ---
> 2016-06-02  James Greenhalgh  
>
> * target.def (rtx_branch_cost): New.
> * doc/tm.texi.in (TARGET_RTX_BRANCH_COST): Document it.
> * doc/tm.texi: Regenerate.
> * targhooks.h (default_rtx_branch_cost): New.
> * targhooks.c (default_rtx_branch_cost): New.
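
For reference, a sketch of the proposed default implementation, per the
description above (BRANCH_COST scaled into rtx-cost units):

  unsigned int
  default_rtx_branch_cost (bool speed_p, bool predictable_p)
  {
    /* Fall back to the target's BRANCH_COST, expressed in the same
       units as rtx costs.  */
    return BRANCH_COST (speed_p, predictable_p) * COSTS_N_INSNS (1);
  }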


[PATCH][AArch64] Increase code alignment

2016-06-03 Thread Wilco Dijkstra
Increase loop alignment on Cortex cores to 8 and set function alignment to 16.
This makes things consistent across big.LITTLE cores, improves performance of
benchmarks with tight loops, and reduces performance variations due to small
changes in code layout.  It looks like almost all AArch64 cores agree on an
alignment of 16 for functions and 8 for loops and branches, so we should
change -mcpu=generic as well if there is no disagreement - feedback welcome.

OK for commit?

ChangeLog:

2016-05-03  Wilco Dijkstra  

* gcc/config/aarch64/aarch64.c (cortexa53_tunings):
Increase loop alignment to 8.  Set function alignment to 16.
(cortexa35_tunings): Likewise.
(cortexa57_tunings): Increase loop alignment to 8.
(cortexa72_tunings): Likewise.

---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12e5017a6d4b0ab15dcf932014980fdbd1a598ee..6ea10a187a1f895a399515b8cd0da0be63be827a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -424,9 +424,9 @@ static const struct tune_params cortexa35_tunings =
   1, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
| AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  8,   /* function_align.  */
+  16,  /* function_align.  */
   8,   /* jump_align.  */
-  4,   /* loop_align.  */
+  8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
   1,   /* vec_reassoc_width.  */
@@ -449,9 +449,9 @@ static const struct tune_params cortexa53_tunings =
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
| AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  8,   /* function_align.  */
+  16,  /* function_align.  */
   8,   /* jump_align.  */
-  4,   /* loop_align.  */
+  8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
   1,   /* vec_reassoc_width.  */
@@ -476,7 +476,7 @@ static const struct tune_params cortexa57_tunings =
| AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
   16,  /* function_align.  */
   8,   /* jump_align.  */
-  4,   /* loop_align.  */
+  8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
   1,   /* vec_reassoc_width.  */
@@ -502,7 +502,7 @@ static const struct tune_params cortexa72_tunings =
| AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
   16,  /* function_align.  */
   8,   /* jump_align.  */
-  4,   /* loop_align.  */
+  8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
   1,   /* vec_reassoc_width.  */



[PATCH] Do not instrument string cst w/ unknown decl size (PR, sanitizer/71378)

2016-06-03 Thread Martin Liška
Hi!

As seen in the issue, we try to instrument a global variable that contains a
string constant.  The following patch does not instrument if the size is
variable (VLA).

Patch survives regression tests and bootstraps on x86_64-linux.
It's questionable whether the same situation can also happen in
asan_finish_file:

  FOR_EACH_DEFINED_VARIABLE (vnode)
if (TREE_ASM_WRITTEN (vnode->decl)
&& asan_protect_global (vnode->decl))
  asan_add_global (vnode->decl, TREE_TYPE (type), v);

?

Ready to be installed?
Thanks,
Martin
From 341129d1277cacdee7bcd2129ad8282d9319b11d Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 3 Jun 2016 10:23:57 +0200
Subject: [PATCH] Do not instrument string cst w/ unknown decl size (PR
 sanitizer/71378)

gcc/ChangeLog:

2016-06-03  Martin Liska  

	* asan.c (add_string_csts): Instrument just string csts with a
	known decl size.

gcc/testsuite/ChangeLog:

2016-06-03  Martin Liska  

	* g++.dg/asan/pr71378.C: New test.
---
 gcc/asan.c  |  6 --
 gcc/testsuite/g++.dg/asan/pr71378.C | 11 +++
 2 files changed, 15 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/asan/pr71378.C

diff --git a/gcc/asan.c b/gcc/asan.c
index 71095fb..0dae480 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -2474,8 +2474,10 @@ add_string_csts (constant_descriptor_tree **slot,
   && TREE_ASM_WRITTEN (desc->value)
   && asan_protect_global (desc->value))
 {
-  asan_add_global (SYMBOL_REF_DECL (XEXP (desc->rtl, 0)),
-		   aascd->type, aascd->v);
+  tree symbol = SYMBOL_REF_DECL (XEXP (desc->rtl, 0));
+
+  if (tree_fits_uhwi_p (DECL_SIZE_UNIT (symbol)))
+	asan_add_global (symbol, aascd->type, aascd->v);
 }
   return 1;
 }
diff --git a/gcc/testsuite/g++.dg/asan/pr71378.C b/gcc/testsuite/g++.dg/asan/pr71378.C
index 000..166eae1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/asan/pr71378.C
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+
+class A {
+public:
+  int GetLen();
+};
+class B {
+  A s_MDSPartIDStr;
+  void FillLoadPartitionInfo();
+};
+void B::FillLoadPartitionInfo() { char a[s_MDSPartIDStr.GetLen()] = "foo"; }
-- 
2.8.3



Re: [PATCH][1/3] Add loop_vinfo to vect_get_vec_def_for_operand

2016-06-03 Thread Richard Biener
On Thu, Jun 2, 2016 at 6:04 PM, Alan Hayward  wrote:
>
>> This patch simply adds loop_vinfo as an extra argument to
>> vect_get_vec_def_for_operand and only generates a stmt_vinfo if required.
>> This is a required cleanup for patch [2/3].
>> Tested on x86 and aarch64.
>>
>> gcc/
>> * tree-vectorizer.h (vect_get_vec_def_for_operand): Pass loop_vinfo in.
>> * tree-vect-stmts.c (vect_get_vec_def_for_operand): Pass loop_vinfo in.
>> (vect_get_vec_defs): Pass down loop_vinfo.
>> (vectorizable_mask_load_store): Likewise.
>> (vectorizable_call): Likewise.
>> (vectorizable_simd_clone_call): Likewise.
>> (vect_get_loop_based_defs): Likewise.
>> (vectorizable_conversion): Likewise.
>> (vectorizable_operation): Likewise.
>> (vectorizable_store): Likewise.
>> (vectorizable_load): Likewise.
>> (vectorizable_condition): Likewise.
>> (vectorizable_comparison): Likewise.
>> * tree-vect-loop.c (get_initial_def_for_induction): Likewise.
>> (get_initial_def_for_reduction): Likewise.
>> (vectorizable_reduction):  Likewise.
>
> New version. I've removed the additional loop_vinfo arg.
> Instead, I've split part of vect_get_vec_def_for_operand into a new
> function vect_get_vec_def_for_operand_1.
> My [2/3] patch will call vect_get_vec_def_for_operand_1 direct from
> vectorizeable_live_operation

Ok.

Richard.

> gcc/
> * tree-vectorizer.h (vect_get_vec_def_for_operand_1): New
> * tree-vect-stmts.c (vect_get_vec_def_for_operand_1): New
> (vect_get_vec_def_for_operand): Split out code.
>
>
>
>
>
>
> Alan.
>


Re: [PATCH] Do not instrument string cst w/ unknown decl size (PR sanitizer/71378)

2016-06-03 Thread Jakub Jelinek
On Fri, Jun 03, 2016 at 12:53:06PM +0200, Martin Liška wrote:
> As seen in the issue, we try to instrument a global variable that contains a 
> string
> constant. The following patch does not instrument if the size is variable (VLA).
> 
> Patch survives regression tests and bootstraps on x86_64-linux.
> It's questionable whether the same situation can also happen in 
> asan_finish_file:
> 
>   FOR_EACH_DEFINED_VARIABLE (vnode)
>   if (TREE_ASM_WRITTEN (vnode->decl)
>   && asan_protect_global (vnode->decl))
> asan_add_global (vnode->decl, TREE_TYPE (type), v);

I think the STRING_CST with non-constant size is already a bug, so this
patch looks to me just like a workaround for a bug that is somewhere else.

We should either reject such bogosity already in the FE (e.g. C does not
allow this), or if we really want to support it, it should be genericized
differently (the string must have a fixed size, and either we allow, in the
IL, assignment of the fixed size array to the VLA, or it should be
genericized e.g. as memcpy from the fixed size STRING_CST to the start of
the VLA).

Jakub


Re: [PATCH][2/3] Vectorize inductions that are live after the loop

2016-06-03 Thread Richard Biener
On Thu, Jun 2, 2016 at 6:11 PM, Alan Hayward  wrote:
>
>
> On 01/06/2016 10:51, "Richard Biener"  wrote:
>
>>On Wed, Jun 1, 2016 at 10:46 AM, Alan Hayward 
>>wrote:
>>>
>>>
>>> On 30/05/2016 14:22, "Richard Biener" 
>>>wrote:
>>>
On Fri, May 27, 2016 at 5:12 PM, Alan Hayward 
wrote:
>
> On 27/05/2016 12:41, "Richard Biener" 
>wrote:
>
>>On Fri, May 27, 2016 at 11:09 AM, Alan Hayward 
>>wrote:
>>>
>
>>
>>The rest of the changes look ok to me.
>
> Does that include PATCH 1/3  ?

I don't like how 1/3 ends up looking :/  So what was the alternative
again?
I looked into 1/3 and what it takes to remove the 'stmt' argument and
instead pass in a vect_def_type.  It's a bit twisted and just adding
another
argument (the loop_vinfo) doesn't help things here.

So - instead of 1/3 you might want to split out a function

tree
vect_get_vec_def_for_operand_1 (gimple *def_stmt, enum vect_def_type
dt, tree vectype)
{
  switch (dt)
{
...
}
}

and for constant/external force vectype != NULL.
>>>
>>> I’m still a little confused as to exactly what you want here.
>>>
>>> From your two comments I think you wanted me to completely remove the
>>> boolean type check and the vect_init_vector call. But if I remove that
>>> then other code paths will break.
>>>
>>> However, I’ve just realised that in vectorizable_live_operation I already
>>> have the def stmt and I can easily get hold of dt from
>>>STMT_VINFO_DEF_TYPE.
>>> Which means I can call vect_get_vec_def_for_operand_1 from
>>> vectorized_live_operation.
>>>
>>> I’ve put together a version where I have:
>>>
>>> tree
>>> vect_get_vec_def_for_operand_1 (gimple *def_stmt, enum vect_def_type dt)
>>> {
>>>
>>>  switch (dt)
>>>  {
>>>    case vect_internal_def:
>>>    case vect_external_def:
>>>      gcc_unreachable ();
>>>
>>>    .. code for all other cases ..
>>>  }
>>> }
>>>
>>> /* Used by existing code  */
>>> tree
>>> vect_get_vec_def_for_operand (tree op, gimple *stmt, tree vectype)
>>> {
>>>   vect_is_simple_use (op, loop_vinfo, &def_stmt, &dt); .. and the dump
>>>   code
>>>
>>>
>>>   If dt == internal_def || vect_external_def:
>>>   .. Check for BOOLEAN_TYPE ..
>>>   return vect_init_vector (stmt, op, vector_type, NULL);
>>>
>>>   else
>>> vect_get_vec_def_for_operand_1 (def_stmt, dt)
>>> }
>>>
>>>
>>> Does that look better?
>>
>>Yes!
>>
>>Thanks,
>>Richard.
>>
>
>
> This version of the patch addresses the simple+invariant issues
> (and patch [3/3] optimizes it).
>
> Also includes a small change to handle when the vect pattern has introduced
> new pattern match statements (in vectorizable_live_operation if
> STMT_VINFO_RELATED_STMT is not null then use it instead of stmt).

This is ok.

Thanks,
Richard.

> gcc/
> * tree-vect-loop.c (vect_analyze_loop_operations): Allow live stmts.
> (vectorizable_reduction): Check for new relevant state.
> (vectorizable_live_operation): vectorize live stmts using
> BIT_FIELD_REF.  Remove special case for gimple assigns stmts.
> * tree-vect-stmts.c (is_simple_and_all_uses_invariant): New function.
> (vect_stmt_relevant_p): Check for stmts which are only used live.
> (process_use): Use of a stmt does not inherit its live value.
> (vect_mark_stmts_to_be_vectorized): Simplify relevance inheritance.
> (vect_analyze_stmt): Check for new relevant state.
> * tree-vectorizer.h (vect_relevant): New entry for a stmt which is used
> outside the loop, but not inside it.
>
> testsuite/
> * gcc.dg/tree-ssa/pr64183.c: Ensure test does not vectorize.
> * testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c: Remove xfail.
> * gcc.dg/vect/vect-live-1.c: New test.
> * gcc.dg/vect/vect-live-2.c: New test.
> * gcc.dg/vect/vect-live-3.c: New test.
> * gcc.dg/vect/vect-live-4.c: New test.
> * gcc.dg/vect/vect-live-5.c: New test.
> * gcc.dg/vect/vect-live-slp-1.c: New test.
> * gcc.dg/vect/vect-live-slp-2.c: New test.
> * gcc.dg/vect/vect-live-slp-3.c: New test.
>
>
> Alan.
>


Re: [PATCH][3/3] No need to vectorize simple only-live stmts

2016-06-03 Thread Richard Biener
On Thu, Jun 2, 2016 at 6:14 PM, Alan Hayward  wrote:
>
>>Statements which are live but not relevant need marking to ensure they are
>>vectorized.
>>
>>Live statements which are simple and all uses of them are invariant do not
>>need
>>to be vectorized.
>>
>>This patch adds a check to make sure those stmts which pass both the above
>>checks are not vectorized and then discarded.
>>
>>Tested on x86 and aarch64.
>>
>>
>>gcc/
>>* tree-vect-stmts.c (vect_stmt_relevant_p): Do not vectorize non live
>>relevant stmts which are simple and invariant.
>>
>>testsuite/
>>* gcc.dg/vect/vect-live-slp-5.c: Remove dg check.
>
>
> This version adds an additional relevance check in
> vectorizable_live_operation.
> It requires the "Remove duplicated GOMP SIMD LANE code" patch to work.
>
> gcc/
> * tree-vect-stmts.c (vect_stmt_relevant_p): Do not vectorize non live
> relevant stmts which are simple and invariant.
> * tree-vect-loop.c (vectorizable_live_operation): Check relevance
> instead of simple and invariant.

Ok.

Richard.

> testsuite/
> * gcc.dg/vect/vect-live-slp-5.c: Remove dg check.
>
>
>
>
> Alan.
>


Re: Remove duplicated GOMP SIMD LANE code

2016-06-03 Thread Richard Biener
On Thu, Jun 2, 2016 at 6:03 PM, Alan Hayward  wrote:
> The IFN_GOMP_SIMD_LANE code in vectorizable_call is essentially a
> duplicate of
> the code in vectorizable_live_operation. They both replace all uses
> outside the
> loop with the constant VF - 1.
>
> Note that my patch to vectorize inductions that are live after the loop
> will
> also remove the IFN_GOMP_SIMD_LANE code in vectorizable_live_operation.
>
> This patch is required in order for the follow on optimisation (No need to
> Vectorise simple only-live stmts) to work.
>
> Tested with libgomp on x86 and aarch64

Ok.

Thanks,
Richard.

> gcc/
> * tree-vect-stmts.c (vectorizable_call): Remove GOMP_SIMD_LANE code.
>
> Alan.
>


[PATCH] Print column numbers in inclusion trace consistently.

2016-06-03 Thread Marcin Baczyński
Hi,
the patch below fixes PR/42014. Although the fix itself seems easy enough,
I have a problem with the test. Is there a way to match the output before
the "warning:" line? dg-{begin,end}-multiline-output doesn't do the job, or
at least I don't know how to convince it.

Bootstrapped on x86_64 linux.

Thanks,
Marcin


gcc/ChangeLog:

   PR/42014

   * diagnostic.c (diagnostic_report_current_module): Print column numbers
for all mentioned files if context->show_column.

gcc/testsuite/ChangeLog:

   PR/42014

   * gcc.dg/inclusion-trace-column.i: New test.
---
 gcc/diagnostic.c  | 12 +---
 gcc/testsuite/gcc.dg/inclusion-trace-column.i | 16 
 2 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/inclusion-trace-column.i

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 8106172..05037ba 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -537,9 +537,15 @@ diagnostic_report_current_module (diagnostic_context *context, location_t where)
  while (! MAIN_FILE_P (map))
{
  map = INCLUDED_FROM (line_table, map);
- pp_verbatim (context->printer,
-  ",\n from %r%s:%d%R", "locus",
-  LINEMAP_FILE (map), LAST_SOURCE_LINE (map));
+  if (context->show_column)
+pp_verbatim (context->printer,
+ ",\n from %r%s:%d:%d%R", "locus",
+ LINEMAP_FILE (map),
+ LAST_SOURCE_LINE (map), LAST_SOURCE_COLUMN (map));
+  else
+pp_verbatim (context->printer,
+ ",\n from %r%s:%d%R", "locus",
+ LINEMAP_FILE (map), LAST_SOURCE_LINE (map));
}
  pp_verbatim (context->printer, ":");
  pp_newline (context->printer);
diff --git a/gcc/testsuite/gcc.dg/inclusion-trace-column.i b/gcc/testsuite/gcc.dg/inclusion-trace-column.i
new file mode 100644
index 000..1fb8923
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/inclusion-trace-column.i
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-fshow-column -Wreturn-type" } */
+/* { dg-begin-multiline-output "" }
+In file included from b.h:1:0,
+ from a.h:1:0,
+ from t.c:1:0:
+   { dg-end-multiline-output "" } */
+/* PR 42014 */
+# 1 "t.c"
+# 1 "a.h" 1
+# 1 "b.h" 1
+# 1 "c.h" 1
+double f () {}  /* { dg-warning "reaches end" "no return" { target *-*-* } 1 } */
+# 1 "b.h" 2
+# 1 "a.h" 2
+# 1 "t.c" 2
-- 
2.8.3



Re: [PATCH] Do not instrument string cst w/ unknown decl size (PR, sanitizer/71378)

2016-06-03 Thread Martin Liška
On 06/03/2016 01:00 PM, Jakub Jelinek wrote:
> I think the STRING_CST with non-constant size is already a bug, so this
> patch looks to me just like a workaround for a bug that is somewhere else.

Ah, thanks for clarification, it looked weird to me. I've just reclassified
the PR, btw. clang++ rejects the snippet:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71378#c8

Martin



[PATCH] rs6000: Remove the ancient mfcr peepholes

2016-06-03 Thread Segher Boessenkool
These peepholes replace two mfcr;mask sequences by one mfcr;mask;mask
sequence.  On modern cpus, the original mfcr's were actually mfocrf,
but the new insn is an actual heavy-weight mfcr.  This is very bad
for performance.
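
To illustrate (a sketch, not output from this patch): mfocrf copies a
single CR field, while mfcr serializes on all eight fields:

        mfocrf 3,0x80    # copy only CR field 0 into r3 (cheap on modern cpus)
        mfcr   3         # copy all eight CR fields into r3 (heavy-weight)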

The comment says there is a three cycle delay between two consecutive
mfcr insns.  This may have been true on rios, and it's true on 604,
but on 603, 750, 7400 it is just a single cycle (on 7450 it is two).

This is also a define_peephole, and we should get rid of those.

So this patch just removes the peepholes; the benefit is marginal at
best, and it can hurt badly in other cases.

Pre-approved by David, committing to trunk,


Segher


2016-06-03  Segher Boessenkool  

* config/rs6000/rs6000.md (define_peepholes for two mfcr's): Delete.

---
 gcc/config/rs6000/rs6000.md | 31 ---
 1 file changed, 31 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b1b7692..accdf75 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -11255,37 +11255,6 @@ (define_split
(const_int 0)))]
   "")
 
-;; There is a 3 cycle delay between consecutive mfcr instructions
-;; so it is useful to combine 2 scc instructions to use only one mfcr.
-
-(define_peephole
-  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
-   (match_operator:SI 1 "scc_comparison_operator"
-  [(match_operand 2 "cc_reg_operand" "y")
-   (const_int 0)]))
-   (set (match_operand:SI 3 "gpc_reg_operand" "=r")
-   (match_operator:SI 4 "scc_comparison_operator"
-  [(match_operand 5 "cc_reg_operand" "y")
-   (const_int 0)]))]
-  "REGNO (operands[2]) != REGNO (operands[5])"
-  "mfcr %3\;rlwinm %0,%3,%J1,1\;rlwinm %3,%3,%J4,1"
-  [(set_attr "type" "mfcr")
-   (set_attr "length" "12")])
-
-(define_peephole
-  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
-   (match_operator:DI 1 "scc_comparison_operator"
-  [(match_operand 2 "cc_reg_operand" "y")
-   (const_int 0)]))
-   (set (match_operand:DI 3 "gpc_reg_operand" "=r")
-   (match_operator:DI 4 "scc_comparison_operator"
-  [(match_operand 5 "cc_reg_operand" "y")
-   (const_int 0)]))]
-  "TARGET_POWERPC64 && REGNO (operands[2]) != REGNO (operands[5])"
-  "mfcr %3\;rlwinm %0,%3,%J1,1\;rlwinm %3,%3,%J4,1"
-  [(set_attr "type" "mfcr")
-   (set_attr "length" "12")])
-
 
 (define_mode_attr scc_eq_op2 [(SI "rKLI")
  (DI "rKJI")])
-- 
1.9.3



[PATCH][AArch64] Cleanup -mpc-relative-loads

2016-06-03 Thread Wilco Dijkstra
This patch cleans up the -mpc-relative-loads option processing.  Rename to 
avoid the
"no*" name and confusing !no* expressions.  Fix the option processing code to 
implement
-mno-pc-relative-loads rather than ignore it.

OK for commit?

ChangeLog:
2016-06-03  Wilco Dijkstra  

* config/aarch64/aarch64.opt
(mpc-relative-literal-loads): Rename internal option name.
* config/aarch64/aarch64.c
(aarch64_nopcrelative_literal_loads): Rename to 
aarch64_pcrelative_literal_loads.
(aarch64_expand_mov_immediate): Likewise.
(aarch64_secondary_reload): Likewise.
(aarch64_can_use_per_function_literal_pools_p): Likewise.
(aarch64_override_options_after_change_1): Rename and simplify logic.
(aarch64_classify_symbol): Merge large model checks into switch,
remove pc-relative load check.
---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index be27243f690a2dd83935c67d2956e51b24b7..2394f91de655c35f5ce8798c845391ae0b289e46 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -152,7 +152,7 @@ enum aarch64_processor aarch64_tune = cortexa53;
 unsigned long aarch64_tune_flags = 0;
 
 /* Global flag for PC relative loads.  */
-bool aarch64_nopcrelative_literal_loads;
+bool aarch64_pcrelative_literal_loads;
 
 /* Support for command line parsing of boolean flags in the tuning
structures.  */
@@ -1703,7 +1703,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 we need to expand the literal pool access carefully.
 This is something that needs to be done in a number
 of places, so could well live as a separate function.  */
- if (aarch64_nopcrelative_literal_loads)
+ if (!aarch64_pcrelative_literal_loads)
{
  gcc_assert (can_create_pseudo_p ());
  base = gen_reg_rtx (ptr_mode);
@@ -4043,7 +4043,7 @@ aarch64_classify_address (struct aarch64_address_info *info,
  return ((GET_CODE (sym) == LABEL_REF
   || (GET_CODE (sym) == SYMBOL_REF
   && CONSTANT_POOL_ADDRESS_P (sym)
-  && !aarch64_nopcrelative_literal_loads)));
+  && aarch64_pcrelative_literal_loads)));
}
   return false;
 
@@ -5092,7 +5092,7 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x,
   if (MEM_P (x) && GET_CODE (x) == SYMBOL_REF && CONSTANT_POOL_ADDRESS_P (x)
   && (SCALAR_FLOAT_MODE_P (GET_MODE (x))
  || targetm.vector_mode_supported_p (GET_MODE (x)))
-  && aarch64_nopcrelative_literal_loads)
+  && !aarch64_pcrelative_literal_loads)
 {
   sri->icode = aarch64_constant_pool_reload_icode (mode);
   return NO_REGS;
@@ -5426,7 +5426,7 @@ aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
 static inline bool
 aarch64_can_use_per_function_literal_pools_p (void)
 {
-  return (!aarch64_nopcrelative_literal_loads
+  return (aarch64_pcrelative_literal_loads
  || aarch64_cmodel == AARCH64_CMODEL_LARGE);
 }
 
@@ -7952,32 +7952,31 @@ aarch64_override_options_after_change_1 (struct gcc_options *opts)
opts->x_align_functions = aarch64_tune_params.function_align;
 }
 
-  /* If nopcrelative_literal_loads is set on the command line, this
+  /* We default to no pc-relative literal loads.  */
+
+  aarch64_pcrelative_literal_loads = false;
+
+  /* If -mpc-relative-literal-loads is set on the command line, this
  implies that the user asked for PC relative literal loads.  */
-  if (opts->x_nopcrelative_literal_loads == 1)
-aarch64_nopcrelative_literal_loads = false;
+  if (opts->x_pcrelative_literal_loads == 1)
+aarch64_pcrelative_literal_loads = true;
 
-  /* If it is not set on the command line, we default to no pc
- relative literal loads, unless the workaround for Cortex-A53
- erratum 843419 is in effect.  */
   /* This is PR70113. When building the Linux kernel with
  CONFIG_ARM64_ERRATUM_843419, support for relocations
  R_AARCH64_ADR_PREL_PG_HI21 and R_AARCH64_ADR_PREL_PG_HI21_NC is
  removed from the kernel to avoid loading objects with possibly
- offending sequences. With nopcrelative_literal_loads, we would
+ offending sequences.  Without -mpc-relative-literal-loads we would
  generate such relocations, preventing the kernel build from
  succeeding.  */
-  if (opts->x_nopcrelative_literal_loads == 2
-  && !TARGET_FIX_ERR_A53_843419)
-aarch64_nopcrelative_literal_loads = true;
+  if (opts->x_pcrelative_literal_loads == 2
+  && TARGET_FIX_ERR_A53_843419)
+aarch64_pcrelative_literal_loads = true;
 
-  /* In the tiny memory model it makes no sense
- to disallow non PC relative literal pool loads
- as many other things will break anyway.  */
-  if (opts->x_nopcrelative_literal_loads
-  && (aarch64_cmodel == AARCH64_CMODEL_TINY
- || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC))
-aarch64_nopcrelative_literal

[PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x

2016-06-03 Thread Michael Meissner
These patches were installed on the trunk on May 2nd, with a fix from Alan
Modra on May 11th.  Unless I hear objections in the next few days, I will
commit these changes to the GCC 6.x branch.  These changes will allow people to
use complex __float128 types (via an attribute) on the PowerPC.

Note, we will need patches to libgcc to fully enable complex __float128 support
on the PowerPC.  These patches enable the compiler support, so that the libgcc
changes can be coded.

In addition to bootstrapping and regtesting on the PowerPC (little endian
power8), I also bootstrapped and regtested the changes on x86_64 running RHEL
6.2.  There were no regressions in either case.

[gcc]
2016-06-02  Michael Meissner  

Back port from trunk
2016-05-11  Alan Modra  

* config/rs6000/rs6000.c (is_complex_IBM_long_double,
abi_v4_pass_in_fpr): New functions.
(rs6000_function_arg_boundary): Exclude complex IBM long double
from 64-bit alignment when ABI_V4.
(rs6000_function_arg, rs6000_function_arg_advance_1,
rs6000_gimplify_va_arg): Use abi_v4_pass_in_fpr.

Back port from trunk
2016-05-02  Michael Meissner  

* machmode.h (mode_complex): Add support to give the complex mode
for a given mode.
(GET_MODE_COMPLEX_MODE): Likewise.
* stor-layout.c (layout_type): For COMPLEX_TYPE, use the mode
stored by build_complex_type and gfc_build_complex_type instead of
trying to figure out the appropriate mode based on the size. Raise
an assertion error, if the type was not set.
* genmodes.c (struct mode_data): Add field for the complex type of
the given type.
(blank_mode): Likewise.
(make_complex_modes): Remember the complex mode created in the
base type.
(emit_mode_complex): Write out the mode_complex array to map a
type mode to the complex version.
(emit_insn_modes_c): Likewise.
* tree.c (build_complex_type): Set the complex type to use before
calling layout_type.
* config/rs6000/rs6000.c (rs6000_hard_regno_nregs_internal): Add
support for __float128 complex datatypes.
(rs6000_hard_regno_mode_ok): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_complex_function_value): Likewise.
* config/rs6000/rs6000.h (FLOAT128_IEEE_P): Likewise.
__float128 and __ibm128 complex.
(FLOAT128_IBM_P): Likewise.
(ALTIVEC_ARG_MAX_RETURN): Likewise.
* doc/extend.texi (Additional Floating Types): Document that
-mfloat128 must be used to enable __float128.  Document complex
__float128 and __ibm128 support.

[gcc/fortran]
2016-06-02  Michael Meissner  

Back port from trunk
2016-05-02  Michael Meissner  

* trans-types.c (gfc_build_complex_type):

[gcc/testsuite]
2016-06-02  Michael Meissner  

Back port from trunk
2016-05-02  Michael Meissner  

* gcc.target/powerpc/float128-complex-1.c: New tests for complex
__float128.
* gcc.target/powerpc/float128-complex-2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/machmode.h
===
--- gcc/machmode.h  (revision 237045)
+++ gcc/machmode.h  (working copy)
@@ -269,6 +269,10 @@ extern const unsigned char mode_wider[NU
 extern const unsigned char mode_2xwider[NUM_MACHINE_MODES];
 #define GET_MODE_2XWIDER_MODE(MODE) ((machine_mode) mode_2xwider[MODE])
 
+/* Get the complex mode from the component mode.  */
+extern const unsigned char mode_complex[NUM_MACHINE_MODES];
+#define GET_MODE_COMPLEX_MODE(MODE) ((machine_mode) mode_complex[MODE])
+
 /* Return the mode for data of a given size SIZE and mode class CLASS.
If LIMIT is nonzero, then don't use modes bigger than MAX_FIXED_MODE_SIZE.
The value is BLKmode if no other mode is found.  */
Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c   (revision 237045)
+++ gcc/stor-layout.c   (working copy)
@@ -2146,11 +2146,13 @@ layout_type (tree type)
 
 case COMPLEX_TYPE:
   TYPE_UNSIGNED (type) = TYPE_UNSIGNED (TREE_TYPE (type));
-  SET_TYPE_MODE (type,
-mode_for_size (2 * TYPE_PRECISION (TREE_TYPE (type)),
-   (TREE_CODE (TREE_TYPE (type)) == REAL_TYPE
-? MODE_COMPLEX_FLOAT : MODE_COMPLEX_INT),
-0));
+
+  /* build_complex_type and fortran's gfc_build_complex_type have set the
+expected mode to allow having multiple complex types for multiple
+floating point types that have the same size such as the PowerPC with
+__ibm128 and __float128.  */
+  gcc_assert (TYPE_MODE (type) != VOID

Fix continue prediction for C++

2016-06-03 Thread Jan Hubicka
Hi,
as noticed by Richard, genericize_continue_stmt doesn't really add the
prediction statement because it has no side effects, so we don't do
any continue prediction for C++.

Bootstrapped/regtested x86_64-linux, committed.

Honza

* g++.dg/tree-ssa/pred-1.C: New testcase.
* gcc.dg/tree-ssa/pred-1.c: New testcase.
* cp-gimplify.c (genericize_continue_stmt): Force addition of
predict stmt.

Index: testsuite/g++.dg/tree-ssa/pred-1.C
===
--- testsuite/g++.dg/tree-ssa/pred-1.C  (revision 0)
+++ testsuite/g++.dg/tree-ssa/pred-1.C  (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
+int a[100];
+void foo(int);
+int main()
+{
+  int i;
+  for (i=0;i<100;i++)
+{
+  if (a[i])
+   continue;
+   foo(i);
+}
+}
+// { dg-final { scan-tree-dump "continue heuristics" "profile_estimate" } }
Index: testsuite/gcc.dg/tree-ssa/pred-1.c
===
--- testsuite/gcc.dg/tree-ssa/pred-1.c  (revision 0)
+++ testsuite/gcc.dg/tree-ssa/pred-1.c  (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
+int a[100];
+void foo(int);
+int
+main()
+{
+  int i;
+  for (i=0;i<100;i++)
+{
+  if (a[i])
+   continue;
+   foo(i);
+}
+}
+// { dg-final { scan-tree-dump "continue heuristics" "profile_estimate" } }
Index: cp/cp-gimplify.c
===
--- cp/cp-gimplify.c(revision 237043)
+++ cp/cp-gimplify.c(working copy)
@@ -362,7 +362,7 @@ genericize_continue_stmt (tree *stmt_p)
   tree label = get_bc_label (bc_continue);
   location_t location = EXPR_LOCATION (*stmt_p);
   tree jump = build1_loc (location, GOTO_EXPR, void_type_node, label);
-  append_to_statement_list (pred, &stmt_list);
+  append_to_statement_list_force (pred, &stmt_list);
   append_to_statement_list (jump, &stmt_list);
   *stmt_p = stmt_list;
 }


[Patch, avr] Fix PR 71151

2016-06-03 Thread Senthil Kumar Selvaraj
Hi,

  This patch fixes PR 71151 by eliminating the
  TARGET_ASM_FUNCTION_RODATA_SECTION hook and setting
  JUMP_TABLES_IN_TEXT_SECTION to 1.

  As described in the bugzilla entry, this hook assumed it would get
  called only for jumptable rodata for functions. This was true until
  6.1, when a commit in varasm.c started calling the hook for mergeable
  string/constant data as well.

  This resulted in string constants ending up in a section intended for
  jumptables (flash), and broke code using those constants, which
  expects them to be present in rodata (SRAM).
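
  A minimal illustration of the failure mode (hypothetical user code,
  not taken from the PR):

      /* With -Os -ffunction-sections -fdata-sections, the merged string
         was emitted into a .progmem.gcc_sw_table.* (flash) section, but
         a plain read through the pointer fetches from SRAM, where
         .rodata is expected to live on AVR.  */
      const char *greet (void) { return "hello"; }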

  Given that the original reason for placing jumptables in a section was
  fixed by Johann in PR 63323, this patch restores the original
  behavior. Reg testing on both gcc-6-branch and trunk showed no regressions.

  As pointed out by Johann, this may end up increasing code
  size if there are lots of branches that cross the jump tables. I
  intend to propose a separate patch that gives additional information
  to the target hook (SECCAT_RODATA_{STRING,JUMPTABLE}) so it can know
  what type of function rodata is coming in. Johann also suggested
  handling jump table generation ourselves - I'll experiment with that
  some more.

  If ok, could someone commit please? Could you also backport to
  gcc-6-branch?

Regards
Senthil

gcc/ChangeLog

2016-06-03  Senthil Kumar Selvaraj  

* config/avr/avr.c (avr_asm_function_rodata_section): Remove.
* config/avr/avr.c (TARGET_ASM_FUNCTION_RODATA_SECTION): Remove.

gcc/testsuite/ChangeLog

2016-06-03  Senthil Kumar Selvaraj  

* gcc/testsuite/gcc.target/avr/pr71151-1.c: New.
* gcc/testsuite/gcc.target/avr/pr71151-2.c: New.

diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
index ba5cd91..3cb8cb7 100644
--- gcc/config/avr/avr.c
+++ gcc/config/avr/avr.c
@@ -9488,65 +9488,6 @@ avr_asm_init_sections (void)
 }
 
 
-/* Implement `TARGET_ASM_FUNCTION_RODATA_SECTION'.  */
-
-static section*
-avr_asm_function_rodata_section (tree decl)
-{
-  /* If a function is unused and optimized out by -ffunction-sections
- and --gc-sections, ensure that the same will happen for its jump
- tables by putting them into individual sections.  */
-
-  unsigned int flags;
-  section * frodata;
-
-  /* Get the frodata section from the default function in varasm.c
- but treat function-associated data-like jump tables as code
- rather than as user defined data.  AVR has no constant pools.  */
-  {
-int fdata = flag_data_sections;
-
-flag_data_sections = flag_function_sections;
-frodata = default_function_rodata_section (decl);
-flag_data_sections = fdata;
-flags = frodata->common.flags;
-  }
-
-  if (frodata != readonly_data_section
-  && flags & SECTION_NAMED)
-{
-  /* Adjust section flags and replace section name prefix.  */
-
-  unsigned int i;
-
-  static const char* const prefix[] =
-{
-  ".rodata",  ".progmem.gcc_sw_table",
-  ".gnu.linkonce.r.", ".gnu.linkonce.t."
-};
-
-  for (i = 0; i < sizeof (prefix) / sizeof (*prefix); i += 2)
-{
-  const char * old_prefix = prefix[i];
-  const char * new_prefix = prefix[i+1];
-  const char * name = frodata->named.name;
-
-  if (STR_PREFIX_P (name, old_prefix))
-{
-  const char *rname = ACONCAT ((new_prefix,
-name + strlen (old_prefix), NULL));
-  flags &= ~SECTION_CODE;
-  flags |= AVR_HAVE_JMP_CALL ? 0 : SECTION_CODE;
-
-  return get_section (rname, flags, frodata->named.decl);
-}
-}
-}
-
-  return progmem_swtable_section;
-}
-
-
 /* Implement `TARGET_ASM_NAMED_SECTION'.  */
 /* Track need of __do_clear_bss, __do_copy_data for named sections.  */
 
@@ -13747,9 +13688,6 @@ avr_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *arg,
 #undef  TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN avr_fold_builtin
 
-#undef  TARGET_ASM_FUNCTION_RODATA_SECTION
-#define TARGET_ASM_FUNCTION_RODATA_SECTION avr_asm_function_rodata_section
-
 #undef  TARGET_SCALAR_MODE_SUPPORTED_P
 #define TARGET_SCALAR_MODE_SUPPORTED_P avr_scalar_mode_supported_p
 
diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h
index 01da708..ab5e465 100644
--- gcc/config/avr/avr.h
+++ gcc/config/avr/avr.h
@@ -391,7 +391,7 @@ typedef struct avr_args
 
 #define SUPPORTS_INIT_PRIORITY 0
 
-#define JUMP_TABLES_IN_TEXT_SECTION 0
+#define JUMP_TABLES_IN_TEXT_SECTION 1
 
 #define ASM_COMMENT_START " ; "
 
diff --git gcc/testsuite/gcc.target/avr/pr71151-1.c gcc/testsuite/gcc.target/avr/pr71151-1.c
new file mode 100644
index 000..615dce8
--- /dev/null
+++ gcc/testsuite/gcc.target/avr/pr71151-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -ffunction-sections -fdata-sections" } */
+
+/* { dg-final { scan-assembler-not ".section   .progmem.gcc_sw_table.foo.str1.1" } } */
+/* { dg-final { scan-assembler ".sect

[PING] [PR other/70945] Handle function_glibc_finite_math in offloading

2016-06-03 Thread Thomas Schwinge
Hi!

Ping.

On Sat, 21 May 2016 17:59:17 +0200, I wrote:
> As discussed in  "Offloading: compatibility
> of target and offloading toolchains", there are situations where we have
> to do more work to ensure compatibility between target and offloading
> toolchains.
> 
> The first thing I'm working on is math functions usage in offloaded
> regions.
> 
> Here is a first patch, addressing glibc's finite math optimizations: if
> -ffinite-math-only (as implied by -ffast-math, or -Ofast) is in effect,
> glibc's <math.h> is known to include <bits/math-finite.h> for "special
> entry points to use when the compiler got told to only expect finite
> results".  This divertes the math functions' assembler names from
> "[function]" to "__[function]_finite".  This, obviously, is incompatible
> with offloading targets that don't use glibc, and thus don't provide
> these "__[function]_finite" entry points.
> 
> In response to Alexander's
> , I argue this
> does belong into the generic offloading data handling, instead of a
> nvptx-specific pass, for the reason that it is not a nvptx-specific
> transformation but addresses a general target vs. offloading target
> configuration mismatch.
> 
> If I understand him correctly, Joseph in
>  confirms my idea
> about how this may use some extension (attributes), allowing us to get
> rid of the "__[function]_finite" name matching heuristic.  That's indeed
> something to work on, but as it will take a rather long time until glibc
> changes make their way into distributions that end users are using, I'd
> like to start with the heuristic as implemented, and adjust this later
> on.
> 
> OK for trunk?  I'm working on a test case, too.
> 
> commit 0f65dbe65e883d2294c055631eccb07869bc5375
> Author: Thomas Schwinge 
> Date:   Fri May 13 17:02:30 2016 +0200
> 
> [PR other/70945] Handle function_glibc_finite_math in offloading
> 
>   gcc/
>   PR other/70945
>   * targhooks.c (default_libc_has_function): Update comment.
>   * target.def (libc_has_function): Likewise.
>   * doc/tm.texi: Regenerate.
>   * coretypes.h (enum function_class): Add
>   function_glibc_finite_math.
>   * config/darwin.c (darwin_libc_has_function): Handle it.
>   * lto-streamer.h (enum lto_section_type): Rename
>   LTO_section_offload_table to LTO_section_offload_data.  Adjust all
>   users.
>   * lto-cgraph.c (void output_offload_data): New function, split out
>   of output_offload_tables.  Adjust all users.  Stream the target's
>   function_glibc_finite_math property.
>   (input_offload_data): New function, split out of
>   input_offload_tables.  Adjust all users.  Handle mismatch between
>   the target's and the offloading target's
>   function_glibc_finite_math property.
> ---
>  gcc/config/darwin.c|   2 +
>  gcc/coretypes.h|  11 ++-
>  gcc/doc/tm.texi|   2 +-
>  gcc/lto-cgraph.c   | 181 
> -
>  gcc/lto-streamer-out.c |   2 +-
>  gcc/lto-streamer.h |   6 +-
>  gcc/lto/lto.c  |   2 +-
>  gcc/target.def |   2 +-
>  gcc/targhooks.c|   2 +-
>  9 files changed, 152 insertions(+), 58 deletions(-)
> 
> diff --git gcc/config/darwin.c gcc/config/darwin.c
> index 0055d80..92fe3e5 100644
> --- gcc/config/darwin.c
> +++ gcc/config/darwin.c
> @@ -3401,6 +3401,8 @@ darwin_libc_has_function (enum function_class fn_class)
>|| fn_class == function_c99_misc)
>  return (TARGET_64BIT
>   || strverscmp (darwin_macosx_version_min, "10.3") >= 0);
> +  if (fn_class == function_glibc_finite_math)
> +return false;
>  
>return true;
>  }
> diff --git gcc/coretypes.h gcc/coretypes.h
> index b3a91a6..aa48b5a 100644
> --- gcc/coretypes.h
> +++ gcc/coretypes.h
> @@ -305,14 +305,21 @@ union _dont_use_tree_here_;
>  
>  #endif
>  
> -/* Classes of functions that compiler needs to check
> +/* Properties, such as classes of functions that the compiler can check
> whether they are present at the runtime or not.  */
>  enum function_class {
>function_c94,
>function_c99_misc,
>function_c99_math_complex,
>function_sincos,
> -  function_c11_misc
> +  function_c11_misc,
> +  /* If -ffinite-math-only (as implied by -ffast-math, or -Ofast) is in 
> effect,
> + glibc's <math.h> is known to include <bits/math-finite.h> for "special
> + entry points to use when the compiler got told to only expect finite
> + results".  This divertes the math functions' assembler names from
> + "[function]" to "__[function]_finite".  This property indicates whether
> + such diversion may occur, not whether it actually has.  */
> +  function_glibc_finite_math
>  };
>  
>  /* Enumerate visibility settings.  This is deliberately ordered from most
> diff --git gcc/doc/tm.texi gcc/doc/tm.texi
> index 8c7f2a1..4ce3a43 100644
> --- gcc/doc/tm.texi
> +++ gcc/doc/

Re: [PING] [PR other/70945] Handle function_glibc_finite_math in offloading

2016-06-03 Thread Jakub Jelinek
On Fri, Jun 03, 2016 at 04:44:15PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> Ping.

I think it would be better to just add this support to newlib.
Or are they opposed to that for whatever reason?

Jakub


Re: [PATCH v2] gcov: Runtime configurable destination output

2016-06-03 Thread H.J. Lu
On Mon, May 23, 2016 at 11:16 AM, Aaron Conole  wrote:
> Nathan Sidwell  writes:
>
>> On 05/19/16 14:40, Aaron Conole wrote:
>>> Nathan Sidwell  writes:
>>
> +FILE *__gcov_error_file = NULL;

 Unless I'm missing something, isn't this only accessed from this file?
 (So could be static with a non-underbarred name)
>>>
>>> Ack.
>>
>> I have a vague memory that perhaps the __gcov_error_file is seen from
>> other dynamic objects, and one of them gets to open/close it?  I think
>> the closing function needs to reset it to NULL though?  (In case it's
>> reactivated before the process exits)
>
> This is being introduced here, so the actual variable won't be seen,
> however you're correct - the APIs could still be called.
>
> I think there does exist a possibility that it can get re-activated
> before the process exits. So, I've changed it to have a proper block
> scope and to reset gcov_error_file to NULL.
>
 And this protection here, makes me wonder what happens if one is
 IN_GCOV_TOOL. Does it pay attention to GCOV_ERROR_FILE?  That would
 seem incorrect, and thus the above should be changed so that stderr is
 unconditionally used when IN_GCOV_TOOL?
>>>
>>> You are correct.  I will fix it.
>>
>> thanks.
>>
> +static void
> +gcov_error_exit(void)
> +{
> +  if (__gcov_error_file && __gcov_error_file != stderr)
> +{

 Braces are not needed here.
>>
>> Unless of course my speculation about setting it to NULL is right.
>
> It is - I've fixed it, and will post the v3 patch shortly.
>
> Thank you for your help, Nathan!
>

It breaks profiledbootstrap:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71400

-- 
H.J.


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-03 Thread Martin Sebor

On 06/02/2016 01:34 AM, Jakub Jelinek wrote:

On Thu, Jun 02, 2016 at 09:23:16AM +0200, Jakub Jelinek wrote:

Also, perhaps just a documentation thing, it would be good to clarify the
NULL last argument.  From the POV of generating efficient code, I think we
should say that the last argument (for the GNU builtins) must be
either a pointer to a valid object, or a NULL/nullptr constant expression
cast to a pointer type, but nothing else.  That is actually what your patch
implements.  But, I'd like to make sure that
   int *p = NULL;
   if (__builtin_add_overflow (a, b, p))
 ...
is actually not valid, otherwise we unnecessarily pessimize many of the GNU
style calls (those that don't pass &var), where instead of
   tem = ADD_OVERFLOW (a, b);
   *p = REALPART_EXPR (tem);
   ovf = IMAGPART_EXPR (tem);
we'd need to emit instead
   tem = ADD_OVERFLOW (a, b);
   ovf = IMAGPART_EXPR (tem);
   if (p != NULL)
 *p = REALPART_EXPR (tem);


Though, perhaps that is too ugly, that it has different semantics for
   __builtin_add_overflow (a, b, (int *) NULL)
and for
   int *p = NULL;
   __builtin_add_overflow (a, b, p)
Maybe the cleanest would be to just add 3 extra builtins, again,
type-generic,
   __builtin_{add,sub,mul}_overflow_p
where either the arguments would be instead of integertype1, integertype2,
integertype3 * rather integertype1, integertype2, integertype3
and we'd only care about the type, not value, of the last argument,
so use it like __builtin_add_overflow_p (a, b, (__typeof ((a) + (b))) 0)
or handle those 3 similarly to __builtin_va_arg, and use
__builtin_add_overflow_p (a, b, int);
I think I prefer the former though.


I'm not sure I understand your concern but at this point I would
prefer to keep things as they are.  I like that the functionality
that was requested in c/68120 could be provided under the existing
interface, and I'm not fond of the idea of adding yet another set
of built-ins just to get at a bit that's already available via
the existing ones (in C++ 11, even in constexpr contexts, provided
the built-ins were allowed there).  I've also spent far more time
on this patch than I had planned and I'm way behind on the tasks
I was asked to work on.

That said, in c/68120 the requester commented that he's considering
submitting a request for yet another enhancement in this area, so
I think letting him play with what we've got now for a bit will be
an opportunity to get his feedback and tweak the API further based
on it.
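
For reference, a sketch of the kind of use the patch enables (C++11,
illustrative only):

  /* A null pointer constant cast to a pointer to integer makes the
     built-in usable in a constant expression.  */
  constexpr bool ovf
    = __builtin_add_overflow (__INT_MAX__, 1, (int *) nullptr);
  static_assert (ovf, "INT_MAX + 1 overflows int");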

Martin


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-03 Thread Jakub Jelinek
On Fri, Jun 03, 2016 at 09:07:09AM -0600, Martin Sebor wrote:
> I'm not sure I understand your concern but at this point I would
> prefer to keep things as they are.  I like that the functionality

My concern is that if you declare that NULL is a valid third argument for e.g.
__builtin_add_overflow, then unless we add some complicated wording into the
docs, we have to penalize the code we emit for e.g.
bool
foo (int a, int b, int *p)
{
  return __builtin_add_overflow (a, b, p);
}
While we could previously emit
        addl    %edi, %esi
        movl    %esi, (%rdx)
        seto    %al
        retq
you suddenly have to emit (and your patch doesn't do that):
        addl    %edi, %esi
        seto    %al
        testq   %rdx, %rdx
        je      .L1
        movl    %esi, (%rdx)
.L1:
        retq
because you can't prove at compile time whether p is NULL (then you wouldn't
store the result) or not.

Trying to document that the third argument may be NULL, but only if it is
a constant NULL pointer expression or something like that (what exactly?)
might not be very easily understandable and clear to users.

Which is why I've suggested just not allowing (like before) the third
argument to be NULL, and just add new 3 builtins for the test for overflow,
but don't store it anywhere.  They would just be folded early to the same
internal function.  And when adding the 3 new builtins, we can also choose
a different calling convention that would allow the C++98/C constant
expressions, by not having the third argument a pointer (we don't need to
dereference anything), but e.g. just any integer where we ignore the value
(well, evaluate for side-effects) and only care about the type.
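
Under that convention the overflow-only test would look something like
this (a sketch of the proposal; handle_overflow is a placeholder):

  /* The third argument only supplies the result type; its value is
     ignored (evaluated for side-effects only).  */
  if (__builtin_add_overflow_p (a, b, (int) 0))
    handle_overflow ();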

Jakub


C PATCH to improve location for abstract declarators (PR c/71362)

2016-06-03 Thread Marek Polacek
This fixes imprecise location info with abstract declarators.  The problem
was that in build_id_declarator the default location was input_location
and we never attempted to use a more precise location.  This patch does that.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-06-03  Marek Polacek  

PR c/71362
* c-parser.c (c_parser_direct_declarator): Set location.

* gcc.dg/pr71362.c: New test.

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index bca8653..799a473 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -3430,6 +3430,7 @@ c_parser_direct_declarator (c_parser *parser, bool type_seen_p, c_dtr_syn kind,
   && c_parser_next_token_is (parser, CPP_OPEN_SQUARE))
 {
   struct c_declarator *inner = build_id_declarator (NULL_TREE);
+  inner->id_loc = c_parser_peek_token (parser)->location;
   return c_parser_direct_declarator_inner (parser, *seen_id, inner);
 }
 
diff --git gcc/testsuite/gcc.dg/pr71362.c gcc/testsuite/gcc.dg/pr71362.c
index e69de29..fd9cd6a 100644
--- gcc/testsuite/gcc.dg/pr71362.c
+++ gcc/testsuite/gcc.dg/pr71362.c
@@ -0,0 +1,10 @@
+/* PR c/71362 */
+/* { dg-do compile } */
+
+extern void foo (int[-1]); /* { dg-error "21:size of unnamed array is negative" } */
+
+int
+bar (void)
+{
+  123 + sizeof (int[-1]); /* { dg-error "20:size of unnamed array is negative" } */
+}

Marek


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-03 Thread Martin Sebor

+   {
+ tree type = TREE_TYPE (TREE_TYPE (t));
+ tree vflow = arith_overflowed_p (opcode, type, arg0, arg1)
+  ? integer_one_node : integer_zero_node;


This looks incorrect, the return type is TREE_TYPE (t), some complex integer
type, therefore vflow needs to be
   tree vflow = build_int_cst (TREE_TYPE (TREE_TYPE (t)),
  arith_overflowed_p (opcode, type, arg0, arg1)
  ? 1 : 0);
no?


I guess I didn't think it mattered since the complex type specifies
the types of the two members.  I don't mind changing it if it does
but I'd like to have a test case to go with it.  Maybe Jason can
help with that.

Martin


Re: [PATCH v2] gcov: Runtime configurable destination output

2016-06-03 Thread Aaron Conole
"H.J. Lu"  writes:

> On Mon, May 23, 2016 at 11:16 AM, Aaron Conole  wrote:
>> Nathan Sidwell  writes:
>>
>>> On 05/19/16 14:40, Aaron Conole wrote:
 Nathan Sidwell  writes:
>>>
>> +FILE *__gcov_error_file = NULL;
>
> Unless I'm missing something, isn't this only accessed from this file?
> (So could be static with a non-underbarred name)

 Ack.
>>>
>>> I have a vague memory that perhaps the __gcov_error_file is seen from
>>> other dynamic objects, and one of them gets to open/close it?  I think
>>> the closing function needs to reset it to NULL though?  (In case it's
>>> reactivated before the process exits)
>>
>> This is being introduced here, so the actual variable won't be seen,
>> however you're correct - the APIs could still be called.
>>
>> I think there does exist a possibility that it can get re-activated
>> before the process exits. So, I've changed it to have a proper block
>> scope and to reset gcov_error_file to NULL.
>>
> And this protection here, makes me wonder what happens if one is
> IN_GCOV_TOOL. Does it pay attention to GCOV_ERROR_FILE?  That would
> seem incorrect, and thus the above should be changed so that stderr is
> unconditionally used when IN_GCOV_TOOL?

 You are correct.  I will fix it.
>>>
>>> thanks.
>>>
>> +static void
>> +gcov_error_exit(void)
>> +{
>> +  if (__gcov_error_file && __gcov_error_file != stderr)
>> +{
>
> Braces are not needed here.
>>>
>>> Unless of course my speculation about setting it to NULL is right.
>>
>> It is - I've fixed it, and will post the v3 patch shortly.
>>
>> Thank you for your help, Nathan!
>>
>
> It breaks profiledbootstrap:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71400

d'oh!  Okay, baking a patch.
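
For reference, the shape under discussion looks something like this
(a sketch, not the final v3 patch):

  /* Static, non-underbarred, and reset to NULL so error output can be
     re-activated before the process exits.  */
  static FILE *gcov_error_file = NULL;

  static void
  gcov_error_exit (void)
  {
    if (gcov_error_file && gcov_error_file != stderr)
      fclose (gcov_error_file);
    gcov_error_file = NULL;
  }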


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-03 Thread Jakub Jelinek
On Fri, Jun 03, 2016 at 09:29:48AM -0600, Martin Sebor wrote:
> >>+   {
> >>+ tree type = TREE_TYPE (TREE_TYPE (t));
> >>+ tree vflow = arith_overflowed_p (opcode, type, arg0, arg1)
> >>+  ? integer_one_node : integer_zero_node;
> >
> >This looks incorrect, the return type is TREE_TYPE (t), some complex integer
> >type, therefore vflow needs to be
> >   tree vflow = build_int_cst (TREE_TYPE (TREE_TYPE (t)),
> >   arith_overflowed_p (opcode, type, arg0, arg1)
> >   ? 1 : 0);
> >no?
> 
> I guess I didn't think it mattered since the complex type specifies
> the types of the two members.  I don't mind changing it if it does

Sure, it does.  But if there are any differences between the lhs and rhs
type (or e.g. in COMPLEX_EXPR args etc. in GENERIC), then it is invalid IL,
or for GIMPLE if the two types aren't compatible according to the GIMPLE
rules (useless conversion).

Jakub


[PATCH] Fix check_GNU_style.sh for BSD / Mac OS X

2016-06-03 Thread Alan Hayward
check_GNU_style.sh fails to detect lines >80 chars on BSD / Mac OS X
systems.

This is because paste is being called with an empty delimiter list.
Instead, '\0' should be used.
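
A quick way to see the difference (illustrative; POSIX defines '\0' as
the empty-string delimiter, while an empty -d argument is rejected by
BSD paste):

    printf 'a\n' > f1; printf 'b\n' > f2
    paste -d ''   f1 f2   # rejected on BSD / Mac OS X
    paste -d '\0' f1 f2   # portable, prints "ab"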

Tested on Ubuntu 14.04 and OS X 10.9.5

contrib/
* check_GNU_style.sh: Fix paste args for BSD.


Alan


diff --git a/contrib/check_GNU_style.sh b/contrib/check_GNU_style.sh
index a7478f8f573132aef5ed1010f0cf5b13f08350d4..87a276c9cf47b5e07c4407f740ce05dce1928c30 100755
--- a/contrib/check_GNU_style.sh
+++ b/contrib/check_GNU_style.sh
@@ -191,7 +191,7 @@ col (){
# Combine prefix back with long lines.
# Filter out empty lines.
local found=false
-   paste -d '' "$tmp2" "$tmp3" \
+   paste -d '\0' "$tmp2" "$tmp3" \
| grep -v '^[0-9][0-9]*:+$' \
> "$tmp" && found=true





Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-03 Thread Martin Sebor

On 06/03/2016 09:23 AM, Jakub Jelinek wrote:

On Fri, Jun 03, 2016 at 09:07:09AM -0600, Martin Sebor wrote:

I'm not sure I understand your concern but at this point I would
prefer to keep things as they are.  I like that the functionality


My concern is that if you declare that NULL is valid third argument for e.g.
__builtin_add_overflow, then unless we add some complicated wording into the
docs,


Thanks the clarification.


Trying to document that the third argument may be NULL, but only if it is
constant NULL pointer expression or something like that (what exactly?)
might not be very easily understandable and clear to users.


I think the updated wording is quite clear:

  The first of the functions accepts either a pointer to an integer
  object or a null pointer constant.

/A null pointer constant/ is a basic term defined by both C and C++
so it should be familiar to all C++ programmers.  (To make it 100%
correct we should perhaps say:

  "...or a null pointer constant cast to a pointer to integer."

even though that should be obvious since the functions won't
accept a bare NULL.)
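
Concretely (illustrative only):

  int r;
  __builtin_add_overflow (a, b, &r);         /* pointer to an object */
  __builtin_add_overflow (a, b, (int *) 0);  /* null pointer constant */

  int *p = 0;
  __builtin_add_overflow (a, b, p);          /* not the special case:
                                                p is an ordinary pointer */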


Which is why I've suggested just not allowing (like before) the third
argument to be NULL, and just add new 3 builtins for the test for overflow,
but don't store it anywhere.  They would just be folded early to the same
internal function.  And when adding the 3 new builtins, we can also choose
a different calling convention that would allow the C++98/C constant
expressions, by not having the third argument a pointer (we don't need to
dereference anything), but e.g. just any integer where we ignore the value
(well, evaluate for side-effects) and only care about the type.


I understand what you're suggesting and it's something I could
have easily done when I started on it a few weeks ago but I'm
afraid I'm really out of cycles to continue to tweak the patch.
I think it's good enough as it is for now, gives the requester
what they asked for and lets me finish the C++ VLA bounds
checking, which is the main reason why I did this work to begin
with.  We can revisit this idea when we get the requester's
feedback (and after I've made some headway on my pending tasks).
Does that sound reasonable?

Martin


[PATCH, rs6000] Fix ICE for vec_ld and vec_st of array when compiled with g++

2016-06-03 Thread Bill Schmidt
Hi,

When changing vec_ld and vec_st to be expanded during parsing, I
missed a subtle case that happens only for C++.  If the base address
argument has an array type, this doesn’t get converted to a pointer.
For our purposes, we need the pointer, and it’s safe to make that
conversion, so this patch performs that adjustment.  I’ve added a
test to the C++ torture bucket to verify this now works.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk, and eventual backport to the 6 branch?

Thanks,
Bill


[gcc]

2016-06-03  Bill Schmidt  

* rs6000-c.c (c/c-tree.h): Add #include.
(altivec_resolve_overloaded_builtin): Handle ARRAY_TYPE arguments
in C++ when found in the base position of vec_ld or vec_st.

[gcc/testsuite]

2016-06-03  Bill Schmidt  

* g++.dg/torture/ppc-ldst-array.C: New.


diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 79ac115..9e479a9 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -30,6 +30,7 @@
 #include "stor-layout.h"
 #include "c-family/c-pragma.h"
 #include "langhooks.h"
+#include "c/c-tree.h"
 
 
 
@@ -5203,6 +5204,14 @@ assignment for unaligned loads and stores");
arg0 = build1 (NOP_EXPR, sizetype, arg0);
 
  tree arg1_type = TREE_TYPE (arg1);
+ if (TREE_CODE (arg1_type) == ARRAY_TYPE)
+   {
+ arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
+ tree const0 = build_int_cstu (sizetype, 0);
+ tree arg1_elt0 = build_array_ref (loc, arg1, const0);
+ arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);
+   }
+
  tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,
   arg1, arg0);
  tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,
@@ -5256,6 +5265,14 @@ assignment for unaligned loads and stores");
arg1 = build1 (NOP_EXPR, sizetype, arg1);
 
  tree arg2_type = TREE_TYPE (arg2);
+ if (TREE_CODE (arg2_type) == ARRAY_TYPE)
+   {
+ arg2_type = TYPE_POINTER_TO (TREE_TYPE (arg2_type));
+ tree const0 = build_int_cstu (sizetype, 0);
+ tree arg2_elt0 = build_array_ref (loc, arg2, const0);
+ arg2 = build1 (ADDR_EXPR, arg2_type, arg2_elt0);
+   }
+
  tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg2_type,
   arg2, arg1);
  tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg2_type, addr,
diff --git a/gcc/testsuite/g++.dg/torture/ppc-ldst-array.C b/gcc/testsuite/g++.dg/torture/ppc-ldst-array.C
new file mode 100644
index 000..9f7da6c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/ppc-ldst-array.C
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc64*-*-* } } } */
+/* { dg-skip-if "do not override mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-O0 -mcpu=power8" } */
+
+/* When compiled with C++, this code was breaking because of different
+   tree representations of arrays between C and C++.  */
+
+#include <altivec.h>
+
+extern vector float vf;
+
+void foo ()
+{
+  float __attribute__((aligned (16))) x[4];
+  float __attribute__((aligned (16))) y[4];
+  vf = vec_ld (0, x);
+  vec_st (vf, 0, y);
+}



Re: [Patch, avr] Fix PR 71151

2016-06-03 Thread Georg-Johann Lay

Senthil Kumar Selvaraj schrieb:

Hi,

  This patch fixes PR 71151 by eliminating the
  TARGET_ASM_FUNCTION_RODATA_SECTION hook and setting
  JUMP_TABLES_IN_TEXT_SECTION to 1.

  As described in the bugzilla entry, this hook assumed it would get
  called only for jumptable rodata for functions. This was true until
  6.1, when a commit in varasm.c started calling the hook for mergeable
  string/constant data as well.

  This resulted in string constants ending up in a section intended for
  jumptables (flash), and broke code using those constants, which
  expects them to be present in rodata (SRAM).

  Given that the original reason for placing jumptables in a section was
  fixed by Johann in PR 63323, this patch restores the original
  behavior. Reg testing on both gcc-6-branch and trunk showed no regressions.


Just for the record:

The intention for jump-tables in function-rodata-section was to get 
fine-grained sections for the tables so that --gc-sections and
-ffunction-sections not only gc unused functions but also unused 
jump-tables.  As these tables had to reside in the lowest 64KiB of flash 
(.progmem section) neither .rodata nor .text was a correct placement, 
hence the hacking in TARGET_ASM_FUNCTION_RODATA_SECTION.


Before using TARGET_ASM_FUNCTION_RODATA_SECTION, all jump tables were 
put into .progmem.gcc_sw_table by ASM_OUTPUT_BEFORE_CASE_LABEL switching 
to that section.


We actually never had jump-tables in .text before...

The purpose of PR63323 was to have a more generic jump-table
implementation that also works if the table does NOT reside in the lower
64KiB.  This happens when moving the whole TEXT section around, like
for a bootloader.



  As pointed out by Johann, this may end up increasing code
  size if there are lots of branches that cross the jump tables. I
  intend to propose a separate patch that gives additional information
  to the target hook (SECCAT_RODATA_{STRING,JUMPTABLE}) so it can know
  what type of function rodata is coming on. Johann also suggested
  handling jump table generation ourselves - I'll experiment with that
  some more.

  If ok, could someone commit please? Could you also backport to
  gcc-6-branch?

Regards
Senthil

gcc/ChangeLog

2016-06-03  Senthil Kumar Selvaraj  

* config/avr/avr.c (avr_asm_function_rodata_section): Remove.
* config/avr/avr.c (TARGET_ASM_FUNCTION_RODATA_SECTION): Remove.

gcc/testsuite/ChangeLog

2016-06-03  Senthil Kumar Selvaraj  

* gcc/testsuite/gcc.target/avr/pr71151-1.c: New.
* gcc/testsuite/gcc.target/avr/pr71151-2.c: New.

diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
index ba5cd91..3cb8cb7 100644
--- gcc/config/avr/avr.c
+++ gcc/config/avr/avr.c
@@ -9488,65 +9488,6 @@ avr_asm_init_sections (void)
 }
 
 
-/* Implement `TARGET_ASM_FUNCTION_RODATA_SECTION'.  */

-
-static section*
-avr_asm_function_rodata_section (tree decl)
-{
-  /* If a function is unused and optimized out by -ffunction-sections
- and --gc-sections, ensure that the same will happen for its jump
- tables by putting them into individual sections.  */
-
-  unsigned int flags;
-  section * frodata;
-
-  /* Get the frodata section from the default function in varasm.c
- but treat function-associated data-like jump tables as code
- rather than as user defined data.  AVR has no constant pools.  */
-  {
-int fdata = flag_data_sections;
-
-flag_data_sections = flag_function_sections;
-frodata = default_function_rodata_section (decl);
-flag_data_sections = fdata;
-flags = frodata->common.flags;
-  }
-
-  if (frodata != readonly_data_section
-  && flags & SECTION_NAMED)
-{
-  /* Adjust section flags and replace section name prefix.  */
-
-  unsigned int i;
-
-  static const char* const prefix[] =
-{
-  ".rodata",  ".progmem.gcc_sw_table",
-  ".gnu.linkonce.r.", ".gnu.linkonce.t."
-};
-
-  for (i = 0; i < sizeof (prefix) / sizeof (*prefix); i += 2)
-{
-  const char * old_prefix = prefix[i];
-  const char * new_prefix = prefix[i+1];
-  const char * name = frodata->named.name;
-
-  if (STR_PREFIX_P (name, old_prefix))
-{
-  const char *rname = ACONCAT ((new_prefix,
-name + strlen (old_prefix), NULL));
-  flags &= ~SECTION_CODE;
-  flags |= AVR_HAVE_JMP_CALL ? 0 : SECTION_CODE;
-
-  return get_section (rname, flags, frodata->named.decl);
-}
-}
-}
-
-  return progmem_swtable_section;
-}
-
-
 /* Implement `TARGET_ASM_NAMED_SECTION'.  */
 /* Track need of __do_clear_bss, __do_copy_data for named sections.  */
 
@@ -13747,9 +13688,6 @@ avr_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *arg,

 #undef  TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN avr_fold_builtin
 
-#undef  TARGET_ASM_FUNCTION_RODATA_SECTION

-#define TARGET_ASM_FUNCTION_RODATA_SECTION av

Fix predictor hitrate reporting

2016-06-03 Thread Jan Hubicka
Hi,
the hack to report predictor hitrates by re-running branch prediction has
a quirk: the loop niter code feeds the "loop iterations guessed"
predictor with the feedback data, so it is incorrectly reported as
predicting many loops.

This patch avoids that by using the data only when profile is fully read.

Bootstrapped/regtested x86_64-linux, committed.

Honza

* tree-ssa-loop-niter.c (estimate_numbers_of_iterations_loop): Avoid
use of profile unless profile status is PROFILE_READ.
* profile.c (compute_branch_probabilities): Set profile status only 
after
reporting predictor hitrates.

Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c   (revision 237067)
+++ tree-ssa-loop-niter.c   (working copy)
@@ -3757,10 +3757,12 @@ estimate_numbers_of_iterations_loop (str
   maybe_lower_iteration_bound (loop);
 
   /* If we have a measured profile, use it to estimate the number of
- iterations.  */
-  if (loop->header->count != 0)
+ iterations.  Explicitly check for profile status so we do not report
+ wrong prediction hitrates for guessed loop iterations heuristics.  */
+  if (loop->header->count != 0
+  && profile_status_for_fn (cfun) >= PROFILE_READ)
 {
-  gcov_type nit = expected_loop_iterations_unbounded (loop) + 1;
+  gcov_type nit = expected_loop_iterations_unbounded (loop);
   bound = gcov_type_to_wide_int (nit);
   record_niter_bound (loop, bound, true, false);
 }
Index: profile.c
===
--- profile.c   (revision 237067)
+++ profile.c   (working copy)
@@ -826,8 +826,6 @@ compute_branch_probabilities (unsigned c
}
 }
   counts_to_freqs ();
-  profile_status_for_fn (cfun) = PROFILE_READ;
-  compute_function_frequency ();
 
   if (dump_file)
 {
@@ -1329,8 +1327,13 @@ branch_prob (void)
   values.release ();
   free_edge_list (el);
   coverage_end_function (lineno_checksum, cfg_checksum);
-  if (dump_file && (dump_flags & TDF_DETAILS))
-report_predictor_hitrates ();
+  if (flag_branch_probabilities && profile_info)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   report_predictor_hitrates ();
+  profile_status_for_fn (cfun) = PROFILE_READ;
+  compute_function_frequency ();
+}
 }
 
 /* Union find algorithm implementation for the basic blocks using


[C++ PATCH] Avoid exponential compile time in cp_fold_function (PR c++/70847, PR c++/71330, PR c++/71393)

2016-06-03 Thread Jakub Jelinek
Hi!

As mentioned in these PRs, if there are many nested SAVE_EXPRs or
TARGET_EXPRs and many of those appear two or more times in the tree,
cp_fold_function has huge compile time requirements.  Even though each
cp_fold uses a cache and, once something has been folded, will return
the same tree again and again, just walking the same huge trees over
and over and doing the cache lookups repeatedly takes lots of time.

The following patch fixes it by making sure that during a single
cp_fold_function, we recurse on the subtrees of each cp_fold returned tree
once, not more.
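
A small illustration (mine, not from the PRs): with a shared temporary
reused at each of n levels,

  t1 = f (s, s);
  t2 = g (t1, t1);
  ...
  tn = h (tn_1, tn_1);

an unguarded walk reaches s through 2^n paths, while the pset-guarded
walk below enters each tree node only once.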

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-06-03  Jakub Jelinek  
Patrick Palka  

PR c++/70847
PR c++/71330
PR c++/71393
* cp-gimplify.c (cp_fold_r): Set *walk_subtrees = 0 and return NULL
right after cp_fold call if cp_fold has returned the same stmt
already in some earlier cp_fold_r call.
(cp_fold_function): Add pset automatic variable, pass its address
to cp_walk_tree.

* g++.dg/opt/pr70847.C: New test.
* g++.dg/ubsan/pr70847.C: New test.
* g++.dg/ubsan/pr71393.C: New test.

--- gcc/cp/cp-gimplify.c.jj 2016-06-02 18:35:18.0 +0200
+++ gcc/cp/cp-gimplify.c2016-06-03 15:23:47.098894612 +0200
@@ -940,6 +940,17 @@ cp_fold_r (tree *stmt_p, int *walk_subtr
 
   *stmt_p = stmt = cp_fold (*stmt_p);
 
+  if (((hash_set<tree> *) data)->add (stmt))
+{
+  /* Don't walk subtrees of stmts we've already walked once, otherwise
+we can have exponential complexity with e.g. lots of nested
+SAVE_EXPRs or TARGET_EXPRs.  cp_fold uses a cache and will return
+always the same tree, which the first time cp_fold_r has been
+called on it had the subtrees walked.  */
+  *walk_subtrees = 0;
+  return NULL;
+}
+
   code = TREE_CODE (stmt);
   if (code == OMP_FOR || code == OMP_SIMD || code == OMP_DISTRIBUTE
   || code == OMP_TASKLOOP || code == CILK_FOR || code == CILK_SIMD
@@ -997,7 +1008,8 @@ cp_fold_r (tree *stmt_p, int *walk_subtr
 void
 cp_fold_function (tree fndecl)
 {
-  cp_walk_tree (&DECL_SAVED_TREE (fndecl), cp_fold_r, NULL, NULL);
+  hash_set<tree> pset;
+  cp_walk_tree (&DECL_SAVED_TREE (fndecl), cp_fold_r, &pset, NULL);
 }
 
 /* Perform any pre-gimplification lowering of C++ front end trees to
--- gcc/testsuite/g++.dg/opt/pr70847.C.jj   2016-06-03 11:44:13.507026612 
+0200
+++ gcc/testsuite/g++.dg/opt/pr70847.C  2016-06-03 11:43:22.0 +0200
@@ -0,0 +1,11 @@
+// PR c++/70847
+// { dg-do compile }
+
+struct D { virtual D& f(); };
+
+void
+g()
+{
+  D d;
+  
d.f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f();
+}
--- gcc/testsuite/g++.dg/ubsan/pr70847.C.jj 2016-06-03 11:47:13.862624882 
+0200
+++ gcc/testsuite/g++.dg/ubsan/pr70847.C2016-06-03 11:43:22.0 
+0200
@@ -0,0 +1,11 @@
+// PR c++/70847
+// { dg-do compile }
+
+struct D { virtual D& f(); };
+
+void
+g()
+{
+  D d;
+  
d.f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f().f();
+}
--- gcc/testsuite/g++.dg/ubsan/pr71393.C.jj 2016-06-03 11:35:55.675656051 
+0200
+++ gcc/testsuite/g++.dg/ubsan/pr71393.C2016-06-03 11:35:26.0 
+0200
@@ -0,0 +1,14 @@
+// PR c++/71393
+// { dg-do compile }
+// { dg-options "-fsanitize=undefined" }
+
+struct B { B &operator << (long); };
+struct A { A (); long a, b, c, d, e, f; };
+
+A::A ()
+{
+  B q;
+  q << 0 << a << 0 << b << 0 << (b / a) << 0 << c << 0 << (c / a) << 0
+<< d << 0 << (d / a) << 0 << e << 0 << (e / a) << 0 << f << 0
+<< (f / a) << 0;
+}

Jakub


[PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-03 Thread Jakub Jelinek
On Tue, Jan 12, 2016 at 05:21:37PM +0300, Ilya Enkovich wrote:
> > --- gcc/tree-vect-slp.c.jj  2016-01-08 21:45:57.0 +0100
> > +++ gcc/tree-vect-slp.c 2016-01-11 12:07:19.633366712 +0100
> > @@ -2999,12 +2999,9 @@ vect_get_constant_vectors (tree op, slp_
> >   gimple *init_stmt;
> >   if (VECTOR_BOOLEAN_TYPE_P (vector_type))
> > {
> > - gcc_assert (fold_convertible_p (TREE_TYPE 
> > (vector_type),
> > - op));
> > + gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (op)));
> >   init_stmt = gimple_build_assign (new_temp, NOP_EXPR, 
> > op);
> 
> In vect_init_vector we had to introduce COND_EXPR to choose between 0 and -1 
> for
> boolean vectors.  Shouldn't we do similar in SLP?

Apparently the answer to this is YES.
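
In GIMPLE terms, the difference is (my illustration):

  _1 = (lane_type) op_0;   /* NOP_EXPR: gives 1 for true, wrong for a mask */
  _1 = op_0 ? -1 : 0;      /* COND_EXPR: gives all-ones for true */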

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/6.2?

2016-06-03  Jakub Jelinek  

PR tree-optimization/71259
* tree-vect-slp.c (vect_get_constant_vectors): For
VECTOR_BOOLEAN_TYPE_P, return all ones constant instead of
one for constant op, and use COND_EXPR for non-constant.

* gcc.dg/vect/pr71259.c: New test.

--- gcc/tree-vect-slp.c.jj  2016-05-24 10:56:02.0 +0200
+++ gcc/tree-vect-slp.c 2016-06-03 17:01:12.740955935 +0200
@@ -3056,7 +3056,7 @@ vect_get_constant_vectors (tree op, slp_
  if (integer_zerop (op))
op = build_int_cst (TREE_TYPE (vector_type), 0);
  else if (integer_onep (op))
-   op = build_int_cst (TREE_TYPE (vector_type), 1);
+   op = build_all_ones_cst (TREE_TYPE (vector_type));
  else
gcc_unreachable ();
}
@@ -3071,8 +3071,14 @@ vect_get_constant_vectors (tree op, slp_
  gimple *init_stmt;
  if (VECTOR_BOOLEAN_TYPE_P (vector_type))
{
+ tree true_val
+   = build_all_ones_cst (TREE_TYPE (vector_type));
+ tree false_val
+   = build_zero_cst (TREE_TYPE (vector_type));
  gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (op)));
- init_stmt = gimple_build_assign (new_temp, NOP_EXPR, op);
+ init_stmt = gimple_build_assign (new_temp, COND_EXPR,
+  op, true_val,
+  false_val);
}
  else
{
--- gcc/testsuite/gcc.dg/vect/pr71259.c.jj  2016-06-03 17:05:37.693475438 
+0200
+++ gcc/testsuite/gcc.dg/vect/pr71259.c 2016-06-03 17:05:32.418544731 +0200
@@ -0,0 +1,28 @@
+/* PR tree-optimization/71259 */
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#include "tree-vect.h"
+
+long a, b[1][44][2];
+long long c[44][17][2];
+
+int
+main ()
+{
+  int i, j, k;
+  check_vect ();
+  asm volatile ("" : : : "memory");
+  for (i = 0; i < 44; i++)
+for (j = 0; j < 17; j++)
+  for (k = 0; k < 2; k++)
+   c[i][j][k] = (30995740 >= *(k + *(j + *b)) != (a != 8)) - 
5105075050047261684;
+  asm volatile ("" : : : "memory");
+  for (i = 0; i < 44; i++) 
+for (j = 0; j < 17; j++)
+  for (k = 0; k < 2; k++)
+   if (c[i][j][k] != -5105075050047261684)
+ __builtin_abort ();
+  return 0;
+}


Jakub


Re: [PATCH][2/3] Vectorize inductions that are live after the loop

2016-06-03 Thread Jakub Jelinek
On Thu, Jun 02, 2016 at 05:11:15PM +0100, Alan Hayward wrote:
>   * gcc.dg/vect/vect-live-1.c: New test.
>   * gcc.dg/vect/vect-live-2.c: New test.
>   * gcc.dg/vect/vect-live-5.c: New test.
>   * gcc.dg/vect/vect-live-slp-1.c: New test.
>   * gcc.dg/vect/vect-live-slp-2.c: New test.
>   * gcc.dg/vect/vect-live-slp-3.c: New test.

These tests all fail for me on i686-linux.  The problem is
in the use of dg-options in gcc.dg/vect/, where it overrides all the various
needed vectorization options that need to be enabled on various arches
(e.g. -msse2 on i686).
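
For reference, the vect.exp harness supplies those flags itself (e.g.
-msse2 plus -ftree-vectorize and friends), so a test that wants extra
flags should only append them:

/* { dg-additional-options "-fno-tree-scev-cprop" } */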

Fixed thusly, tested on x86_64-linux and i686-linux, ok for trunk?

2016-06-03  Jakub Jelinek  

* gcc.dg/vect/vect-live-1.c: Remove dg-options.  Add
dg-additional-options with just -fno-tree-scev-cprop in it.
* gcc.dg/vect/vect-live-2.c: Likewise.
* gcc.dg/vect/vect-live-5.c: Likewise.
* gcc.dg/vect/vect-live-slp-1.c: Likewise.
* gcc.dg/vect/vect-live-slp-2.c: Likewise.
* gcc.dg/vect/vect-live-slp-3.c: Likewise.

--- gcc/testsuite/gcc.dg/vect/vect-live-1.c.jj  2016-06-03 17:36:38.0 
+0200
+++ gcc/testsuite/gcc.dg/vect/vect-live-1.c 2016-06-03 19:37:09.176283421 
+0200
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O2 -ftree-vectorize -fno-tree-scev-cprop 
-fdump-tree-vect-details" } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */
 
 #include "tree-vect.h"
 
--- gcc/testsuite/gcc.dg/vect/vect-live-2.c.jj  2016-06-03 17:36:38.0 
+0200
+++ gcc/testsuite/gcc.dg/vect/vect-live-2.c 2016-06-03 19:37:27.537042349 
+0200
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O2 -ftree-vectorize -fno-tree-scev-cprop 
-fdump-tree-vect-details" } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */
 
 #include "tree-vect.h"
 
--- gcc/testsuite/gcc.dg/vect/vect-live-5.c.jj  2016-06-03 17:36:38.0 
+0200
+++ gcc/testsuite/gcc.dg/vect/vect-live-5.c 2016-06-03 19:37:53.239704879 
+0200
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O2 -ftree-vectorize -fno-tree-scev-cprop 
-fdump-tree-vect-details" } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */
 
 #include "tree-vect.h"
 
--- gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c.jj  2016-06-03 
17:36:38.0 +0200
+++ gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c 2016-06-03 19:38:13.341440948 
+0200
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O2 -ftree-vectorize -fno-tree-scev-cprop 
-fdump-tree-vect-details" } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */
 
 #include "tree-vect.h"
 
--- gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c.jj  2016-06-03 
17:36:38.0 +0200
+++ gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c 2016-06-03 19:38:32.364191184 
+0200
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O2 -ftree-vectorize -fno-tree-scev-cprop 
-fdump-tree-vect-details" } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */
 
 #include "tree-vect.h"
 
--- gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c.jj  2016-06-03 
17:36:38.0 +0200
+++ gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c 2016-06-03 19:38:49.490966314 
+0200
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-O2 -ftree-vectorize -fno-tree-scev-cprop 
-fdump-tree-vect-details" } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */
 
 #include "tree-vect.h"
 


Jakub


Re: [PATCH, rs6000] Fix ICE for vec_ld and vec_st of array when compiled with g++

2016-06-03 Thread Segher Boessenkool
On Fri, Jun 03, 2016 at 11:41:26AM -0500, Bill Schmidt wrote:
> When changing vec_ld and vec_st to be expanded during parsing, I
> missed a subtle case that happens only for C++.  If the base address
> argument has an array type, this doesn’t get converted to a pointer.
> For our purposes, we need the pointer, and it’s safe to make that
> conversion, so this patch performs that adjustment.  I’ve added a
> test to the C++ torture bucket to verify this now works.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this ok for trunk, and eventual backport to the 6 branch?

Okay.  A few questions about the testcase...


> +++ b/gcc/testsuite/g++.dg/torture/ppc-ldst-array.C
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target { powerpc64*-*-* } } } */
> +/* { dg-skip-if "do not override mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> "-mcpu=power8" } } */
> +/* { dg-options "-O0 -mcpu=power8" } */

It's torture, do you need to force -O0?  And the skip isn't necessary I think?


Segher


Re: [PATCH, rs6000] Fix ICE for vec_ld and vec_st of array when compiled with g++

2016-06-03 Thread Bill Schmidt
You’re right, I don’t need the -O0.  I’d like to leave the dg-skip-if in place 
because I’m worried about older processors not defining altivec, etc.
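
For reference, the failing shape looks roughly like this (illustrative
sketch, not the exact committed test; assumes -maltivec):

#include <altivec.h>

vector int vi;
int a[4] __attribute__ ((aligned (16)));

void
foo (void)
{
  vi = vec_ld (0, a);  /* 'a' keeps its array type in C++ */
}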

Thanks!

Bill

> On Jun 3, 2016, at 1:17 PM, Segher Boessenkool  
> wrote:
> 
> On Fri, Jun 03, 2016 at 11:41:26AM -0500, Bill Schmidt wrote:
>> When changing vec_ld and vec_st to be expanded during parsing, I
>> missed a subtle case that happens only for C++.  If the base address
>> argument has an array type, this doesn’t get converted to a pointer.
>> For our purposes, we need the pointer, and it’s safe to make that
>> conversion, so this patch performs that adjustment.  I’ve added a
>> test to the C++ torture bucket to verify this now works.
>> 
>> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> regressions.  Is this ok for trunk, and eventual backport to the 6 branch?
> 
> Okay.  A few questions about the testcase...
> 
> 
>> +++ b/gcc/testsuite/g++.dg/torture/ppc-ldst-array.C
>> @@ -0,0 +1,18 @@
>> +/* { dg-do compile { target { powerpc64*-*-* } } } */
>> +/* { dg-skip-if "do not override mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
>> "-mcpu=power8" } } */
>> +/* { dg-options "-O0 -mcpu=power8" } */
> 
> It's torture, do you need to force -O0?  And the skip isn't necessary I think?
> 
> 
> Segher
> 



[PATCH] Selftest framework (v7)

2016-06-03 Thread David Malcolm
On Fri, 2016-06-03 at 01:21 +0200, Bernd Schmidt wrote:
> On 06/02/2016 11:06 PM, David Malcolm wrote:
> > gcc/ChangeLog:
> > * Makefile.in (OBJS): Add function-tests.o,
> > hash-map-tests.o, hash-set-tests.o, rtl-tests.o,
> > selftest-run-tests.o.
> > (OBJS-libcommon): Add selftest.o.
> > (OBJS-libcommon-target): Add selftest.o.
> > (all.internal): Add "selftests".
> > (all.cross): Likewise.
> > (selftests): New phony target.
> > (s-selftests): New target.
> > (selftests-gdb): New phony target.
> > (COLLECT2_OBJS): Add selftest.o.
> > * common.opt (fself-test): New.
> > * selftest-run-tests.c: New file.
> > * selftest.c: New file.
> > * selftest.h: New file.
> > * toplev.c: Include selftest.h.
> > (toplev::run_self_tests): New.
> > (toplev::main): Handle -fself-test.
> > * toplev.h (toplev::run_self_tests): New.
>
> This one looks good to me. I kind of liked the auto-registration, but I
> guess manually calling functions is preferable to including C files and
> similar in effort required. So it's probably better this way.

Thanks.

> > +  fprintf (stderr,
> > +  "%s:%i: FAIL: %s\n",
> > +  file, line, msg);
> > +  /* TODO: add calling function name as well?  */
> > +  abort ();
> > +}
>
> That'll fit on one line.

Fixed.
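
That is, it now reads:

  fprintf (stderr, "%s:%i: FAIL: %s\n", file, line, msg);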

> Otherwise OK. Likewise for anything Jeff has already approved in a
> different form - but please make another pass and add brief function
> comments for new functions,

Done.

> and please ensure every
> step you commit actually compiles (this patch alone won't).

Given that this would all be committed atomically, here's a merged
version of the patch.

I've also rebased the code against today's trunk (r237076).

> Let me know which patches still need approval after that.

I believe I can self-approve the changes to diagnostic-show-locus.c

You've approved the new selftests.* files (I fixed the linewrap issue
you identified) and the changes to Makefile.in, common.opt, and toplev.c.

Remaining approvals needed:

The spellcheck.c changes (moving from a plugin) still need approval.

Jeff approved older versions of the rest of this patch with this
message:
> OK if/when prereqs are approved.  Minor twiddling if we end up moving it
> elsewhere or standardizing/reducing header files is pre-approved.

Since those reviews, the tests have been moved around,
gained comments, and various tweaking.

I've also ported them to the new API.

It's not clear to me if these approvals still hold.  In particular,
the wide-int.cc tests required a substantial rewrite; otherwise
the tweaking could reasonably be described as "minor".

Specifically, Jeff's reviews were:

  bitmap.c changes (as unittests/test-bitmap.c):
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03284.html

  et-forest.c additions:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03295.html

  fold-const.c additions (as unittests/test-folding.c):
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03305.html

  function-tests.c (as unittests/test-functions.c):
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03310.html

  gimple.c additions (as unittests/test-gimple.c):
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03304.html

  hash-map-tests.c (as unittests/test-hash-map.c):
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03301.html

  hash-set-tests.c (as unittests/test-hash-set.c):
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03300.html

  input.c additions (as unittests/test-locations.c):
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03307.html

  rtl-tests.c (as unittests/test-rtl.c):
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03302.html

  tree.c additions (as unittests/test-tree.c):
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03303.html

  tree-cfg.c: add selftests
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03285.html
> Unless there's a good reason, drop the presumably redundant tests
> and this is OK. Save preapprovald for these changes as the bitmap
> patch.
This version does remove the redundant tests.

  vec.c: add selftests (as unittests/test-vec.c):
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03308.html

  wide-int.cc: add selftests:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03309.html

   The wide-int.cc tests required a substantial rewrite, since
   they are type-parametrized.

I believe the only changes since my last round of testing have
been:
* the tweaks mentioned above
* rebasing from r236397 (May 18th) to r237076 (today)
* the addition of comments
* squashing it into one patch

I'm re-testing now to be sure (checked and release builds,
bootstrap & regrtest, multi-config build for all in config-list.mk).

Assuming the testing is OK, is this OK for trunk?

gcc/ChangeLog:
* Makefile.in (OBJS): Add function-tests.o,
hash-map-tests.o, hash-set-tests.o, rtl-tests.o,
selftest-run-tests.o.
(OBJS-libcommon): Add selftest.o.
(OBJS-libco

Re: [PATCH][2/3] Vectorize inductions that are live after the loop

2016-06-03 Thread Richard Biener
On June 3, 2016 7:45:24 PM GMT+02:00, Jakub Jelinek  wrote:
>On Thu, Jun 02, 2016 at 05:11:15PM +0100, Alan Hayward wrote:
>>  * gcc.dg/vect/vect-live-1.c: New test.
>>  * gcc.dg/vect/vect-live-2.c: New test.
>>  * gcc.dg/vect/vect-live-5.c: New test.
>>  * gcc.dg/vect/vect-live-slp-1.c: New test.
>>  * gcc.dg/vect/vect-live-slp-2.c: New test.
>>  * gcc.dg/vect/vect-live-slp-3.c: New test.
>
>These tests all fail for me on i686-linux.  The problem is
>in the use of dg-options in gcc.dg/vect/, where it overrides all the
>various
>needed vectorization options that need to be enabled on various arches
>(e.g. -msse2 on i686).
>
>Fixed thusly, tested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

>
>2016-06-03  Jakub Jelinek  
>
>   * gcc.dg/vect/vect-live-1.c: Remove dg-options.  Add
>   dg-additional-options with just -fno-tree-scev-cprop in it.
>   * gcc.dg/vect/vect-live-2.c: Likewise.
>   * gcc.dg/vect/vect-live-5.c: Likewise.
>   * gcc.dg/vect/vect-live-slp-1.c: Likewise.
>   * gcc.dg/vect/vect-live-slp-2.c: Likewise.
>   * gcc.dg/vect/vect-live-slp-3.c: Likewise.
>
>--- gcc/testsuite/gcc.dg/vect/vect-live-1.c.jj 2016-06-03
>17:36:38.0 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-live-1.c2016-06-03
>19:37:09.176283421 +0200
>@@ -1,5 +1,5 @@
> /* { dg-require-effective-target vect_int } */
>-/* { dg-options "-O2 -ftree-vectorize -fno-tree-scev-cprop
>-fdump-tree-vect-details" } */
>+/* { dg-additional-options "-fno-tree-scev-cprop" } */
> 
> #include "tree-vect.h"
> 
>--- gcc/testsuite/gcc.dg/vect/vect-live-2.c.jj 2016-06-03
>17:36:38.0 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-live-2.c2016-06-03
>19:37:27.537042349 +0200
>@@ -1,5 +1,5 @@
> /* { dg-require-effective-target vect_int } */
>-/* { dg-options "-O2 -ftree-vectorize -fno-tree-scev-cprop
>-fdump-tree-vect-details" } */
>+/* { dg-additional-options "-fno-tree-scev-cprop" } */
> 
> #include "tree-vect.h"
> 
>--- gcc/testsuite/gcc.dg/vect/vect-live-5.c.jj 2016-06-03
>17:36:38.0 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-live-5.c2016-06-03
>19:37:53.239704879 +0200
>@@ -1,5 +1,5 @@
> /* { dg-require-effective-target vect_int } */
>-/* { dg-options "-O2 -ftree-vectorize -fno-tree-scev-cprop
>-fdump-tree-vect-details" } */
>+/* { dg-additional-options "-fno-tree-scev-cprop" } */
> 
> #include "tree-vect.h"
> 
>--- gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c.jj 2016-06-03
>17:36:38.0 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c2016-06-03
>19:38:13.341440948 +0200
>@@ -1,5 +1,5 @@
> /* { dg-require-effective-target vect_int } */
>-/* { dg-options "-O2 -ftree-vectorize -fno-tree-scev-cprop
>-fdump-tree-vect-details" } */
>+/* { dg-additional-options "-fno-tree-scev-cprop" } */
> 
> #include "tree-vect.h"
> 
>--- gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c.jj 2016-06-03
>17:36:38.0 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c2016-06-03
>19:38:32.364191184 +0200
>@@ -1,5 +1,5 @@
> /* { dg-require-effective-target vect_int } */
>-/* { dg-options "-O2 -ftree-vectorize -fno-tree-scev-cprop
>-fdump-tree-vect-details" } */
>+/* { dg-additional-options "-fno-tree-scev-cprop" } */
> 
> #include "tree-vect.h"
> 
>--- gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c.jj 2016-06-03
>17:36:38.0 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c2016-06-03
>19:38:49.490966314 +0200
>@@ -1,5 +1,5 @@
> /* { dg-require-effective-target vect_int } */
>-/* { dg-options "-O2 -ftree-vectorize -fno-tree-scev-cprop
>-fdump-tree-vect-details" } */
>+/* { dg-additional-options "-fno-tree-scev-cprop" } */
> 
> #include "tree-vect.h"
> 
>
>
>   Jakub




Re: [patch, fortran] PR52393 I/O: "READ format" statement with parenthesized default-char-expr

2016-06-03 Thread H.J. Lu
On Wed, Jun 1, 2016 at 9:28 AM, Jerry DeLisle  wrote:
> On 06/01/2016 12:25 AM, FX wrote:
>>> 2016-05-30  Jerry DeLisle  
>>>
>>>  PR fortran/52393
>>>  * io.c (match_io): For READ, try to match a default character
>>>  expression. If found, set the dt format expression to this,
>>>  otherwise go back and try control list.
>>
>> OK. Maybe you could add some “negative” tests too? To be sure we still catch 
>> malformed parenthesized formats?
>>
>> FX
>>
>
> Thanks for review!  yes I will add some tests.
>

It caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71404


-- 
H.J.


Re: [C++ Patch] PR 70202 ("ICE on invalid code in build_simple_base_path, at cp/class.c:579")

2016-06-03 Thread Jason Merrill

OK.

Jason


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-03 Thread Martin Sebor

On 06/03/2016 09:32 AM, Jakub Jelinek wrote:

On Fri, Jun 03, 2016 at 09:29:48AM -0600, Martin Sebor wrote:

+   {
+ tree type = TREE_TYPE (TREE_TYPE (t));
+ tree vflow = arith_overflowed_p (opcode, type, arg0, arg1)
+  ? integer_one_node : integer_zero_node;


This looks incorrect, the return type is TREE_TYPE (t), some complex integer
type, therefore vflow needs to be
   tree vflow = build_int_cst (TREE_TYPE (TREE_TYPE (t)),
  arith_overflowed_p (opcode, type, arg0, arg1)
  ? 1 : 0);
no?


I guess I didn't think it mattered since the complex type specifies
the types of the two members.  I don't mind changing it if it does.


Sure, it does.  But if there are any differences between the lhs and rhs
type (or e.g. in COMPLEX_EXPR args etc. in GENERIC), then it is invalid IL,
or for GIMPLE if the two types aren't compatible according to the GIMPLE
rules (useless conversion).


I see.  I've made the change in the latest update to the patch
but I wasn't able to create a test case to verify it.  Maybe
that's because, this being constexpr, the COMPLEX_EXPR doesn't make
it far enough to trigger a problem.  If there is a way to test
it I'd appreciate a suggestion for how (otherwise, if not caught
in a code review like in this case, it would be a ticking time
bomb).

It also occurred to me that a more robust solution might be to
change build_complex to enforce as a precondition that
the members have a type that matches the complex type.  I've
taken the liberty of making this change as part of this patch.
(It seems that an even better solution would be to have
build_complex convert the arguments to the expected type
so that callers don't have to worry about it.)
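
A minimal sketch of the precondition idea (mine; the assertion in the
actual patch may be stated differently):

  /* In build_complex (type, real, imag): both members must already
     have the component type of TYPE.  */
  gcc_assert (TREE_TYPE (real) == TREE_TYPE (type)
	      && TREE_TYPE (imag) == TREE_TYPE (type));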

Martin
PR c++/70507 - integer overflow builtins not constant expressions
PR c/68120 - can't easily deal with integer overflow at compile time

gcc/cp/ChangeLog:
2016-06-03  Martin Sebor  

	PR c++/70507
	PR c/68120
	* constexpr.c (cxx_eval_internal_function): New function.
	(cxx_eval_call_expression): Call it.
	(potential_constant_expression_1): Handle integer arithmetic
	overflow built-ins.
	* tree.c (builtin_valid_in_constant_expr_p): Same.

gcc/ChangeLog:
2016-06-03  Martin Sebor  

	PR c++/70507
	PR c/68120
	* builtins.c (fold_builtin_unordered_cmp): Handle integer arithmetic
	overflow built-ins.
	* tree.c (build_complex): Assert preconditions.
	* doc/extend.texi (Integer Overflow Builtins): Update.

gcc/testsuite/ChangeLog:
2016-06-03  Martin Sebor  

	PR c++/70507
	PR c/68120
	* c-c++-common/builtin-arith-overflow-1.c: Add test cases.
	* c-c++-common/builtin-arith-overflow-2.c: New test.
	* g++.dg/cpp0x/constexpr-arith-overflow.C: New test.
	* g++.dg/cpp1y/constexpr-arith-overflow.C: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 931d4a6..ada1904 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl-chkp.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "gimple-fold.h"
 
 
 struct target_builtins default_target_builtins;
@@ -7957,11 +7958,14 @@ fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode,
 			 tree arg0, tree arg1, tree arg2)
 {
   enum internal_fn ifn = IFN_LAST;
-  tree type = TREE_TYPE (TREE_TYPE (arg2));
-  tree mem_arg2 = build_fold_indirect_ref_loc (loc, arg2);
+  /* The code of the expression corresponding to the type-generic
+ built-in, or ERROR_MARK for the type-specific ones.  */
+  enum tree_code opcode = ERROR_MARK;
+
   switch (fcode)
 {
 case BUILT_IN_ADD_OVERFLOW:
+  opcode = PLUS_EXPR;
 case BUILT_IN_SADD_OVERFLOW:
 case BUILT_IN_SADDL_OVERFLOW:
 case BUILT_IN_SADDLL_OVERFLOW:
@@ -7971,6 +7975,7 @@ fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode,
   ifn = IFN_ADD_OVERFLOW;
   break;
 case BUILT_IN_SUB_OVERFLOW:
+  opcode = MINUS_EXPR;
 case BUILT_IN_SSUB_OVERFLOW:
 case BUILT_IN_SSUBL_OVERFLOW:
 case BUILT_IN_SSUBLL_OVERFLOW:
@@ -7980,6 +7985,7 @@ fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode,
   ifn = IFN_SUB_OVERFLOW;
   break;
 case BUILT_IN_MUL_OVERFLOW:
+  opcode = MULT_EXPR;
 case BUILT_IN_SMUL_OVERFLOW:
 case BUILT_IN_SMULL_OVERFLOW:
 case BUILT_IN_SMULLL_OVERFLOW:
@@ -7991,6 +7997,26 @@ fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode,
 default:
   gcc_unreachable ();
 }
+
+  /* For the "generic" overloads, the first two arguments can have different
+ types and the last argument determines the target type to use to check
+ for overflow.  The arguments of the other overloads all have the same
+ type.  */
+  bool isnullp = integer_zerop (arg2);
+  tree type = TREE_TYPE (TREE_TYPE (arg2));
+
+  /* When the last argument to the type-generic built-in is a null pointer
+ and the first two argum

Re: [PATCH][wwwdocs][AArch64] Mention -mcpu=qdf24xx support for GCC 6

2016-06-03 Thread Evandro Menezes

On 06/02/16 09:54, Kyrill Tkachov wrote:

The Qualcomm QDF24xx processor is now supported via the


Shouldn't this read "The Qualcomm QDF24xx processors are now supported 
via the"?


Not that I have a strong opinion about it, but, otherwise, OK.

--
Evandro Menezes



Re: [PATCH][AArch64] Cleanup -mpc-relative-loads

2016-06-03 Thread Evandro Menezes

On 06/03/16 07:56, Wilco Dijkstra wrote:

This patch cleans up the -mpc-relative-loads option processing.  Rename to 
avoid the "no*" name and confusing !no* expressions.  Fix the option 
processing code to implement -mno-pc-relative-loads rather than ignore it.

OK for commit?


LGTM

--
Evandro Menezes



Re: RFC [1/2] divmod transform

2016-06-03 Thread Joseph Myers
On Mon, 30 May 2016, Richard Biener wrote:

> Joseph - do you know sth about why there's not a full set of divmod
> libfuncs in libgcc?

I'm not familiar with the choice of divmod libfuncs.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Support x86-64 TLS code sequences without PLT

2016-06-03 Thread H.J. Lu
We can generate x86-64 TLS code sequences for general and local dynamic
models without PLT, which uses indirect call via GOT:

call *__tls_get_addr@GOTPCREL(%rip)

instead of direct call:

call __tls_get_addr[@PLT]

Since a direct call is 4 bytes long and an indirect call is 5 bytes long,
the extra byte must be handled properly.

For the general dynamic model, one 0x66 prefix before the call
instruction is removed to make room for the indirect call.  For the
local dynamic model, we simply use the 5-byte indirect call.
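
For reference, the two general dynamic sequences compare like this
(hedged illustration; 'x' is a placeholder symbol).  With PLT:

	.byte	0x66
	leaq	x@tlsgd(%rip), %rdi
	.word	0x6666
	rex64
	call	__tls_get_addr@PLT

and without PLT:

	.byte	0x66
	leaq	x@tlsgd(%rip), %rdi
	.byte	0x66
	rex64
	call	*__tls_get_addr@GOTPCREL(%rip)

Both are 16 bytes; the dropped 0x66 prefix pays for the one extra byte
of the indirect call.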

TLS linker optimization is updated to recognize the new instruction
patterns.  For the local dynamic model to local exec model transition,
we generate 4 0x66 prefixes, instead of 3, before the mov instruction
in 64-bit mode, and generate a 5-byte nop, instead of a 4-byte one,
before the mov instruction in 32-bit mode.  Since the linker may convert

call *__tls_get_addr@GOTPCREL(%rip)

to

addr32 call __tls_get_addr

when producing a static executable, both patterns are recognized.

I will check it into binutils next week.

H.J.
---
bfd/

* elf64-x86-64.c (elf_x86_64_link_hash_entry): Add tls_get_addr.
(elf_x86_64_link_hash_newfunc): Initialize tls_get_addr to 2.
(elf_x86_64_check_tls_transition): Check indirect call and
direct call with the addr32 prefix for general and local dynamic
models.  Set the tls_get_addr field.
(elf_x86_64_convert_load_reloc): Always use addr32 prefix for
indirect __tls_get_addr call via GOT.
(elf_x86_64_relocate_section): Handle GD->LE, GD->IE and LD->LE
transitions with indirect call and direct call with the addr32
prefix.

ld/

* testsuite/ld-x86-64/pass.out: New file.
* testsuite/ld-x86-64/tls-def1.c: Likewise.
* testsuite/ld-x86-64/tls-gd1.S: Likewise.
* testsuite/ld-x86-64/tls-ld1.S: Likewise.
* testsuite/ld-x86-64/tls-main1.c: Likewise.
* testsuite/ld-x86-64/tls.exp: Likewise.
* testsuite/ld-x86-64/tlsbin2-nacl.rd: Likewise.
* testsuite/ld-x86-64/tlsbin2.dd: Likewise.
* testsuite/ld-x86-64/tlsbin2.rd: Likewise.
* testsuite/ld-x86-64/tlsbin2.sd: Likewise.
* testsuite/ld-x86-64/tlsbin2.td: Likewise.
* testsuite/ld-x86-64/tlsbinpic2.s: Likewise.
* testsuite/ld-x86-64/tlsgd10.dd: Likewise.
* testsuite/ld-x86-64/tlsgd10.s: Likewise.
* testsuite/ld-x86-64/tlsgd11.dd: Likewise.
* testsuite/ld-x86-64/tlsgd11.s: Likewise.
* testsuite/ld-x86-64/tlsgd12.d: Likewise.
* testsuite/ld-x86-64/tlsgd12.s: Likewise.
* testsuite/ld-x86-64/tlsgd13.d: Likewise.
* testsuite/ld-x86-64/tlsgd13.s: Likewise.
* testsuite/ld-x86-64/tlsgd14.dd: Likewise.
* testsuite/ld-x86-64/tlsgd14.s: Likewise.
* testsuite/ld-x86-64/tlsgd5c.s: Likewise.
* testsuite/ld-x86-64/tlsgd6c.s: Likewise.
* testsuite/ld-x86-64/tlsgd9.dd: Likewise.
* testsuite/ld-x86-64/tlsgd9.s: Likewise.
* testsuite/ld-x86-64/tlsld4.dd: Likewise.
* testsuite/ld-x86-64/tlsld4.s: Likewise.
* testsuite/ld-x86-64/tlsld5.dd: Likewise.
* testsuite/ld-x86-64/tlsld5.s: Likewise.
* testsuite/ld-x86-64/tlsld6.dd: Likewise.
* testsuite/ld-x86-64/tlsld6.s: Likewise.
* testsuite/ld-x86-64/tlspic2-nacl.rd: Likewise.
* testsuite/ld-x86-64/tlspic2.dd: Likewise.
* testsuite/ld-x86-64/tlspic2.rd: Likewise.
* testsuite/ld-x86-64/tlspic2.sd: Likewise.
* testsuite/ld-x86-64/tlspic2.td: Likewise.
* testsuite/ld-x86-64/tlspic3.s: Likewise.
* testsuite/ld-x86-64/tlspie2.s: Likewise.
* testsuite/ld-x86-64/tlspie2a.d: Likewise.
* testsuite/ld-x86-64/tlspie2b.d: Likewise.
* testsuite/ld-x86-64/tlsgd5.dd: Updated.
* testsuite/ld-x86-64/tlsgd6.dd: Likewise.
* testsuite/ld-x86-64/x86-64.exp: Run libtlspic2.so, tlsbin2,
tlsgd5b, tlsgd6b, tlsld4, tlsld5, tlsld6, tlsgd9, tlsgd10,
tlsgd11, tlsgd14, tlsgd12, tlsgd13, tlspie2a and tlspie2b.
---
 bfd/elf64-x86-64.c | 341 -
 ld/testsuite/ld-x86-64/pass.out|   1 +
 ld/testsuite/ld-x86-64/tls-def1.c  |   1 +
 ld/testsuite/ld-x86-64/tls-gd1.S   |  55 +
 ld/testsuite/ld-x86-64/tls-ld1.S   |  47 
 ld/testsuite/ld-x86-64/tls-main1.c |  29 +++
 ld/testsuite/ld-x86-64/tls.exp | 125 +++
 ld/testsuite/ld-x86-64/tlsbin2-nacl.rd | 143 +
 ld/testsuite/ld-x86-64/tlsbin2.dd  | 310 +++
 ld/testsuite/ld-x86-64/tlsbin2.rd  | 141 
 ld/testsuite/ld-x86-64/tlsbin2.sd  |  13 ++
 ld/testsuite/ld-x86-64/tlsbin2.td  |  16 ++
 ld/testsuite/ld-x86-64/tlsbinpic2.s| 146 +
 ld/testsuite/ld-x86-64/tlsgd10.dd  |  23 ++
 ld/testsuite/ld-x86-64/tlsgd10.s   |  18 ++
 ld/testsuite/ld-x86-64/tlsgd11.dd  |  14 ++
 ld/testsuite/ld-x86-64/tlsgd11.s   |  15 ++
 ld/testsuite/ld-x86-64/tlsgd

Re: [PATCH 2/3][AArch64] Emit square root using the Newton series

2016-06-03 Thread Evandro Menezes

On 06/01/16 04:00, James Greenhalgh wrote:

On Fri, May 27, 2016 at 05:57:26PM -0500, Evandro Menezes wrote:

2016-04-04  Evandro Menezes  
 Wilco Dijkstra  

gcc/
* config/aarch64/aarch64-protos.h
(aarch64_emit_approx_rsqrt): Replace with new function
"aarch64_emit_approx_sqrt".
(cpu_approx_modes): New member "sqrt".
* config/aarch64/aarch64.c
(generic_approx_modes): New member "sqrt".
(exynosm1_approx_modes): Likewise.
(xgene1_approx_modes): Likewise.
(aarch64_emit_approx_rsqrt): Replace with new function
"aarch64_emit_approx_sqrt".
(aarch64_override_options_after_change_1): Handle new option.
* config/aarch64/aarch64-simd.md
(rsqrt<mode>2): Use new function instead.
(sqrt<mode>2): New expansion and insn definitions.
* config/aarch64/aarch64.md: Likewise.
* config/aarch64/aarch64.opt
(mlow-precision-sqrt): Add new option description.
* doc/invoke.texi (mlow-precision-sqrt): Likewise.
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 6156281..2f407fd 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -192,7 +192,8 @@ struct cpu_branch_cost
  /* Allowed modes for approximations.  */
  struct cpu_approx_modes
  {
-  const unsigned int recip_sqrt; /* Reciprocal square root.  */
+  const unsigned int sqrt; /* Square root.  */
+  const unsigned int recip_sqrt;   /* Reciprocal square root.  */
  };
  
  struct tune_params

@@ -388,7 +389,7 @@ void aarch64_register_pragmas (void);
  void aarch64_relayout_simd_types (void);
  void aarch64_reset_previous_fndecl (void);
  void aarch64_save_restore_target_globals (tree);
-void aarch64_emit_approx_rsqrt (rtx, rtx);
+bool aarch64_emit_approx_sqrt (rtx, rtx, bool);

There's a goal to try to keep these in alphabetical order (first by return
type, then by name). This should go up by the other "bool" return types.


OK


+/* Emit instruction sequence to compute either the approximate square root
+   or its approximate reciprocal, depending on the flag RECP, and return
+   whether the sequence was emitted or not.  */
  
-void

-aarch64_emit_approx_rsqrt (rtx dst, rtx src)
+bool
+aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
  {
-  machine_mode mode = GET_MODE (src);
-  gcc_assert (
-mode == SFmode || mode == V2SFmode || mode == V4SFmode
-   || mode == DFmode || mode == V2DFmode);
-
-  rtx xsrc = gen_reg_rtx (mode);
-  emit_move_insn (xsrc, src);
-  rtx x0 = gen_reg_rtx (mode);
+  machine_mode mode = GET_MODE (dst);
+  machine_mode mmsk = mode_for_vector (int_mode_for_mode (GET_MODE_INNER 
(mode)),
+  GET_MODE_NUNITS (mode));

Long line. You can run your patch through contrib/check-gnu-style.sh to find
these and other GNU style issues.


OK, but I was shamelessly hoping to be able to whistle through slightly 
long lines. :-D



+  bool use_approx_sqrt_p = (!recp
+   && (flag_mlow_precision_sqrt
+   || (aarch64_tune_params.approx_modes->sqrt
+   & AARCH64_APPROX_MODE (mode))));
+  bool use_approx_rsqrt_p = (recp
+&& (flag_mrecip_low_precision_sqrt
+|| 
(aarch64_tune_params.approx_modes->recip_sqrt

Long line.


OK


+& AARCH64_APPROX_MODE (mode))));
+
+  if (!flag_finite_math_only
+  || flag_trapping_math
+  || !flag_unsafe_math_optimizations




+  || optimize_function_for_size_p (cfun)
+  || !(use_approx_sqrt_p || use_approx_rsqrt_p))

Swap these two cases to avoid the slightly more expensive call if we fail
the cheap flags check.


OK


+return false;
  
-  emit_insn ((*get_rsqrte_type (mode)) (x0, xsrc));

+  rtx xmsk = gen_reg_rtx (mmsk);
+  if (!recp)
+/* When calculating the approximate square root, compare the argument with
+   0.0 and create a mask.  */
+emit_insn (gen_rtx_SET (xmsk, gen_rtx_NEG (mmsk, gen_rtx_EQ (mmsk, src,
+ CONST0_RTX (mode)))));

I guess you've done it this way rather than calling gen_aarch64_cmeq
directly to avoid having a switch on mode? I wonder whether it is worth just
writing that helper function to make it explicit what instruction we want
to match?


I prefer to avoid calling the gen_...() functions for forward 
portability.  If a future version of the ISA can do it better than the 
explicit gen_...() function, then this just works.  Or at least this is 
the hope.  Again, this is just me.


  
-  bool double_mode = (mode == DFmode || mode == V2DFmode);

+  /* Estimate the approximate reciprocal square root.  */
+  rtx xdst = gen_reg_rtx (mode);
+  emit_insn ((*get_rsqrte_type (mode)) (xdst, src));
  
-  int iterations = double_mode ? 3 : 2;

+  /* Iterate over the series twice for

Re: [PATCH 3/3][AArch64] Emit division using the Newton series

2016-06-03 Thread Evandro Menezes

Rebasing the patch...
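
For reference (my summary, not from the patches): FRECPE/FRSQRTE give
the initial estimate x0, which the fused FRECPS/FRSQRTS steps refine
with the standard Newton-Raphson iterations

  reciprocal:          x1 = x0 * (2 - d * x0)
  recip. square root:  x1 = x0 * (3 - d * x0 * x0) / 2

each step roughly doubling the number of correct bits, which is why
double precision needs one more iteration than single precision.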

--
Evandro Menezes

>From d791090aae6a29fa94d8fc10894ee1053b05bcc2 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 4 Apr 2016 14:02:24 -0500
Subject: [PATCH 3/3] [AArch64] Emit division using the Newton series

2016-04-04  Evandro Menezes  
Wilco Dijkstra  

gcc/
	* config/aarch64/aarch64-protos.h
	(cpu_approx_modes): Add new member "division".
	(aarch64_emit_approx_div): Declare new function.
	* config/aarch64/aarch64.c
	(generic_approx_modes): New member "division".
	(exynosm1_approx_modes): Likewise.
	(xgene1_approx_modes): Likewise.
	(aarch64_emit_approx_div): Define new function.
	* config/aarch64/aarch64.md ("div3"): New expansion.
	* config/aarch64/aarch64-simd.md ("div3"): Likewise.
	* config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
	* doc/invoke.texi (-mlow-precision-div): Describe new option.
---
 gcc/config/aarch64/aarch64-protos.h |  2 +
 gcc/config/aarch64/aarch64-simd.md  | 14 +-
 gcc/config/aarch64/aarch64.c| 92 +
 gcc/config/aarch64/aarch64.md   | 19 ++--
 gcc/config/aarch64/aarch64.opt  |  6 +++
 gcc/doc/invoke.texi | 10 
 6 files changed, 138 insertions(+), 5 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index eb33118..3e0a0a3 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -192,6 +192,7 @@ struct cpu_branch_cost
 /* Allowed modes for approximations.  */
 struct cpu_approx_modes
 {
+  const unsigned int division;		/* Division.  */
   const unsigned int sqrt;		/* Square root.  */
   const unsigned int recip_sqrt;	/* Reciprocal square root.  */
 };
@@ -303,6 +304,7 @@ int aarch64_branch_cost (bool, bool);
 enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
 bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
 bool aarch64_constant_address_p (rtx);
+bool aarch64_emit_approx_div (rtx, rtx, rtx);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 bool aarch64_expand_movmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 2a5c665..a244a27 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1509,7 +1509,19 @@
   [(set_attr "type" "neon_fp_mul_")]
 )
 
-(define_insn "div3"
+(define_expand "div3"
+ [(set (match_operand:VDQF 0 "register_operand")
+   (div:VDQF (match_operand:VDQF 1 "general_operand")
+		 (match_operand:VDQF 2 "register_operand")))]
+ "TARGET_SIMD"
+{
+  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
+DONE;
+
+  operands[1] = force_reg (<MODE>mode, operands[1]);
+})
+
+(define_insn "*div3"
  [(set (match_operand:VDQF 0 "register_operand" "=w")
(div:VDQF (match_operand:VDQF 1 "register_operand" "w")
 		 (match_operand:VDQF 2 "register_operand" "w")))]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ca6035d..7b85a85 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -396,6 +396,7 @@ static const struct cpu_branch_cost cortexa57_branch_cost =
 /* Generic approximation modes.  */
 static const cpu_approx_modes generic_approx_modes =
 {
+  AARCH64_APPROX_NONE,	/* division  */
   AARCH64_APPROX_NONE,	/* sqrt  */
   AARCH64_APPROX_NONE	/* recip_sqrt  */
 };
@@ -403,6 +404,7 @@ static const cpu_approx_modes generic_approx_modes =
 /* Approximation modes for Exynos M1.  */
 static const cpu_approx_modes exynosm1_approx_modes =
 {
+  AARCH64_APPROX_NONE,	/* division  */
   AARCH64_APPROX_ALL,	/* sqrt  */
   AARCH64_APPROX_ALL	/* recip_sqrt  */
 };
@@ -410,6 +412,7 @@ static const cpu_approx_modes exynosm1_approx_modes =
 /* Approximation modes for X-Gene 1.  */
 static const cpu_approx_modes xgene1_approx_modes =
 {
+  AARCH64_APPROX_NONE,	/* division  */
   AARCH64_APPROX_NONE,	/* sqrt  */
   AARCH64_APPROX_ALL	/* recip_sqrt  */
 };
@@ -7487,6 +7490,95 @@ aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
   return true;
 }
 
+typedef rtx (*recpe_type) (rtx, rtx);
+
+/* Select reciprocal initial estimate insn depending on machine mode.  */
+
+static recpe_type
+get_recpe_type (machine_mode mode)
+{
+  switch (mode)
+  {
+case SFmode:   return (gen_aarch64_frecpesf);
+case V2SFmode: return (gen_aarch64_frecpev2sf);
+case V4SFmode: return (gen_aarch64_frecpev4sf);
+case DFmode:   return (gen_aarch64_frecpedf);
+case V2DFmode: return (gen_aarch64_frecpev2df);
+default:   gcc_unreachable ();
+  }
+}
+
+typedef rtx (*recps_type) (rtx, rtx, rtx);
+
+/* Select reciprocal series step insn depending on machine mode.  */
+
+static recps_type
+get_recps_type (machine_mode mode)
+{
+  switch (mode)
+  {
+case SFmode:   return (gen_aarch64_frecpssf);
+case V2SFmode: return (gen_aarch64_frecpsv2sf);
+case V4SFmode: return (gen_aarch64_frecpsv4sf);
+case DFmode:   retur

Re: [PATCH 1/3][AArch64] Add more choices for the reciprocal square root approximation

2016-06-03 Thread Evandro Menezes

On 06/01/16 03:35, James Greenhalgh wrote:

On Fri, May 27, 2016 at 05:57:23PM -0500, Evandro Menezes wrote:

 From 86d7690632d03ec85fd69bfaef8e89c0542518ad Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Thu, 3 Mar 2016 18:13:46 -0600
Subject: [PATCH 1/3] [AArch64] Add more choices for the reciprocal square root
  approximation

Allow a target to prefer such operation depending on the operation mode.

2016-03-03  Evandro Menezes  

gcc/
* config/aarch64/aarch64-protos.h
(AARCH64_APPROX_MODE): New macro.
(AARCH64_APPROX_{NONE,ALL}): Likewise.
(cpu_approx_modes): New structure.
(tune_params): New member "approx_modes".
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_RSQRT): Remove macro.
* config/aarch64/aarch64.c
({generic,exynosm1,xgene1}_approx_modes): New core
"cpu_approx_modes" structures.
(generic_tunings): New member "approx_modes".
(cortexa35_tunings): Likewise.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(exynosm1_tunings): Likewise.
(thunderx_tunings): Likewise.
(xgene1_tunings): Likewise.
(use_rsqrt_p): New argument for the mode and use new member from
"tune_params".
(aarch64_builtin_reciprocal): Devise mode from builtin.
(aarch64_optab_supported_p): New argument for the mode.
* doc/invoke.texi (-mlow-precision-recip-sqrt): Reword description.

We're almost there, just a couple of style comments left on this one
I think. Thanks for sticking with it so far.


diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index f22a31c..6156281 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -178,6 +178,23 @@ struct cpu_branch_cost
const int unpredictable;  /* Unpredictable branch or optimizing for speed.  
*/
  };
  
+/* Control approximate alternatives to certain FP operators.  */

+#define AARCH64_APPROX_MODE(MODE) \
+  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
+   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT \
+ + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
+ : (0))
+#define AARCH64_APPROX_NONE (0)
+#define AARCH64_APPROX_ALL (-1)
+
+/* Allowed modes for approximations.  */
+struct cpu_approx_modes
+{
+  const unsigned int recip_sqrt; /* Reciprocal square root.  */
+};
+
  struct tune_params
  {
const struct cpu_cost_table *insn_extra_cost;
@@ -218,6 +235,8 @@ struct tune_params
} autoprefetcher_model;
  
unsigned int extra_tuning_flags;

+
+  const struct cpu_approx_modes *approx_modes;

Sorry to be irritating, but would you mind putting this up beside the
other "struct" members of tune_params. So directly under the branch_costs
member?


OK


  };
  
  #define AARCH64_FUSION_PAIR(x, name) \



diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 7e45a0c..048c2a3 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -29,5 +29,3 @@
   AARCH64_TUNE_ to give an enum name. */
  
  AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)

-AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
-
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9995494..e532cfc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -38,6 +38,7 @@
  #include "recog.h"
  #include "diagnostic.h"
  #include "insn-attr.h"
+#include "insn-modes.h"

Do we need this include? My build test of just this patch suggests not (maybe
you need it later in the series?). This can probably be dropped.


OK


  #include "alias.h"
  #include "fold-const.h"
  #include "stor-layout.h"
@@ -393,6 +394,24 @@ static const struct cpu_branch_cost cortexa57_branch_cost =
3   /* Unpredictable.  */
  };
  
+/* Generic approximation modes.  */

+static const cpu_approx_modes generic_approx_modes =
+{
+  AARCH64_APPROX_NONE  /* recip_sqrt  */
+};
+
+/* Approximation modes for Exynos M1.  */
+static const cpu_approx_modes exynosm1_approx_modes =
+{
+  AARCH64_APPROX_ALL   /* recip_sqrt  */
+};
+
+/* Approximation modes for Xgene1.  */

This should be "X-Gene 1" looking at how Applied Micro style it in their
marketing materials.


OK


+static const cpu_approx_modes xgene1_approx_modes =
+{
+  AARCH64_APPROX_ALL   /* recip_sqrt  */
+};
+
  static bool
-use_rsqrt_p (void)
+use_rsqrt_p (machine_mode mode)
  {
return (!flag_trapping_math
  && flag_unsafe_math_optimizations
- && ((aarch64_tune_params.extra_tuning_flags
-  & AARCH64_EXTRA_TUNE_APPROX_RSQRT)
+ && ((aarch64_tune_params.approx_modes->recip_sqrt
+  & AARCH64_APPROX_MODE (mode)

Re: [PATCH][AArch64] Increase code alignment

2016-06-03 Thread Evandro Menezes

On 06/03/16 05:51, Wilco Dijkstra wrote:

It looks like almost all AArch64 cores agree on alignment of 16 for functions, and 8 
for loops and branches, so we should change -mcpu=generic as well if there is 
no disagreement - feedback welcome.


I'll see what sets of values Exynos M1 would be most comfortable with, 
but I also wonder if the -falign-labels shouldn't also be a parameter in 
tune_params.


Thoughts?

--
Evandro Menezes



Re: [PATCH][AArch64] Increase code alignment

2016-06-03 Thread Andrew Pinski
On Fri, Jun 3, 2016 at 3:51 AM, Wilco Dijkstra  wrote:
> Increase loop alignment on Cortex cores to 8 and set function alignment to 
> 16.  This makes things consistent across big.LITTLE cores, improves 
> performance of benchmarks with tight loops and reduces performance variations 
> due to small changes in code layout. It looks like almost all AArch64 cores agree 
> on alignment of 16 for functions, and 8 for loops and branches, so we should 
> change -mcpu=generic as well if there is no disagreement - feedback welcome.

This actually might be better for ThunderX than the current set of
values for ThunderX.  I have tried an alignment of 16 for functions to
see if it is better; it should not hurt ThunderX much, as we have a
128-byte cache line anyway.

Thanks,
Andrew


>
> OK for commit?
>
> ChangeLog:
>
> 2016-05-03  Wilco Dijkstra  
>
> * gcc/config/aarch64/aarch64.c (cortexa53_tunings):
> Increase loop alignment to 8.  Set function alignment to 16.
> (cortexa35_tunings): Likewise.
> (cortexa57_tunings): Increase loop alignment to 8.
> (cortexa72_tunings): Likewise.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 12e5017a6d4b0ab15dcf932014980fdbd1a598ee..6ea10a187a1f895a399515b8cd0da0be63be827a
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -424,9 +424,9 @@ static const struct tune_params cortexa35_tunings =
>1, /* issue_rate  */
>(AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
> -  8,   /* function_align.  */
> +  16,  /* function_align.  */
>8,   /* jump_align.  */
> -  4,   /* loop_align.  */
> +  8,   /* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
> @@ -449,9 +449,9 @@ static const struct tune_params cortexa53_tunings =
>2, /* issue_rate  */
>(AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
> -  8,   /* function_align.  */
> +  16,  /* function_align.  */
>8,   /* jump_align.  */
> -  4,   /* loop_align.  */
> +  8,   /* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
> @@ -476,7 +476,7 @@ static const struct tune_params cortexa57_tunings =
> | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
>16,  /* function_align.  */
>8,   /* jump_align.  */
> -  4,   /* loop_align.  */
> +  8,   /* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
> @@ -502,7 +502,7 @@ static const struct tune_params cortexa72_tunings =
> | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
>16,  /* function_align.  */
>8,   /* jump_align.  */
> -  4,   /* loop_align.  */
> +  8,   /* loop_align.  */
>2,   /* int_reassoc_width.  */
>4,   /* fp_reassoc_width.  */
>1,   /* vec_reassoc_width.  */
>


Re: [PR tree-optimization/71328] Fix off-by-one error in CFG/SSA updates for backward threading

2016-06-03 Thread Jeff Law

On 06/03/2016 01:13 AM, Jakub Jelinek wrote:

On Thu, Jun 02, 2016 at 11:24:49PM -0600, Jeff Law wrote:

commit 96a568909e429b0f24d61c8a2f3dd3c183d720d7
Author: law 
Date:   Fri Jun 3 05:20:16 2016 +

PR tree-optimization/71328
* tree-ssa-threadupdate.c (duplicate_thread_path): Fix off-by-one
error when checking for a jump back onto the copied path.  */


The C comment termination in the ChangeLog entry is weird.
Muscle memory...  I hit "." and naturally proceed with two spaces and a 
close comment marker ;-)


Thanks for fixing it up!

jeff


[PR tree-optimization/71316] Fix expected output in testcase after recent threading changes

2016-06-03 Thread Jeff Law


As outlined in the BZ, this test was partially compromised by the recent 
threading changes.  Those changes result in a jump thread being found 
during VRP1 rather than DOM2 (i.e., earlier in the pipeline, which is 
good).  The expected output needed tweaking for the 
logical-op-short-circuit targets.


Installed on the trunk after verifying ppc64le passes the test.

Jeff
commit c1489ff56ba84d35a076117ca923261e24b01ce2
Author: law 
Date:   Fri Jun 3 23:12:39 2016 +

PR tree-optimization/71316
* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Update expected output.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@237083 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 1a80a9c..cd45cda 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2016-06-03  Jeff Law  
+
+   PR tree-optimization/71316
+   * gcc.dg/tree-ssa/ssa-dom-thread-4.c: Update expected output.
+
 2016-06-03  Jakub Jelinek  
 
* gcc.dg/vect/vect-live-1.c: Remove dg-options.  Add
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
index 4258fb5..2dd9177 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-dom2-details -std=gnu89" } */
+/* { dg-options "-O2 -fdump-tree-vrp1-details -fdump-tree-dom2-details 
-std=gnu89" } */
 struct bitmap_head_def;
 typedef struct bitmap_head_def *bitmap;
 typedef const struct bitmap_head_def *const_bitmap;
@@ -76,8 +76,7 @@ bitmap_ior_and_compl (bitmap dst, const_bitmap a, 
const_bitmap b,
 skipping the known-true "b_elt && kill_elt" in the second
 condition.
 
-   However, 3 of those 4 opportunities are ultimately eliminated by
-   DOM optimizing away conditionals.  So there's only one jump threading
-   opportunity left.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 1 "dom2" { target 
logical_op_short_circuit } } } */
+   The !b_elt cases are picked up by VRP1 as jump threads.  The others
+   are optimized by DOM.  */
+/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" { target 
logical_op_short_circuit } } } */
 


Re: RFC [1/2] divmod transform

2016-06-03 Thread Jim Wilson
On Mon, May 30, 2016 at 12:45 AM, Richard Biener  wrote:
> Joseph - do you know sth about why there's not a full set of divmod
> libfuncs in libgcc?

Because udivmoddi4 isn't a libfunc, it is a helper function for the
div and mod libfuncs.  Since we can compute the signed div and mod
results from udivmoddi4, there was no need to also add a signed
version of it.  It was given a libfunc style name so that we had the
option of making it a libfunc in the future, but that never happened.
There was no support for calling any divmod libfunc until it was added
as a special case to call an ARM library (not libgcc) function.  This
happened here

2004-08-09  Mark Mitchell  

* config.gcc (arm*-*-eabi*): New target.
* defaults.h (TARGET_LIBGCC_FUNCS): New macro.
(TARGET_LIB_INT_CMP_BIASED): Likewise.
* expmed.c (expand_divmod): Try a two-valued divmod function as a
last resort.
...
* config/arm/arm.c (arm_init_libfuncs): New function.
(arm_compute_initial_elimination_offset): Return HOST_WIDE_INT.
(TARGET_INIT_LIBFUNCS): Define it.
...

Later, two ports added their own divmod libfuncs, but I don't see any
evidence that they were ever used, since there is no support for
calling divmod other than the expand_divmod last resort code that only
triggers for ARM.

It is only now that Prathamesh is adding gimple support for divmod
operations that we need to worry about getting this right, without
breaking the existing ARM library support or the existing udivmoddi4
support.
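
The signed-from-unsigned identity is simple sign/magnitude bookkeeping
around the one unsigned helper.  Here is a minimal sketch in plain C;
the stand-in below only mimics the interface of libgcc's __udivmoddi4
and is illustrative, not the libgcc implementation:

#include <stdio.h>

/* Stand-in for libgcc's __udivmoddi4: one unsigned division that
   yields both quotient and remainder.  */
static unsigned long long
udivmoddi4 (unsigned long long n, unsigned long long d,
            unsigned long long *rp)
{
  *rp = n % d;
  return n / d;
}

/* Signed div and mod recovered from the unsigned helper: the quotient
   truncates toward zero and the remainder takes the dividend's sign,
   matching C semantics.  */
static void
sdivmoddi4 (long long n, long long d, long long *qp, long long *rp)
{
  unsigned long long un = n < 0 ? -(unsigned long long) n : n;
  unsigned long long ud = d < 0 ? -(unsigned long long) d : d;
  unsigned long long ur;
  unsigned long long uq = udivmoddi4 (un, ud, &ur);

  *qp = ((n < 0) != (d < 0)) ? -(long long) uq : (long long) uq;
  *rp = n < 0 ? -(long long) ur : (long long) ur;
}

int
main (void)
{
  long long q, r;
  sdivmoddi4 (-7, 2, &q, &r);
  printf ("%lld %lld\n", q, r);  /* prints -3 -1 */
  return 0;
}

That is why libgcc never needed a separate signed double-word helper.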

Jim


Re: [PATCH][AArch64] Increase code alignment

2016-06-03 Thread Evandro Menezes

On 06/03/16 17:22, Evandro Menezes wrote:

On 06/03/16 05:51, Wilco Dijkstra wrote:
It looks like almost all AArch64 cores agree on an alignment of 16 for 
functions, and 8 for loops and branches, so we should change 
-mcpu=generic as well if there is no disagreement - feedback welcome.


I'll see what sets of values Exynos M1 would be most comfortable with, 
but I also wonder whether -falign-labels shouldn't also be a parameter 
in tune_params.


Thoughts?



FWIW, here are the sets of values for the alignment of functions, 
branches and loops that fare best on Exynos M1 with -mcpu=generic, in 
order of preference:


1. 4-4-4
2. 16-4-16
3. 8-4-4

I also tracked the code size: whenever the branch alignment was 8 or 16 
bytes, it grew quickly, with no proportional improvement in performance 
on Exynos M1.
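
For reference, on the command line those triples map to
-falign-functions=N, -falign-jumps=N and -falign-loops=N, with
-falign-labels being the extra knob I wondered about above.  Below is a
toy sketch of a per-core alignment record grown by a label field; all
names are illustrative and not necessarily the actual tune_params
layout:

/* Toy model only: per-core code alignment knobs in the style of
   tune_params, extended with a label alignment field.  */
struct align_params
{
  int function_align;  /* cf. -falign-functions  */
  int jump_align;      /* cf. -falign-jumps (branch targets)  */
  int loop_align;      /* cf. -falign-loops  */
  int label_align;     /* hypothetical knob mirroring -falign-labels  */
};

/* Exynos M1's preferred 4-4-4 triple from the measurements above;
   label_align 0 would mean "leave it at the compiler default".  */
static const struct align_params exynosm1_align = { 4, 4, 4, 0 };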


HTH

--
Evandro Menezes



[PR71281] ICE on gcc trunk on knl, wsm, ivb and bdw targets

2016-06-03 Thread kugan

Hi,

PR71281 happens when we reuse the factored-out negate stmt in other 
reassociations.  Since we don't set the uid for this newly built stmt, 
we hit the gcc_assert (in reassoc_stmt_dominates_stmt_p) which checks 
that the uid is set.  The attached patch fixes this.
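
To illustrate the invariant with toy code (this is a model, not the
GCC sources): within a single basic block, reassoc decides dominance
by comparing statement uids, so a freshly built stmt left with the
default uid of zero trips the assert.

#include <assert.h>

struct stmt { unsigned uid; };  /* toy model of a gimple stmt */

/* Same-block dominance reduces to uid ordering; a zero uid means the
   stmt was never numbered, which is the PR71281 failure mode.  */
static int
dominates_p (const struct stmt *s1, const struct stmt *s2)
{
  assert (s1->uid && s2->uid);
  return s1->uid < s2->uid;
}

int
main (void)
{
  struct stmt a = { 1 }, b = { 2 };
  return !dominates_p (&a, &b);  /* exits 0: a comes before b */
}

The patch simply gives the new negate stmt the same uid as the stmt it
was split from, so the ordering check stays well defined.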


Regression tested on x86-64-linux-gnu with no new regressions.  Is this OK 
for trunk?


Thanks,
Kugan

gcc/ChangeLog:

2016-06-04  Kugan Vivekanandarajah  

PR middle-end/71281
* tree-ssa-reassoc.c (reassociate_bb): Set uid for negate stmt.


gcc/testsuite/ChangeLog:

2016-06-04  Kugan Vivekanandarajah  

PR middle-end/71281
* g++.dg/torture/pr71281.C: New test.
diff --git a/gcc/testsuite/g++.dg/torture/pr71281.C b/gcc/testsuite/g++.dg/torture/pr71281.C
index e69de29..7d429a9 100644
--- a/gcc/testsuite/g++.dg/torture/pr71281.C
+++ b/gcc/testsuite/g++.dg/torture/pr71281.C
@@ -0,0 +1,63 @@
+// PR middle-end/71281
+// { dg-do compile }
+// { dg-additional-options "-std=c++11 -Ofast" }
+
+
+template <typename> struct A;
+template <typename _Tp> struct A<_Tp *> { typedef _Tp reference; };
+
+template <typename _Iterator> class B {
+public:
+  typename A<_Iterator>::reference operator*();
+};
+
+template <typename> class C;
+template <typename> struct D;
+
+template <typename _Tp> struct D<C<_Tp>> {
+using value_type = _Tp;
+using const_pointer = _Tp *;
+template <typename _Up> using rebind_alloc = C<_Up>;
+};
+
+template <typename _Alloc> struct __alloc_traits : D<_Alloc> {
+typedef D<_Alloc> _Base_type;
+typedef typename _Base_type::value_type &reference;
+template <typename _Tp> struct F {
+   typedef typename _Base_type::template rebind_alloc<_Tp> other;
+};
+};
+
+template <typename _Tp, typename _Alloc> struct G {
+typedef typename __alloc_traits<_Alloc>::template F<_Tp>::other
+  _Tp_alloc_type;
+};
+
+int a, b;
+long d[1][1][1];
+void fn1() __attribute__((__noreturn__));
+template <typename _Tp, typename _Alloc = C<_Tp>> class H {
+typedef __alloc_traits<typename G<_Tp, _Alloc>::_Tp_alloc_type> _Alloc_traits;
+typedef typename _Alloc_traits::reference reference;
+
+public:
+B<typename _Alloc_traits::const_pointer> m_fn1();
+long m_fn2();
+reference operator[](unsigned);
+reference m_fn3(unsigned){
+   if (m_fn2())
+ fn1();
+}
+};
+
+H<H<H<int>>> c;
+void fn2() {
+H<int, C<int>> e;
+for (int f = 1;;)
+  for (int g = 0;;)
+   for (int h = 0;;)
+ {
+   *d[0][h] =
+	     c.m_fn3(f)[0][g] * a + -*(e).m_fn1() * b + (*c[f].m_fn1()).m_fn3(g);
+ }
+}
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 1973077..096b24d 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -5387,6 +5387,7 @@ reassociate_bb (basic_block bb)
  gimple_set_lhs (stmt, tmp);
  gassign *neg_stmt = gimple_build_assign (lhs, NEGATE_EXPR,
   tmp);
+ gimple_set_uid (neg_stmt, gimple_uid (stmt));
  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
  gsi_insert_after (&gsi, neg_stmt, GSI_NEW_STMT);
  update_stmt (stmt);


Re: [PING^3] Re: Updated autofdo bootstrap and testing patches

2016-06-03 Thread Andi Kleen
Andi Kleen  writes:

Ping^3!

> Andi Kleen  writes:
>
> Ping^2!
>
>> Andi Kleen  writes:
>>
>> Ping!
>>
>>> Here's an updated version of the patchkit to enable autofdo bootstrap
>>> and testing. It also fixes some autofdo issues. The last patch is more a 
>>> workaround
>>> (to make autofdo bootstrap not ICE), but may need a better fix.
>>>
>>> The main motivation is to get better test coverage for autofdo 
>>> and also a useful benchmark (speed of the generated compiler) for it.
>>> If you want the absolutely fastest compiler using profiledbootstrap
>>> is still the way to go.
>>>
>>> I addressed most of the earlier review comments. The python script
>>> is still python 2 for better compatibility with old systems.
>>>
>>> Ok to commit?
>>>
>>>
>>

-- 
a...@linux.intel.com -- Speaking for myself only


[SH][committed] Avoid potential silent wrong-code with reg+reg addr. modes

2016-06-03 Thread Oleg Endo
Hi,

The attached patch removes the hardcoded "r0" when printing reg+reg
addressing mode mems on SH.

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r237088.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh.c (sh_print_operand_address): Don't use hardcoded 'r0'
for reg+reg addressing mode.

diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 2bd917a..74327aa 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -1038,8 +1038,16 @@ sh_print_operand_address (FILE *stream, machine_mode /*mode*/, rtx x)
 	  int base_num = true_regnum (base);
 	  int index_num = true_regnum (index);
 
-	  fprintf (stream, "@(r0,%s)",
-		   reg_names[MAX (base_num, index_num)]);
+	  /* If base or index is R0, make sure that it comes first.
+		 Usually one of them will be R0, but the order might be wrong.
+		 If neither base nor index are R0 it's an error and we just
+		 pass it on to the assembler.  This avoids silent wrong code
+		 bugs.  */
+	  if (base_num == 0 && index_num != 0)
+		std::swap (base_num, index_num);
+
+	  fprintf (stream, "@(%s,%s)", reg_names[index_num],
+	   reg_names[base_num]);
 	  break;
 	}
 


[PATCH/AARCH64] Add vulcan -mcpu support

2016-06-03 Thread Virendra Pathak
Hi gcc-patches group,

Please find the basic patch for adding -mcpu=vulcan support to GCC.
Broadcom's Vulcan is an ARMv8-A AArch64-based server processor.

At present we are using the scheduling model of cortex-a57, but we will
soon be submitting a scheduling model for Vulcan.

Please review the patch (attached to this mail) and kindly merge it
into the gcc-6-branch.

The patch was tested with an aarch64-linux-gnu cross build, an
aarch64-unknown-linux-gnu native build, and make check.
We have also obtained a company-wide agreement with the FSF for
contributing to the GCC project.

Thanks.


ChangeLog:
* config/aarch64/aarch64-cores.def (vulcan): New core.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AARCH64/mtune): Document vulcan as an available option.



with regards,
Virendra Pathak
From 8d065016856606740a3928518ed6a3f9933fb130 Mon Sep 17 00:00:00 2001
From: Virendra Pathak 
Date: Wed, 1 Jun 2016 03:15:33 -0700
Subject: [PATCH] [AArch64] Add -mcpu vulcan support

---
 gcc/config/aarch64/aarch64-cores.def | 1 +
 gcc/config/aarch64/aarch64-tune.md   | 2 +-
 gcc/doc/invoke.texi  | 4 ++--
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 251a3eb..b0acad9 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -46,6 +46,7 @@ AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AA
 AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
 AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  "0x53", "0x001")
 AARCH64_CORE("qdf24xx",     qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800")
+AARCH64_CORE("vulcan",      vulcan,    cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x42", "0x516")
 AARCH64_CORE("thunderx",    thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
 AARCH64_CORE("xgene1",      xgene1,    xgene1,    8A,  AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000")
 
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index cbc6f48..c758a5f 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   "cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
+   "cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,vulcan,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 821f8fd..146042d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12955,8 +12955,8 @@ processors implementing the target architecture.
 Specify the name of the target processor for which GCC should tune the
 performance of the code.  Permissible values for this option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a57},
-@samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{thunderx},
-@samp{xgene1}.
+@samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{vulcan},
+@samp{thunderx}, @samp{xgene1}.
 
 Additionally, this option can specify that GCC should tune the performance
 of the code for a big.LITTLE system.  Permissible values for this
-- 
2.1.4