[PATCH, PR59833]: Fix sNaN handling in ARM float to double conversion

2016-07-20 Thread Aurelien Jarno
On ARM soft-float, the float to double conversion doesn't convert a sNaN
to qNaN as the IEEE Std 754 standard mandates:

"Under default exception handling, any operation signaling an invalid
operation exception and for which a floating-point result is to be
delivered shall deliver a quiet NaN."

Given the soft float ARM code ignores exceptions and always provides a
result, a float to double conversion of a signaling NaN should return a
quiet NaN. Fix this in extendsfdf2.
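For reference, "quieting" a NaN simply means setting the most significant
mantissa bit, which for IEEE 754 binary64 is bit 51, i.e. bit 19 of the high
word (the 0x00080000 constant used in the patch below).  A minimal C sketch
of the same operation (ours, for illustration only; the patch does it in
assembly):

#include <stdint.h>
#include <string.h>

/* Set the quiet bit of a binary64 NaN.  */
static double
quiet_nan (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);  /* view the double as raw bits */
  bits |= UINT64_C (1) << 51;       /* set the quiet bit (mantissa MSB) */
  memcpy (&d, &bits, sizeof d);
  return d;
}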

gcc/ChangeLog:

PR target/59833
* config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN.

gcc/testsuite/ChangeLog:

* gcc.dg/pr59833.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr59833.c | 18 ++
 libgcc/config/arm/ieee754-df.S | 10 +++---
 2 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr59833.c

diff --git a/gcc/testsuite/gcc.dg/pr59833.c b/gcc/testsuite/gcc.dg/pr59833.c
new file mode 100644
index 0000000..e0e4ed5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr59833.c
@@ -0,0 +1,18 @@
+/* { dg-do run { target { *-*-linux* *-*-gnu* } } } */
+/* { dg-options "-O0 -lm" } */
+/* { dg-require-effective-target issignaling } */
+
+#define _GNU_SOURCE
+#include <math.h>
+
+int main (void)
+{
+  float sNaN = __builtin_nansf ("");
+  double x = (double) sNaN;
+  if (issignaling(x))
+  {
+__builtin_abort();
+  }
+
+  return 0;
+}
diff --git a/libgcc/config/arm/ieee754-df.S b/libgcc/config/arm/ieee754-df.S
index a2aac70..1ecaa9d 100644
--- a/libgcc/config/arm/ieee754-df.S
+++ b/libgcc/config/arm/ieee754-df.S
@@ -507,11 +507,15 @@ ARM_FUNC_ALIAS aeabi_f2d extendsfdf2
eorne   xh, xh, #0x38000000 @ fixup exponent otherwise.
RETc(ne)@ and return it.
 
-   teq r2, #0  @ if actually 0
-   do_it   ne, e
-   teqne   r3, #0xff000000 @ or INF or NAN
+   bics    r2, r2, #0xff000000 @ isolate mantissa
+   do_it   eq  @ if 0, that is ZERO or INF,
RETc(eq)@ we are done already.
 
+   teq r3, #0xff000000 @ check for NAN
+   do_it   eq, t
+   orreq   xh, xh, #0x00080000 @ change to quiet NAN
+   RETc(eq)@ and return it.
+
@ value was denormalized.  We can normalize it now.
do_push {r4, r5, lr}
.cfi_adjust_cfa_offset 12   @ CFA is now sp + previousOffset + 12

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


Re: [PATCH, nds32] Enable GDB building for nds32*-*-* target.

2016-07-20 Thread Nick Clifton
Hi Yan-Ting,

> After contributing our port of GDB, we need a patch to enable GDB
> building.
> 
> Is this patch OK for commit?

It is.  I have applied the patch to the binutils sources.

Please note however that the top level configure and configure.ac files are
shared with the gcc project, so a change like this needs to be enacted there
as well.  I have taken the liberty of adding gcc-patches to the CC list for
this email, and just to keep things simple, I have gone ahead and applied
your patch to the gcc sources as well.

> 2016-07-18  Yan-Ting Lin  
> 
> * configure.ac (nds32*-*-*): Remove entry to enable gdb.
> * configure: Regenerated.

Approved and applied.

Cheers
  Nick


Re: [PATCH, PR59833]: Fix sNaN handling in ARM float to double conversion

2016-07-20 Thread Ramana Radhakrishnan
On Wed, Jul 20, 2016 at 8:48 AM, Aurelien Jarno  wrote:
> On ARM soft-float, the float to double conversion doesn't convert a sNaN
> to qNaN as the IEEE Std 754 standard mandates:
>
> "Under default exception handling, any operation signaling an invalid
> operation exception and for which a floating-point result is to be
> delivered shall deliver a quiet NaN."
>
> Given the soft float ARM code ignores exceptions and always provides a
> result, a float to double conversion of a signaling NaN should return a
> quiet NaN. Fix this in extendsfdf2.
>
> gcc/ChangeLog:
>
> PR target/59833
> * config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr59833.c: New testcase.


Ok - assuming this was tested appropriately with no regressions.

Thanks for following up.

Ramana


> ---
>  gcc/testsuite/gcc.dg/pr59833.c | 18 ++
>  libgcc/config/arm/ieee754-df.S | 10 +++---
>  2 files changed, 25 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr59833.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr59833.c b/gcc/testsuite/gcc.dg/pr59833.c
> new file mode 100644
> index 0000000..e0e4ed5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr59833.c
> @@ -0,0 +1,18 @@
> +/* { dg-do run { target { *-*-linux* *-*-gnu* } } } */
> +/* { dg-options "-O0 -lm" } */
> +/* { dg-require-effective-target issignaling } */
> +
> +#define _GNU_SOURCE
> +#include <math.h>
> +
> +int main (void)
> +{
> +  float sNaN = __builtin_nansf ("");
> +  double x = (double) sNaN;
> +  if (issignaling(x))
> +  {
> +__builtin_abort();
> +  }
> +
> +  return 0;
> +}
> diff --git a/libgcc/config/arm/ieee754-df.S b/libgcc/config/arm/ieee754-df.S
> index a2aac70..1ecaa9d 100644
> --- a/libgcc/config/arm/ieee754-df.S
> +++ b/libgcc/config/arm/ieee754-df.S
> @@ -507,11 +507,15 @@ ARM_FUNC_ALIAS aeabi_f2d extendsfdf2
> eorne   xh, xh, #0x38000000 @ fixup exponent otherwise.
> RETc(ne)@ and return it.
>
> -   teq r2, #0  @ if actually 0
> -   do_it   ne, e
> -   teqne   r3, #0xff000000 @ or INF or NAN
> +   bics    r2, r2, #0xff000000 @ isolate mantissa
> +   do_it   eq  @ if 0, that is ZERO or INF,
> RETc(eq)@ we are done already.
>
> +   teq r3, #0xff000000 @ check for NAN
> +   do_it   eq, t
> +   orreq   xh, xh, #0x00080000 @ change to quiet NAN
> +   RETc(eq)@ and return it.
> +
> @ value was denormalized.  We can normalize it now.
> do_push {r4, r5, lr}
> .cfi_adjust_cfa_offset 12   @ CFA is now sp + previousOffset + 12
>
> --
> Aurelien Jarno  GPG: 4096R/1DDD8C9B
> aurel...@aurel32.net http://www.aurel32.net


[PATCH] Properly handle PHI stmts in later_of_the_two (PR middle-end/71898)

2016-07-20 Thread Martin Liška
Hi.

Graphite uses a comparison of gimple statement iterators (later_of_the_two)
to find the place where to insert a new gimple statement.  The problem with
the function is that it does not distinguish between PHI and non-PHI
statements, where the former always stand before the latter.  The patch
fixes that.
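For illustration (ours, not part of the patch): within one basic block the
PHI nodes form a separate sequence that conceptually precedes all ordinary
statements, e.g.

  # i_1 = PHI <0(2), i_5(3)>    <- PHI node, always first in the block
  x_2 = x_3 + 1;                <- ordinary statement

so if exactly one of the two iterators points at a PHI, the other one is by
definition the later of the two.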

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From 0ce169ea9201ca63f335404bb86a48ea98c11299 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 20 Jul 2016 09:13:40 +0200
Subject: [PATCH] Properly handle PHI stmts in later_of_the_two (PR
 middle-end/71898)

gcc/ChangeLog:

2016-07-20  Martin Liska  

	PR middle-end/71898
	* graphite-isl-ast-to-gimple.c (later_of_the_two):
	Properly handle PHI stmts.

gcc/testsuite/ChangeLog:

2016-07-20  Martin Liska  

	* gfortran.dg/graphite/pr71898.f90: New test.
---
 gcc/graphite-isl-ast-to-gimple.c   | 12 +++
 gcc/testsuite/gfortran.dg/graphite/pr71898.f90 | 45 ++
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/graphite/pr71898.f90

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index fb9c846..07c88026 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1305,6 +1305,18 @@ later_of_the_two (gimple_stmt_iterator gsi1, gimple_stmt_iterator gsi2)
   /* Find the iterator which is the latest.  */
   if (bb1 == bb2)
 {
+  gimple *stmt1 = gsi_stmt (gsi1);
+  gimple *stmt2 = gsi_stmt (gsi2);
+
+  if (stmt1 != NULL && stmt2 != NULL)
+	{
+	  bool is_phi1 = gimple_code (stmt1) == GIMPLE_PHI;
+	  bool is_phi2 = gimple_code (stmt2) == GIMPLE_PHI;
+
+	  if (is_phi1 != is_phi2)
+	return is_phi1 ? gsi2 : gsi1;
+	}
+
   /* For empty basic blocks gsis point to the end of the sequence.  Since
 	 there is no operator== defined for gimple_stmt_iterator and for gsis
 	 not pointing to a valid statement gsi_next would assert.  */
diff --git a/gcc/testsuite/gfortran.dg/graphite/pr71898.f90 b/gcc/testsuite/gfortran.dg/graphite/pr71898.f90
new file mode 100644
index 0000000..01d6852
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/graphite/pr71898.f90
@@ -0,0 +1,45 @@
+! { dg-do compile }
+! { dg-options "-floop-nest-optimize -O1" }
+
+MODULE d3_poly
+INTEGER, PUBLIC, PARAMETER :: max_grad2=5
+INTEGER, PUBLIC, PARAMETER :: max_grad3=3
+INTEGER, PUBLIC, PARAMETER :: cached_dim2=(max_grad2+1)*(max_grad2+2)/2
+INTEGER, PUBLIC, PARAMETER :: cached_dim3=(max_grad3+1)*(max_grad3+2)*(max_grad3+3)/6
+INTEGER, SAVE, DIMENSION(3,cached_dim3) :: a_mono_exp3
+INTEGER, SAVE, DIMENSION(cached_dim2,cached_dim2) :: a_mono_mult2
+INTEGER, SAVE, DIMENSION(cached_dim3,cached_dim3) :: a_mono_mult3
+INTEGER, SAVE, DIMENSION(4,cached_dim3) :: a_mono_mult3a
+CONTAINS
+SUBROUTINE init_d3_poly_module()
+INTEGER  :: grad, i, ii, ij, j, subG
+INTEGER, DIMENSION(3):: monoRes3
+DO grad=0,max_grad2
+DO i=grad,0,-1
+DO j=grad-i,0,-1
+END DO
+END DO
+END DO
+DO ii=1,cached_dim3
+DO ij=ii,cached_dim2
+a_mono_mult2(ij,ii)=a_mono_mult2(ii,ij)
+END DO
+END DO
+DO ii=1,cached_dim3
+DO ij=ii,cached_dim3
+monoRes3=a_mono_exp3(:,ii)+a_mono_exp3(:,ij)
+a_mono_mult3(ii,ij)=mono_index3(monoRes3(1),monoRes3(2),monoRes3(3))+1
+a_mono_mult3(ij,ii)=a_mono_mult3(ii,ij)
+END DO
+END DO
+DO i=1,cached_dim3
+   DO j=1,4
+  a_mono_mult3a(j,i)=a_mono_mult3(j,i)
+   END DO
+END DO
+END SUBROUTINE
+PURE FUNCTION mono_index3(i,j,k) RESULT(res)
+INTEGER, INTENT(in)  :: i, j, k
+res=grad*(grad+1)*(grad+2)/6+(sgrad)*(sgrad+1)/2+k
+END FUNCTION
+END MODULE d3_poly
-- 
2.9.0



Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Richard Earnshaw (lists)
On 19/07/16 16:31, Wilco Dijkstra wrote:
> When zero extending a 32-bit value to 64 bits, there should always be a
> SET operation on the outside, according to the patterns in aarch64.md.
> However, the mid-end can also ask for the cost of a made-up instruction,
> where the zero-extend is part of another operation, not SET.
> 
> In this case we currently cost the zero extend operation as a uxtb/uxth.
> Instead, cost it the same way we cost "normal" 32-to-64-bit zero
> extends: as a "mov" or the cost of the inner operation.
> 
> Bootstrapped and tested on aarch64-none-elf.
> 
> 2016-07-19  Kristina Martsenko  
> 
>   * config/aarch64/aarch64.c (aarch64_rtx_costs): Fix cost of zero extend.
> 

I'm not sure about this, while rtx_cost is called recursively as it
walks the RTL, I'd normally expect the outer levels of the recursion to
catch the cases where zero-extend is folded into a more complex
operation.  Hitting a case like this suggests that something isn't doing
that correctly.

So what was the top-level RTX passed into rtx_cost?  I'd like to get a
better understanding about the use case before acking this patch.

A test-case would be really useful here, even if it can't be used in the
testsuite.

R.

> ---
>  gcc/config/aarch64/aarch64.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> bddffc3ab28cde3a996fd13c060de36227315fb5..a2621313d3278d39db0f1d5640b33201efefac21
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -6421,12 +6421,11 @@ cost_plus:
>a 'w' register implicitly zeroes the upper bits of an 'x'
>register.  However, if this is
>  
> -(set (reg) (zero_extend (reg)))
> +(zero_extend (reg))
>  
>we must cost the explicit register move.  */
>if (mode == DImode
> -   && GET_MODE (op0) == SImode
> -   && outer == SET)
> +   && GET_MODE (op0) == SImode)
>   {
> int op_cost = rtx_cost (op0, VOIDmode, ZERO_EXTEND, 0, speed);
>  
> 



Re: [PATCH PR71734] Add missed check that reference defined inside loop.

2016-07-20 Thread Yuri Rumyantsev
Richard,

Jakub has already fixed it.
Sorry for troubles.
Yuri.

2016-07-19 18:29 GMT+03:00 Renlin Li :
> Hi Yuri,
>
> I saw this test case runs on arm platforms, and maybe other platforms as
> well.
>
> testsuite/g++.dg/vect/pr70729.cc:7:10: fatal error: xmmintrin.h: No such
> file or directory
>
> Before the change here, it's gated by vect_simd_clones target selector,
> which limit it to i?86/x86_64 platform only.
>
> Regards,
> Renlin Li
>
>
>
>
> On 08/07/16 15:07, Yuri Rumyantsev wrote:
>>
>> Hi Richard,
>>
>> Thanks for your help - your patch looks much better.
>> Here is new patch in which additional argument was added to determine
>> source loop of reference.
>>
>> Bootstrap and regression testing did not show any new failures.
>>
>> Is it OK for trunk?
>> ChangeLog:
>> 2016-07-08  Yuri Rumyantsev  
>>
>> PR tree-optimization/71734
>> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Add REF_LOOP argument which
>> contains REF, use it to check safelen, assume that safelen value
>> must be greater 1, fix style.
>> (ref_indep_loop_p_2): Add REF_LOOP argument.
>> (ref_indep_loop_p): Pass LOOP as additional argument to
>> ref_indep_loop_p_2.
>> gcc/testsuite/ChangeLog:
>>  * g++.dg/vect/pr70729.cc: Delete redundant dg options, fix style.
>>
>> 2016-07-08 11:18 GMT+03:00 Richard Biener :
>>>
>>> On Thu, Jul 7, 2016 at 5:38 PM, Yuri Rumyantsev 
>>> wrote:

 I checked simd3.f90 and found out that my additional check rejects
 independence of references

 REF is independent in loop#3
 .istart0.19, .iend0.20
 which are defined in loop#1 which is outer for loop#3.
 Note that these references are defined by
 _103 = __builtin_GOMP_loop_dynamic_next (&.istart0.19, &.iend0.20);
 which is in loop#1.
 It is clear that both these references can not be independent for
 loop#3.
>>>
>>>
>>> Ok, so we end up calling ref_indep_loop for ref in LOOP also for inner
>>> loops
>>> of LOOP to catch memory references in those as well.  So the issue is
>>> really
>>> that we look at the wrong loop for safelen and we _do_ want to apply
>>> safelen
>>> to inner loops as well.
>>>
>>> So better track the loop we are ultimately asking the question for, like
>>> in the
>>> attached patch (fixes the testcase for me).
>>>
>>> Richard.
>>>
>>>
>>>
 2016-07-07 17:11 GMT+03:00 Richard Biener :
>
> On Thu, Jul 7, 2016 at 4:04 PM, Yuri Rumyantsev 
> wrote:
>>
>> I Added this check because of new failures in libgomp.fortran suite.
>> Here is copy of Jakub message:
>> --- Comment #29 from Jakub Jelinek  ---
>> The #c27 r237844 change looks bogus to me.
>> First of all, IMNSHO you can argue this way only if ref is a reference
>> seen in
>> loop LOOP,
>
>
> or inner loops of LOOP I guess.  I _think_ we never call
> ref_indep_loop_p_1 with
> a REF whose loop is not a sub-loop of LOOP or LOOP itself (as it would
> not make
> sense to do that, it would be a waste of time).
>
> So only if "or inner loops of LOOP" is not correct the check would be
> needed
> but then my issue with unrolling an inner loop and turning a ref that
> safelen
> does not apply to into a ref that it now applies to arises.
>
> I don't fully get what Jakub is hinting at.
>
> Can you install the safelen > 0 -> safelen > 1 fix please?  Jakub, can
> you
> explain that bitmap check with a simple testcase?
>
> Thanks,
> Richard.
>
>> which is the case of e.g. *.omp_data_i_23(D).a ref in simd3.f90 -O2
>> -fopenmp -msse2, but not the D.3815[0] case tested during can_sm_ref_p
>> - the
>> D.3815[0] = 0; as well as something = D.3815[0]; stmt found in the
>> outer loop
>> obviously can be dependent on many of the loads and/or stores in the
>> loop, be
>> it "omp simd array" or not.
>> Say for
>> void
>> foo (int *p, int *q)
>> {
>>#pragma omp simd
>>for (int i = 0; i < 1024; i++)
>>  p[i] += q[0];
>> }
>> sure, q[0] can't alias p[0] ... p[1022], the earlier iterations could
>> write
>> something that changes its value, and then it would behave differently
>> from
>> using VF = 1024, where everything is performed in parallel.
>> Though, actually, it can alias, just it would have to write the same
>> value as
>> was there.  So, if this is used to determine if it is safe to hoist
>> the load
>> before the loop, it is fine, if it is used to determine if &q[0] >=
>> &p[0] &&
>> &q[0] <= &p[1023], then it is not fine.
>>
>> For aliasing of q[0] and p[1023], I don't see why they couldn't alias
>> in a
>> valid program.  #pragma omp simd I think guarantees that the last
>> iteration is
>> executed last, it isn't necessarily executed last alone, it could be,
>> or
>> together with one before last iteration, or (for simdlen INT_MAX) even
>> all
>> it

Re: [PATCH 3/3][AArch64] Improve zero extend

2016-07-20 Thread Richard Earnshaw (lists)
On 19/07/16 16:32, Wilco Dijkstra wrote:
> On AArch64 the UXTB and UXTH instructions are aliases of UBFM,
> which does a shift as part of its operation. An AND immediate is a
> simpler operation, and might be faster on some implementations, so it is
> better to emit this instead of UBFM.
> 
> Benchmarking showed no difference on implementations where UBFM has
> the same performance as AND, and minor speedups across several
> benchmarks on an implementation where UBFM is slower than AND.
> 
> Bootstrapped and tested on aarch64-none-elf.
> 
> 2016-07-19  Kristina Martsenko  
> 2016-07-19  Wilco Dijkstra  
> 
> * config/aarch64/aarch64.md
> (zero_extend2_aarch64): Change output
> statement and type.
> (qihi2_aarch64): Likewise, and split into two.
> (extendqihi2_aarch64): New.
> (zero_extendqihi2_aarch64): New.
> * config/aarch64/iterators.md (ldrxt): Remove.
> * config/aarch64/aarch64.c (aarch64_rtx_costs): Change cost of
> uxtb/uxth.

This looks sensible to me, but please wait 24 hours before committing,
just to allow the other folk with different implementations to comment
if they wish.

R.
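
(For concreteness: "uxtb w0, w1" is an alias of "ubfm w0, w1, #0, #7", while
the replacement "and w0, w1, 255", visible in the hunk below, expresses the
same zero-extension as a plain logical operation.)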

> ---
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> c7249e8e98905bea4879bb2e2ee81d51a1004faa..e98e41521bfa8f807248b0147843de9e1f3447e3
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -6886,8 +6886,8 @@ cost_plus:
>   }
> else
>   {
> -   /* UXTB/UXTH.  */
> -   *cost += extra_cost->alu.extend;
> +   /* We generate an AND instead of UXTB/UXTH.  */
> +   *cost += extra_cost->alu.logical;
>   }
>   }
>return false;
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 64f9ca1c4d1bec64cef769c9dbef9e4b5b00ba9e..5e8b1a815515eabc7e69c75574c2c300f50a6fe4
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1580,10 +1580,10 @@
>  (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" 
> "r,m,m")))]
>""
>"@
> -   uxt\t%0, %w1
> +   and\t%0, %1, 
> ldr\t%w0, %1
> ldr\t%0, %1"
> -  [(set_attr "type" "extend,load1,load1")]
> +  [(set_attr "type" "logic_imm,load1,load1")]
>  )
>  
>  (define_expand "<optab>qihi2"
> @@ -1592,16 +1592,26 @@
>""
>  )
>  
> -(define_insn "*<optab>qihi2_aarch64"
> +(define_insn "*extendqihi2_aarch64"
>[(set (match_operand:HI 0 "register_operand" "=r,r")
> -(ANY_EXTEND:HI (match_operand:QI 1 "nonimmediate_operand" "r,m")))]
> + (sign_extend:HI (match_operand:QI 1 "nonimmediate_operand" "r,m")))]
>""
>"@
> -   <su>xtb\t%w0, %w1
> -   <ldrxt>b\t%w0, %1"
> +   sxtb\t%w0, %w1
> +   ldrsb\t%w0, %1"
>[(set_attr "type" "extend,load1")]
>  )
>  
> +(define_insn "*zero_extendqihi2_aarch64"
> +  [(set (match_operand:HI 0 "register_operand" "=r,r")
> + (zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "r,m")))]
> +  ""
> +  "@
> +   and\t%w0, %w1, 255
> +   ldrb\t%w0, %1"
> +  [(set_attr "type" "logic_imm,load1")]
> +)
> +
>  ;; ---
>  ;; Simple arithmetic
>  ;; ---
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 
> e8fbb1281dec2e8f37f58ef2ced792dd62e3b5aa..ef48ffda6f98a2d4aa29daaca206fef2bafcda48
>  100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -888,9 +888,6 @@
>  ;; Similar, but when not(op)
>  (define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")])
>  
> -;; Sign- or zero-extending load
> -(define_code_attr ldrxt [(sign_extend "ldrs") (zero_extend "ldr")])
> -
>  ;; Sign- or zero-extending data-op
>  (define_code_attr su [(sign_extend "s") (zero_extend "u")
> (sign_extract "s") (zero_extract "u")
> 



[PATCH/AARCH64] Add scheduler for vulcan.

2016-07-20 Thread Virendra Pathak
Hi gcc-patches group,

Please find the patch for adding the basic scheduler for vulcan
in the aarch64 port.

Tested the patch with compiling cross aarch64-linux-gcc,
bootstrapped native aarch64-unknown-linux-gnu and
run gcc regression.

Kindly review and merge the patch to trunk, if the patch is okay.

There are a few TODOs in this patch which we plan to address in a
follow-up submission, e.g. crc and crypto instructions, and further
improving integer & fp load/store based on the addressing mode of the
address.
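With the change to aarch64-cores.def below, code compiled with -mcpu=vulcan
is scheduled using this description instead of falling back to the
Cortex-A57 model.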

Thanks.

gcc/ChangeLog:

Virendra Pathak  
Julian Brown  

* config/aarch64/aarch64-cores.def: Change the scheduler
to vulcan.
* config/aarch64/aarch64.md: Include vulcan.md.
* config/aarch64/vulcan.md: New file.



with regards,
Virendra Pathak
From 3114daf5a5b4e7f1bbc57f0bf930823615c7d6aa Mon Sep 17 00:00:00 2001
From: Virendra Pathak 
Date: Thu, 14 Jul 2016 23:07:10 -0700
Subject: [PATCH] AArch64: Add scheduler for vulcan.

---
 gcc/config/aarch64/aarch64-cores.def |   2 +-
 gcc/config/aarch64/aarch64.md|   1 +
 gcc/config/aarch64/vulcan.md | 490 +++
 3 files changed, 492 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/aarch64/vulcan.md

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index d9da257..b3bfece 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -52,7 +52,7 @@ AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  
AARCH64_FL_FOR_ARCH8, xge
 
 /* V8.1 Architecture Processors.  */
 
-AARCH64_CORE("vulcan",  vulcan, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | 
AARCH64_FL_CRYPTO, vulcan, "0x42", "0x516")
+AARCH64_CORE("vulcan",  vulcan, vulcan, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | 
AARCH64_FL_CRYPTO, vulcan, "0x42", "0x516")
 
 /* V8 big.LITTLE implementations.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index bcb7db0..5d754b1 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -219,6 +219,7 @@
 (include "../arm/exynos-m1.md")
 (include "thunderx.md")
 (include "../arm/xgene1.md")
+(include "vulcan.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/vulcan.md b/gcc/config/aarch64/vulcan.md
new file mode 100644
index 0000000..d54044f
--- /dev/null
+++ b/gcc/config/aarch64/vulcan.md
@@ -0,0 +1,490 @@
+;; Broadcom Vulcan pipeline description
+;; Copyright (C) 2016 Free Software Foundation, Inc.
+;;
+;; Contributed by Broadcom and Mentor Embedded.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_automaton "vulcan")
+
+(define_cpu_unit "vulcan_i0" "vulcan")
+(define_cpu_unit "vulcan_i1" "vulcan")
+(define_cpu_unit "vulcan_i2" "vulcan")
+(define_cpu_unit "vulcan_ls0" "vulcan")
+(define_cpu_unit "vulcan_ls1" "vulcan")
+(define_cpu_unit "vulcan_sd" "vulcan")
+
+; Pseudo-units for multiply pipeline.
+
+(define_cpu_unit "vulcan_i1m1" "vulcan")
+(define_cpu_unit "vulcan_i1m2" "vulcan")
+(define_cpu_unit "vulcan_i1m3" "vulcan")
+
+; Pseudo-units for load delay (assuming dcache hit).
+
+(define_cpu_unit "vulcan_ls0d1" "vulcan")
+(define_cpu_unit "vulcan_ls0d2" "vulcan")
+(define_cpu_unit "vulcan_ls0d3" "vulcan")
+
+(define_cpu_unit "vulcan_ls1d1" "vulcan")
+(define_cpu_unit "vulcan_ls1d2" "vulcan")
+(define_cpu_unit "vulcan_ls1d3" "vulcan")
+
+; Make some aliases for f0/f1.
+(define_reservation "vulcan_f0" "vulcan_i0")
+(define_reservation "vulcan_f1" "vulcan_i1")
+
+(define_reservation "vulcan_i012" "vulcan_i0|vulcan_i1|vulcan_i2")
+(define_reservation "vulcan_ls01" "vulcan_ls0|vulcan_ls1")
+(define_reservation "vulcan_f01" "vulcan_f0|vulcan_f1")
+
+(define_reservation "vulcan_ls_both" "vulcan_ls0+vulcan_ls1")
+
+; A load with delay in the ls0/ls1 pipes.
+(define_reservation "vulcan_l0delay" "vulcan_ls0,vulcan_ls0d1,vulcan_ls0d2,\
+ vulcan_ls0d3")
+(define_reservation "vulcan_l1delay" "vulcan_ls1,vulcan_ls1d1,vulcan_ls1d2,\
+ vulcan_ls1d3")
+(define_reservation "vulcan_l01delay" "vulcan_l0delay|vulcan_l1delay")
+
+;; Branch and call instructions.
+
+(define_insn_reservation "vulcan_branch" 1
+  (and (eq_attr "tune" "vulcan")
+   (eq_attr "type" "call,branch"))
+  "vulcan_i2")
+

Re: [Fortran, Patch] First patch for coarray FAILED IMAGES (TS 18508)

2016-07-20 Thread Andre Vehreschild
Hi Mikael,


> > +  if(st == ST_FAIL_IMAGE)
> > +new_st.op = EXEC_FAIL_IMAGE;
> > +  else
> > +gcc_unreachable();  
> You can use
>   gcc_assert (st == ST_FAIL_IMAGE);
>   foo...;
> instead of
>   if (st == ST_FAIL_IMAGE)
>   foo...;
>   else
>   gcc_unreachable ();

Be careful, this is not 100% identical in the general case. For older
gcc versions (GCC_VERSION < 4008, i.e. gcc < 4.8) gcc_assert() is mapped
to nothing, especially not to an abort(), so the behavior can change. But
in this case everything is fine, because the patch is most likely not
backported.
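
To make the difference concrete (sketch, not from the patch):

  /* Form A keeps the trap even when assertions are compiled out.  */
  if (st == ST_FAIL_IMAGE)
    new_st.op = EXEC_FAIL_IMAGE;
  else
    gcc_unreachable ();

  /* Form B silently falls through if gcc_assert expands to nothing.  */
  gcc_assert (st == ST_FAIL_IMAGE);
  new_st.op = EXEC_FAIL_IMAGE;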

> > +
> > +  return MATCH_YES;
> > +
> > + syntax:
> > +  gfc_syntax_error (st);
> > +
> > +  return MATCH_ERROR;
> > +}
> > +
> > +match
> > +gfc_match_fail_image (void)
> > +{
> > +  /* if (!gfc_notify_std (GFC_STD_F2008_TS, "FAIL IMAGE statement
> > at %C")) */
> > +  /*   return MATCH_ERROR; */
> > +  
> Can this be uncommented?
> 
> > +  return fail_image_statement (ST_FAIL_IMAGE);
> > +}
> >
> >  /* Match LOCK/UNLOCK statement. Syntax:
> >   LOCK ( lock-variable [ , lock-stat-list ] )
> > diff --git a/gcc/fortran/trans-intrinsic.c
> > b/gcc/fortran/trans-intrinsic.c index 1aaf4e2..b2f5596 100644
> > --- a/gcc/fortran/trans-intrinsic.c
> > +++ b/gcc/fortran/trans-intrinsic.c
> > @@ -1647,6 +1647,24 @@ trans_this_image (gfc_se * se, gfc_expr
> > *expr) m, lbound));
> >  }
> >
> > +static void
> > +gfc_conv_intrinsic_image_status (gfc_se *se, gfc_expr *expr)
> > +{
> > +  unsigned int num_args;
> > +  tree *args,tmp;
> > +
> > +  num_args = gfc_intrinsic_argument_list_length (expr);
> > +  args = XALLOCAVEC (tree, num_args);
> > +
> > +  gfc_conv_intrinsic_function_args (se, expr, args, num_args);
> > +
> > +  if (flag_coarray == GFC_FCOARRAY_LIB)
> > +{  
> Can everything be put under the if?
> Does it work with -fcoarray=single?

IMO coarray=single should not generate code here, therefore putting
everything under the if should be fine.

Sorry for the comments ...

- Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


[Re: RFC: Patch 1/2 v3] New target hook: max_noce_ifcvt_seq_cost

2016-07-20 Thread James Greenhalgh

Splicing replies to Bernd, Bernhard and Jeff.

Jeff, thanks for reviewing the patch set, I appreciate the ack, though I've
held off committing while I was working through Bernd's criticism of the
size cost model that this patch introduced and trying to get that right.
Sorry to cause extra reviewing work, but I have respun the patch set to
try to improve the consistency of how we're costing things, and to better
handle the size cases. I'm a bit happier with how it has turned out and I
think the approach is now a little easier to justify. Hopefully it will
still be acceptable for trunk.

There are essentially two families of cost models in this file. The true
before/after comparisons (noce_cmove_arith, noce_convert_multiple_sets),
and the "magic numbers" comparisons (noce_try_store_flag_constants,
noce_try_addcc, noce_try_store_flag_mask, noce_try_cmove). In the first
revisions of this patch set, I refactored the magic numbers comparisons,
but I didn't try to solve their "magic" as comparing two integers was
a suitably fast routine, and the comparison seemed accurate enough.

But the magic numbers are potentially inaccurate for a variable-length
instruction architecture, and given the number of times we actually manage to
spot these if-convert opportunities, the compile time overhead of moving
every cost model to a before/after comparison is probably not all that
high. Then we have everything going through one single function, making

Additionally, if we can rework most of the costs to actually calculate
the before/after costs, we can then drop the "size" case from this hook
entirely - we can just look at the size of the sequences directly rather
than asking the target to guess at an acceptable size growth.

This is good as it will completely remove magic numbers from ifcvt and
make everything dependent on a simple question to the target, when
compiling for speed; "What is the maximum cost of extra execution that
you'd like to see on the unconditional path?"

Unfortunately, disentangling this makes it harder to layout the patch set
quite as neatly as before. The changes follow the same structure, but I've
had to squash all the cost changes in to patch 2/2. Fortunately these now
look reasonably mechanical, and consequently the patch is not much more
difficult to review.

Patches 3/2 and 4/2 are not strictly needed as part of the cost model work,
but they do help the cost model by performing some simplifications early.
This reduces the chance of us rejecting if-conversion based on too many
simple moves that a future pass would have cleared up anyway. The csibe
numbers below rely on these two patches having been applied. Without them,
we get a couple of decisions wrong and some files from csibe increase
by < 3%.

On Tue, Jun 21, 2016 at 11:30:17PM +0200, Bernhard Reutner-Fischer wrote:
>
> >For the default implementation, if the parameters are not set, I just
> >multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and
> >COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still
> >short
> >of ideas on how best to form the default implementation.
>
> How bad is it in e.g. CSiBE?

I'm not completely sure I've set it up right, but these are the >0.5% size
 differences for an x86_64 compiler I built last Friday using -Os:

Smaller:

Relative size   Test name

93.33   flex-2.5.31,tables_shared
94.37   teem-1.6.0-src,src/limn/qn
97.27   teem-1.6.0-src,src/nrrd/kernel
98.31   teem-1.6.0-src,src/ten/miscTen
98.60   teem-1.6.0-src,src/ell/genmat
98.69   teem-1.6.0-src,src/nrrd/measure
99.03   teem-1.6.0-src,src/ten/mod
99.04   libpng-1.2.5,pngwtran
99.08   jpeg-6b,jdcoefct
99.14   teem-1.6.0-src,src/dye/convertDye
99.15   teem-1.6.0-src,src/ten/glyph
99.16   teem-1.6.0-src,src/bane/gkmsPvg
99.20   teem-1.6.0-src,src/limn/splineEval
99.25   teem-1.6.0-src,src/nrrd/accessors
99.28   teem-1.6.0-src,src/hest/parseHest
99.33   teem-1.6.0-src,src/limn/transform
99.40   teem-1.6.0-src,src/alan/coreAlan
99.48   teem-1.6.0-src,src/air/miscAir

Larger:

Relative size   Test name

101.43  teem-1.6.0-src,src/ten/tendEvec
101.57  teem-1.6.0-src,src/ten/tendEval

However, the total size difference is indistinguishable from noise
(< 0.08%).

Running the same experiment with an AArch64 cross compiler, I get the
following changes:

Smaller:

Relative size   Test name

97.78   libpng-1.2.5,pngrio
98.02   libpng-1.2.5,pngwio
98.82   replaypc-0.4.0.preproc,ReplayPC
99.21   lwip-0.5.3.preproc,src/core/inet
99.48   jpeg-6b,wrppm

Larger:

Relative size   Test name

100.52  jpeg-6b,wrbmp
100.82  libpng-1.2.5,pngwtran
100.91  zlib-1.1.4,infcodes

And the overall size difference was tiny (< 0.01%).

There were no >0.5% changes for the ARM port (expected as it doesn't use
noce).

I looked in to each of the regressions, and generally they occur where
we relying on a future pass to clean up after us. This is especially true
for the large x86_64 regressions, which as far as I can see are a
consequence of x86_64's floating-point co

[Patch RFC: 3/2 v3] Don't expand a conditional move between identical sources

2016-07-20 Thread James Greenhalgh

Hi,

This patch adds a short-circuit to optabs.c for the case where both
source operands are identical (i.e. we would be assigning the same
value in both branches).

This can show up for the memory optimisation in noce_cmove_arith in ifcvt.c,
if both branches would load from the same address. This is an odd situation
to arise. It showed up in my csibe runs, but I couldn't reproduce it in
a small test case.
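
In source terms the degenerate case looks like this (sketch, ours):

  x = test ? *p : *p;   /* both arms are the same load...  */
  x = *p;               /* ...so the whole thing is just a move */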

Bootstrapped on x86_64-none-linux-gnu and aarch64-none-linux-gnu with no
issues.

OK?

Thanks,
James

---
2016-07-20  James Greenhalgh  

* optabs.c (emit_conditional_move): Short-circuit for identical
sources.

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 51e10e2..87b4f97 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -4214,6 +4214,17 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
   enum insn_code icode;
   enum rtx_code reversed;
 
+  /* If the two source operands are identical, that's just a move.  */
+
+  if (rtx_equal_p (op2, op3))
+{
+  if (!target)
+	target = gen_reg_rtx (mode);
+
+  emit_move_insn (target, op3);
+  return target;
+}
+
   /* If one operand is constant, make it the second one.  Only do this
  if the other operand is not constant as well.  */
 


[RFC: Patch 2/2 v3] Introduce a new cost model for ifcvt.

2016-07-20 Thread James Greenhalgh

Hi,

This patch modifies the way we calculate costs in ifcvt.c. Rather than
using a combination of magic numbers and approximations to decide if we
should perform the transformation before constructing the new RTL, we
instead construct the new RTL and use the cost of that to form our cost model.

We want slightly different behaviour when compiling for speed than what we
want when compiling for size.

For size, we just want to look at what the size of code would have been before
the transformation, and what we plan to generate now. We need a little bit of
guesswork to try to figure out what cost to assign to the compare (for which we
don't keep track of the full insn) and branch (which insn_rtx_cost won't
handle), but otherwise the cost model is easy to calculate.

For speed, we want to use the max_noce_ifcvt_seq_cost hook defined in
patch 1/4. Here we don't care about the original cost, our hook is defined
in terms of how expensive the instructions which are brought on to the
unconditional path are permitted to be. For speed then, we have a simple
numerical comparison between the new cost and the cost returned by the
hook.

To achieve this, we first abstract all the cost logic into
noce_conversion_profitable_p.  To get the size cost logic right, we need a few
modifications to the fields of noce_if_info. We're going to drop "then_cost"
and "else_cost", which will instead be covered by "original_cost" which is the
sum of these costs, plus an extra magic COSTS_N_INSNS (2) to cover a compare
and branch. We're going to drop branch_cost which was used by the old cost
model, and add max_seq_cost which is defined in the new model. Finally, we can
factor out the repeated calculation of speed_p, and just store it once in
noce_if_info. This last point fixes the inconsistency of which basic block
we check optimize_bb_for_speed_p against.
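
The speed/size split can be sketched in a few lines of C (ours, not a quote
of the patch; speed_p, max_seq_cost and original_cost are the new fields
described above):

  /* Accept the converted sequence if it fits the relevant budget.  */
  static bool
  profitable_p (unsigned int seq_cost, const struct noce_if_info *if_info)
  {
    if (if_info->speed_p)
      /* Speed: compare against the target-provided budget.  */
      return seq_cost <= if_info->max_seq_cost;
    /* Size: the new sequence must not be larger than the original.  */
    return seq_cost <= if_info->original_cost;
  }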

To build the sum for "original_cost" we need to update
bb_valid_for_noce_process_p such that it adds to the cost pointer it takes
rather than overwriting it.

Having done that, we need to update all the cost models in the file to
check for profitability just after we check that if-conversion has
succeeded.

Finally, we use the params added in 1/4 to allow us to do something
sensible with the testcases that look for if-conversion. With these tests
we only care that the mechanics would work if the cost model were permissive
enough, not that a target has actually set the cost model high enough, so
we just set the parameters to their maximum values.

Bootstrapped on x86-64 and aarch64.

OK?

Thanks,
James

---

gcc/

2016-07-20  James Greenhalgh  

* ifcvt.c (noce_if_info): New fields: speed_p, original_cost,
max_seq_cost.  Removed fields: then_cost, else_cost, branch_cost.
(noce_conversion_profitable_p): New.
(noce_try_store_flag_constants): Use it.
(noce_try_addcc): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_cmove): Likewise.
(noce_try_cmove_arith): Likewise.
(bb_valid_for_noce_process_p): Add to the cost parameter rather than
overwriting it.
(noce_convert_multiple_sets): Move cost model to here, from...
(bb_ok_for_noce_convert_multiple_sets) ...here.
(noce_process_if_block): Update calls for above changes.
(noce_find_if_block): Record new noce_if_info parameters.

gcc/testsuite/

2016-07-18  James Greenhalgh  

* gcc.dg/ifcvt-2.c: Use parameter to guide if-conversion heuristics.
* gcc.dg/ifcvt-3.c: Use parameter to guide if-conversion heuristics.
* gcc.dg/pr68435.c: Use parameter to guide if-conversion heuristics.
* gcc.dg/ifcvt-4.c: Use parameter to guide if-conversion heuristics.
* gcc.dg/ifcvt-5.c: Use parameter to guide if-conversion heuristics.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index a92ab6d..4e3d8f3 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -807,12 +807,17 @@ struct noce_if_info
   bool then_simple;
   bool else_simple;
 
-  /* The total rtx cost of the instructions in then_bb and else_bb.  */
-  unsigned int then_cost;
-  unsigned int else_cost;
+  /* True if we're optimizing the control block for speed, false if
+ we're optimizing for size.  */
+  bool speed_p;
 
-  /* Estimated cost of the particular branch instruction.  */
-  unsigned int branch_cost;
+  /* The combined cost of COND, JUMP and the costs for THEN_BB and
+ ELSE_BB.  */
+  unsigned int original_cost;
+
+  /* Maximum permissible cost for the unconditional sequence we should
+ generate to replace this branch.  */
+  unsigned int max_seq_cost;
 
   /* The name of the noce transform that succeeded in if-converting
  this structure.  Used for debugging.  */
@@ -835,6 +840,27 @@ static int noce_try_minmax (struct noce_if_info *);
 static int noce_try_abs (struct noce_if_info *);
 static int noce_try_sign_mask (struct noce_if_info *);
 
+/* Return TRUE if SEQ is a good candidate as a replacement for the
+   if-convertible sequ

[Patch RFC 4/2 v3] Refactor noce_try_cmove_arith

2016-07-20 Thread James Greenhalgh

Hi,

This patch pulls some duplicate logic out from noce_try_cmove_arith.
We do this in order to make reasoning about the code easier.

Some of the natural simplification that comes from this process improves
the generation of temporaries in the code, which is good as it reduces
the size and speed costs of the generated sequence.  We want to do this
as the more useless register moves we can remove early, the more accurate
our profitability analysis will be.

Bootstrapped on x86_64 and aarch64 with no issues.

OK?

Thanks,
James

---
2016-07-20  James Greenhalgh  

* ifcvt.c (noce_arith_helper): New.
(noce_try_cmove_arith): Refactor.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 4e3d8f3..f2e7ac6 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2068,23 +2068,127 @@ noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
   return true;
 }
 
-/* Try more complex cases involving conditional_move.  */
+/* Helper for noce_try_cmove_arith.  This gets called twice, once for the
+   then branch, once for the else branch.  X_BB gives the basic block for the
+   branch we are currently interested in.  X is the destination for this
+   branch.  If X is complex, we need to move it in to a register first, by
+   possibly copying from INSN_X such that we preserve clobbers etc from the
+   original instruction.  EMIT_X is the target register for this branch
+   result.  ORIG_OTHER_DEST gives the original destination from the
+   opposite branch.  OTHER_BB_EXISTS_P is true if there was an opposite
+   branch for us to consider.  */
+
+bool
+noce_arith_helper (rtx *x, rtx *emit_x, rtx_insn *insn_x,
+		   basic_block x_bb, rtx orig_other_dest,
+		   bool other_bb_exists_p)
+{
+  rtx set_tmp = NULL_RTX;
+
+  machine_mode x_mode = GET_MODE (*x);
+
+  /* Two cases to catch here.  Either X is not yet a general operand, in
+ which case we need to move it to an appropriate register.  Or, the other
+ block is empty, in which case ORIG_OTHER_DEST came from the test block.
+ The non-empty complex block that we will emit might clobber the register
+ used by ORIG_OTHER_DEST, so move it to a pseudo first.  */
+  if (! general_operand (*x, x_mode)
+  || !other_bb_exists_p)
+{
+  rtx reg = gen_reg_rtx (x_mode);
+  if (insn_x)
+	{
+	  rtx_insn *copy_of_x = as_a  (copy_rtx (insn_x));
+	  rtx set = single_set (copy_of_x);
+	  SET_DEST (set) = reg;
+	  set_tmp = PATTERN (copy_of_x);
+	}
+  else
+	{
+	  set_tmp = gen_rtx_SET (reg, *x);
+	}
+  *x = reg;
+}
+
+  /* Check that our new insn isn't going to clobber ORIG_OTHER_DEST.  */
+  bool modified_in_x = (set_tmp != NULL_RTX)
+			&& modified_in_p (orig_other_dest, set_tmp);
+
+  /* If we have a X_BB to check, go through it and make sure the insns we'd
+ duplicate don't write ORIG_OTHER_DEST.  */
+  if (x_bb)
+{
+  rtx_insn *tmp_insn = NULL;
+  FOR_BB_INSNS (x_bb, tmp_insn)
+	/* Don't check inside the destination insn, we will have changed
+	   it to use a register that doesn't conflict.  */
+	if (!(insn_x && tmp_insn == insn_x)
+	&& modified_in_p (orig_other_dest, tmp_insn))
+	  {
+	modified_in_x = true;
+	break;
+	  }
+}
+
+  /* Store the SET back in EMIT_X.  */
+  *emit_x = set_tmp;
+  return modified_in_x;
+}
+
+/* Try more complex cases involving conditional_move.
+
+   We have:
+
+  if (test)
+	x = a + b;
+  else
+	x = c - d;
+
+Make it:
+
+  t1 = a + b;
+  t2 = c - d;
+  x = (test) ? t1 : t2;
+
+   Alternatively, we have:
+
+  if (test)
+	x = *y;
+  else
+	x = *z;
+
+   Make it:
+
+ p1 = (test) ? y : z;
+ x = *p1;
+*/
 
 static int
 noce_try_cmove_arith (struct noce_if_info *if_info)
 {
+  /* SET_SRC from the two branches.  */
   rtx a = if_info->a;
   rtx b = if_info->b;
+  /* SET_DEST of both branches.  */
   rtx x = if_info->x;
-  rtx orig_a, orig_b;
-  rtx_insn *insn_a, *insn_b;
+  /* Full insns from the two branches.  */
+  rtx_insn *insn_a = if_info->insn_a;
+  rtx_insn *insn_b = if_info->insn_b;
+  /* Whether the branches are single set.  */
   bool a_simple = if_info->then_simple;
   bool b_simple = if_info->else_simple;
+  /* Our two basic blocks.  */
   basic_block then_bb = if_info->then_bb;
   basic_block else_bb = if_info->else_bb;
+  /* Whether we're handling the transformation of a load.  */
+  bool is_mem = false;
+  /* Copies of A and B before we modified them.  */
+  rtx orig_a = a, orig_b = b;
+  /* A new target to be used by the conditional select.  */
   rtx target;
-  int is_mem = 0;
-  enum rtx_code code;
+  /* The RTX code for the condition in the test block.  */
+  enum rtx_code code = GET_CODE (if_info->cond);
+  /* Our generated sequence.  */
   rtx_insn *ifcvt_seq;
 
   /* A conditional move from two memory sources is equivalent to a
@@ -2094,33 +2198,19 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   if (cse_not_expected
   && MEM_P (a) && MEM_P (b)
   && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b))
-{
-  mac

Re: [PATCH, PR59833]: Fix sNaN handling in ARM float to double conversion

2016-07-20 Thread Aurelien Jarno
On 2016-07-20 10:10, Ramana Radhakrishnan wrote:
> On Wed, Jul 20, 2016 at 8:48 AM, Aurelien Jarno  wrote:
> > On ARM soft-float, the float to double conversion doesn't convert a sNaN
> > to qNaN as the IEEE Std 754 standard mandates:
> >
> > "Under default exception handling, any operation signaling an invalid
> > operation exception and for which a floating-point result is to be
> > delivered shall deliver a quiet NaN."
> >
> > Given the soft float ARM code ignores exceptions and always provides a
> > result, a float to double conversion of a signaling NaN should return a
> > quiet NaN. Fix this in extendsfdf2.
> >
> > gcc/ChangeLog:
> >
> > PR target/59833
> > * config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/pr59833.c: New testcase.
> 
> 
> Ok - assuming this was tested appropriately with no regressions.

Given it only touches arm code, I only tested it on arm and I have seen
no regression. That said I wouldn't be surprised if the new testcase
fails on some other architectures.

Also I have done the copyright assignment for GCC, but I do not have SVN
write access. Could someone please commit this for me?

Thanks,
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


Re: [PATCH GCC]Improve no-overflow check in SCEV using value range info.

2016-07-20 Thread Richard Biener
On Tue, Jul 19, 2016 at 6:15 PM, Bin.Cheng  wrote:
> On Tue, Jul 19, 2016 at 1:10 PM, Richard Biener
>  wrote:
>> On Mon, Jul 18, 2016 at 6:27 PM, Bin Cheng  wrote:
>>> Hi,
>>> Scalar evolution needs to prove no-overflow for source variable when 
>>> handling type conversion.  This is important because otherwise we would 
>>> fail to recognize result of the conversion as SCEV, resulting in missing 
>>> loop optimizations.  Take case added by this patch as an example, the loop 
>>> can't be distributed as memset call because address of memory reference is 
>>> not recognized.  At the moment, we rely on type overflow semantics and loop 
>>> niter info for no-overflow checking, unfortunately that's not enough.  This 
>>> patch introduces new method checking no-overflow using value range 
>>> information.  As commented in the patch, value range can only be used when 
>>> source operand variable evaluates on every loop iteration, rather than 
>>> guarded by some conditions.
>>>
>>> This together with patch improving loop niter analysis 
>>> (https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00736.html) can help various 
>>> loop passes like vectorization.
>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> @@ -3187,7 +3187,8 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
>>/* If access is not executed on every iteration, we must ensure that 
>> overlow
>>   may not make the access valid later.  */
>>if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
>> -  && scev_probably_wraps_p (initial_condition_in_loop_num (ev, 
>> loop->num),
>> +  && scev_probably_wraps_p (NULL,
>>
>> use NULL_TREE for the null pointer constant of tree.
>>
>> +  /* Check if VAR evaluates in every loop iteration.  */
>> +  gimple *def;
>> +  if ((def = SSA_NAME_DEF_STMT (var)) != NULL
>>
>> def is never NULL but it might be a GIMPLE_NOP which has a NULL gimple_bb.
>> Better check for ! SSA_DEFAULT_DEF_P (var)
>>
>> +  if (TREE_CODE (step) != INTEGER_CST || !INTEGRAL_TYPE_P (TREE_TYPE (var)))
>> +return false;
>>
>> this looks like a cheaper test so please do that first.
>>
>> +  step_wi = step;
>> +  type = TREE_TYPE (var);
>> +  if (tree_int_cst_sign_bit (step))
>> +{
>> +  diff = lower_bound_in_type (type, type);
>> +  diff = minv - diff;
>> +  step_wi = - step_wi;
>> +}
>> +  else
>> +{
>> +  diff = upper_bound_in_type (type, type);
>> +  diff = diff - maxv;
>> +}
>>
>> this lacks a comment - it's not obvious to me what the gymnastics
>> with lower/upper_bound_in_type are supposed to achieve.
>
> Thanks for reviewing, I will prepare another version of patch.
>>
>> As VRP uses niter analysis itself I wonder how this fires back-to-back 
>> between
> I am not sure if I misunderstood the question.  If the VRP
> information comes from loop niter, I think it will not change loop
> niter or VRP2 in turn, because that's the best information we got in
> the first place in niter.  If the VRP information comes from other
> places (guard conditions?)  SCEV and loop niter after vrp1 might be
> improved and thus VRP2.  There should be no problems in either case,
> as long as GCC breaks the recursive chain among niter/scev/vrp
> correctly.

Ok.

>> VRP1 and VRP2?  If the def of var dominates the latch isn't it enough to do
>> a + 1 to check whether VRP bumped the range up to INT_MAX/MIN?  That is,
>> why do we need to add step if not for the TYPE_OVERFLOW_UNDEFINED case
>> of VRP handling the ranges optimistically?
> Again, please correct me if I misunderstood.  Considering a variable
> whose type is unsigned int and scev is {0, 4}_loop, the value range
> could be computed as [0, 0xfffc], thus MAX + 1 is smaller than
> type_MAX, but the scev could be overflow.

Yes.  I was wondering about the case where VRP bumps the range to +INF
because it gave up during iteration or because overflow behavior is undefined.
Do I understand correctly that the code is mostly to improve the case
where overflow is not undefined?

Also I was wondering if the range DEF dominates the latch then why
do we necessarily need to add step to verify overflow?  Can't we do better
if we for example see that the DEF is the loop header PHI?

Richard.

> Thanks,
> bin
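
To spell out the unsigned example above (ours): with type maximum 0xffffffff,
VRP range [0, 0xfffffffc] and step 4, the headroom above the range maximum is
0xffffffff - 0xfffffffc = 3, which is smaller than the step, so one more
iteration can still wrap even though max + 1 stays below the type maximum.
The check therefore has to compare the headroom against the full step, which
is what the lower/upper_bound_in_type arithmetic in the patch does.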


Re: [PATCH] Add qsort comparator consistency checking (PR71702)

2016-07-20 Thread Richard Biener
On Tue, Jul 19, 2016 at 3:27 PM, Alexander Monakov  wrote:
> On Tue, 19 Jul 2016, Richard Biener wrote:
>> Yes.  The other option is to enable this checking not with ENABLE_CHECKING
>> but some new checking option, say ENABLE_CHECKING_ALGORITHMS, and
>> do full checking in that case.
>
> Thanks - I'm going to fold in this idea when redoing the patch (i.e. check a
> subset of pairs under normal checking, all pairs under this option macro).
>
> While the topic is fresh, I'd like to mention a small complication with
> extending this checking to cover all qsort calls.  I mentioned in the opening
> mail that I was going to do that with a '#define qsort(..) qsort_chk (..)' in
> gcc/system.h, but I missed that vec::qsort would be subject to this macro
> expansion as well.
>
> I see two possible solutions.  The first is to use the argument counting trick
> to disambiguate between libc qsort(base, n, sz, cmp) and vec::qsort(cmp) on 
> the
> preprocessor level.  I don't see a reason it wouldn't work, but in this 
> context
> I'd consider that a last-resort measure rather than an appropriate solution.
>
> The second is to rename vec::qsort to vec::sort. While mass renaming is not 
> very
> nice, I hope it is acceptable in this case (I think formally vec::qsort
> declaration in GCC is not portable, because it implicitly expects that 
> stdlib.h
> wouldn't shadow qsort with a function-like macro).
>
>
> Actually, thinking about it more, instead of redirecting qsort in system.h, it
> may be more appropriate to introduce gcc_qsort that wraps qsort and does
> checking, add gcc_qsort_nochk as an escape hatch for cases where checking
> shouldn't be done, and poison qsort in system.h (this again depends on
> doing the vec::sort mass-rename).

I don't think macro expansion applies to vec.qsort or vec::qsort, and if it
did such a macro would be bogus and we'd simply undefine it.

To capture non-vec qsorts simply wrap qsort everywhere like you suggested.

Richard.

>
> Alexander
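
A minimal sketch of the wrapper idea (ours; gcc_qsort is the name proposed
above, and the adjacent-pair test is just one possible consistency check):

  /* Wrap qsort and, when checking, verify that the comparator behaves
     consistently on adjacent elements of the sorted result.  */
  static void
  gcc_qsort (void *base, size_t n, size_t size,
             int (*cmp) (const void *, const void *))
  {
    qsort (base, n, size, cmp);
    for (size_t i = 0; i + 1 < n; i++)
      {
        const void *a = (const char *) base + i * size;
        const void *b = (const char *) base + (i + 1) * size;
        int ab = cmp (a, b), ba = cmp (b, a);
        /* The result must be sorted, and cmp must be antisymmetric.  */
        gcc_assert (ab <= 0);
        gcc_assert ((ab < 0) == (ba > 0) && (ab == 0) == (ba == 0));
      }
  }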


Re: [PATCH, PR59833]: Fix sNaN handling in ARM float to double conversion

2016-07-20 Thread Ramana Radhakrishnan
On Wed, Jul 20, 2016 at 10:56 AM, Aurelien Jarno  wrote:
> On 2016-07-20 10:10, Ramana Radhakrishnan wrote:
>> On Wed, Jul 20, 2016 at 8:48 AM, Aurelien Jarno  wrote:
>> > On ARM soft-float, the float to double conversion doesn't convert a sNaN
>> > to qNaN as the IEEE Std 754 standard mandates:
>> >
>> > "Under default exception handling, any operation signaling an invalid
>> > operation exception and for which a floating-point result is to be
>> > delivered shall deliver a quiet NaN."
>> >
>> > Given the soft float ARM code ignores exceptions and always provides a
>> > result, a float to double conversion of a signaling NaN should return a
>> > quiet NaN. Fix this in extendsfdf2.
>> >
>> > gcc/ChangeLog:
>> >
>> > PR target/59833
>> > * config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> > * gcc.dg/pr59833.c: New testcase.
>>
>>
>> Ok - assuming this was tested appropriately with no regressions.
>
> Given it only touches arm code, I only tested it on arm and I have seen
> no regression. That said I wouldn't be surprised if the new testcase
> fails on some other architectures.

I was assuming you tested it on ARM :)  In this case given the change
is only in the backend I would have expected this patch to have been
tested for soft-float ARM or an appropriate multilib. Saying what
configuration the patch was tested on is useful for the audit trail.
E.g. it's no use testing this patch on armhf (i.e.
--with-float=hard --with-fpu=vfpv3/neon --with-arch=armv7-a) as by
default the test would never generate the call to the library function
but I'm sure you know all that anyway.

I don't know if this would pass or fail on other architectures - if it
fails it indicates a bug in their ports and for them to fix up as
appropriate.


>
> Also I have done the copyright assignment for GCC, but I do not have SVN
> write access. Could someone please commit this for me?

I will take care of it. Thanks again for following up on the patch.

Thanks,
Ramana

>
> Thanks,
> Aurelien
>
> --
> Aurelien Jarno  GPG: 4096R/1DDD8C9B
> aurel...@aurel32.net http://www.aurel32.net


Re: [PATCH, PR59833]: Fix sNaN handling in ARM float to double conversion

2016-07-20 Thread Aurelien Jarno
On 2016-07-20 11:04, Ramana Radhakrishnan wrote:
> On Wed, Jul 20, 2016 at 10:56 AM, Aurelien Jarno  wrote:
> > On 2016-07-20 10:10, Ramana Radhakrishnan wrote:
> >> On Wed, Jul 20, 2016 at 8:48 AM, Aurelien Jarno  
> >> wrote:
> >> > On ARM soft-float, the float to double conversion doesn't convert a sNaN
> >> > to qNaN as the IEEE Std 754 standard mandates:
> >> >
> >> > "Under default exception handling, any operation signaling an invalid
> >> > operation exception and for which a floating-point result is to be
> >> > delivered shall deliver a quiet NaN."
> >> >
> >> > Given the soft float ARM code ignores exceptions and always provides a
> >> > result, a float to double conversion of a signaling NaN should return a
> >> > quiet NaN. Fix this in extendsfdf2.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> > PR target/59833
> >> > * config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> > * gcc.dg/pr59833.c: New testcase.
> >>
> >>
> >> Ok - assuming this was tested appropriately with no regressions.
> >
> > Given it only touches arm code, I only tested it on arm and I have seen
> > no regression. That said I wouldn't be surprised if the new testcase
> > fails on some other architectures.
> 
> I was assuming you tested it on ARM :)  In this case given the change
> is only in the backend I would have expected this patch to have been
> tested for soft-float ARM or an appropriate multilib. Saying what
> configuration the patch was tested on is useful for the audit trail.
> E.g. it's no use testing this patch on armhf (i.e.
> --with-float=hard --with-fpu=vfpv3/neon --with-arch=armv7-a) as by
> default the test would never generate the call to the library function
> but I'm sure you know all that anyway.

Indeed I should have given more details. I tested it on a Debian armel
machine, and I configured GCC the same way as the Debian package, that
is using --with-arch=armv4t --with-float=soft.

I built it once with the new test but without the fix and a second time
with both the test and the fix. I have verified that the test fails in
the first case and pass in the second case.

> I don't know if this would pass or fail on other architectures - if it
> fails it indicates a bug in their ports and for them to fix up as
> appropriate.
> 
> 
> >
> > Also I have done the copyright assignment for GCC, but I do not have SVN
> > write access. Could someone please commit this for me?
> 
> I will take care of it. Thanks again for following up on the patch.

Thanks!

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


Re: [PATCH, PR59833]: Fix sNaN handling in ARM float to double conversion

2016-07-20 Thread Ramana Radhakrishnan
On Wed, Jul 20, 2016 at 11:14 AM, Aurelien Jarno  wrote:
> On 2016-07-20 11:04, Ramana Radhakrishnan wrote:
>> On Wed, Jul 20, 2016 at 10:56 AM, Aurelien Jarno  
>> wrote:
>> > On 2016-07-20 10:10, Ramana Radhakrishnan wrote:
>> >> On Wed, Jul 20, 2016 at 8:48 AM, Aurelien Jarno  
>> >> wrote:
>> >> > On ARM soft-float, the float to double conversion doesn't convert a sNaN
>> >> > to qNaN as the IEEE Std 754 standard mandates:
>> >> >
>> >> > "Under default exception handling, any operation signaling an invalid
>> >> > operation exception and for which a floating-point result is to be
>> >> > delivered shall deliver a quiet NaN."
>> >> >
>> >> > Given the soft float ARM code ignores exceptions and always provides a
>> >> > result, a float to double conversion of a signaling NaN should return a
>> >> > quiet NaN. Fix this in extendsfdf2.
>> >> >
>> >> > gcc/ChangeLog:
>> >> >
>> >> > PR target/59833
>> >> > * config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN.
>> >> >
>> >> > gcc/testsuite/ChangeLog:
>> >> >
>> >> > * gcc.dg/pr59833.c: New testcase.
>> >>
>> >>
>> >> Ok - assuming this was tested appropriately with no regressions.
>> >
>> > Given it only touches arm code, I only tested it on arm and I have seen
>> > no regression. That said I wouldn't be surprised if the new testcase
>> > fails on some other architectures.
>>
>> I was assuming you tested it on ARM :)  In this case given the change
>> is only in the backend I would have expected this patch to have been
>> tested for soft-float ARM or an appropriate multilib. Saying what
>> configuration the patch was tested on is useful for the audit trail.
>> For e.g. it's no use testing this patch on armhf ( i.e.
>> --with-float=hard --with-fpu=vfpv3/neon --with-arch=armv7-a) as by
>> default the test would never generate the call to the library function
>> but I'm sure you know all that anyway.
>
> Indeed I should have given more details. I tested it on a Debian armel
> machine, and I configured GCC the same way as the Debian package, that
> is using --with-arch=armv4t --with-float=soft.
>
> I built it once with the new test but without the fix and a second time
> with both the test and the fix. I have verified that the test fails in
> the first case and pass in the second case.

Thanks for the info - what about all the other regression tests ? Did
you do a full make check and ensure that no other tests regressed in
comparison ?  Patches need to be tested against the entire regression
testsuite and not just what was added.


regards
Ramana
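
For readers unfamiliar with the workflow being asked for here: the usual
way to check the entire regression testsuite is to run it before and
after the patch and compare the resulting .sum files, for instance with
the compare_tests script shipped in the tree (illustrative invocation;
the before/after paths are placeholders):

  make -k check
  contrib/compare_tests before/gcc/testsuite/gcc/gcc.sum \
    after/gcc/testsuite/gcc/gcc.sum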


Re: [VRP] Use alloc-pool and obstack for value_range and vr->equiv allocations

2016-07-20 Thread Richard Biener
On Wed, Jul 20, 2016 at 4:16 AM, kugan
 wrote:
> Hi Richard,
>
> As discussed in the IPA-VRP discussion, this patch makes tree-vrp use an
> alloc-pool for value_range allocations and an obstack for vr->equiv
> allocations.  Other allocations are rare and left as they are.
>
> Bootstrapped and regression tested on x86-64-linux with no new regressions.
> Is this OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2016-07-20  Kugan Vivekanandarajah  
>
> * tree-vrp.c (set_value_range): Use vrp_equiv_obstack with
> BITMAP_ALLOC.
> (add_equivalence): Likewise.
> (get_value_range): Allocate value range with vrp_value_range_pool.
> (vrp_initialize): Initialize vrp_equiv_obstack for equiv allocation.
> (vrp_finalize): Release vrp_equiv_obstack and vrp_value_range_pool.
>
>
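
For readers unfamiliar with these allocator utilities, the pattern the
ChangeLog describes looks roughly as follows.  This is a minimal sketch,
not the committed patch, and the exact declarations may differ:

  /* Pooled allocation for value_range objects and an obstack backing
     the equivalence bitmaps (sketch).  */
  static object_allocator<value_range> vrp_value_range_pool
    ("tree VRP value ranges");
  static bitmap_obstack vrp_equiv_obstack;

  /* vrp_initialize: */
  bitmap_obstack_initialize (&vrp_equiv_obstack);

  /* get_value_range, allocating from the pool: */
  value_range *vr = vrp_value_range_pool.allocate ();

  /* set_value_range / add_equivalence, allocating on the obstack: */
  vr->equiv = BITMAP_ALLOC (&vrp_equiv_obstack);

  /* vrp_finalize, releasing everything at once: */
  bitmap_obstack_release (&vrp_equiv_obstack);
  vrp_value_range_pool.release ();

The point of the change is that both kinds of objects are freed wholesale
when VRP finishes instead of being tracked individually.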


Re: [PATCH] Properly handly PHI stmts in later_of_the_two (PR, middle-end/71898)

2016-07-20 Thread Richard Biener
On Wed, Jul 20, 2016 at 11:24 AM, Martin Liška  wrote:
> Hi.
>
> Graphite compares gsi_stmt_iterators (in later_of_the_two) to find the
> place where a new gimple statement should be inserted.  The problem is
> that the function does not distinguish between PHI and non-PHI
> statements, even though the former always stand before the latter in a
> basic block.  The patch fixes that.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

Ok.  (this is a gross function - graphite should have computed UIDs and use
those instead of linear walks...)

Richard.

> Martin
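
The patch itself is not quoted in this digest.  Conceptually, the fix
makes the comparison account for PHI nodes always preceding non-PHI
statements within a block, along these lines (an illustrative sketch,
not the committed code):

  /* Return the later of two statements from the same basic block,
     remembering that PHI nodes precede all non-PHI statements.  */
  static gimple *
  later_of_the_two (gimple_stmt_iterator gsi1, gimple_stmt_iterator gsi2)
  {
    gimple *s1 = gsi_stmt (gsi1), *s2 = gsi_stmt (gsi2);
    bool phi1 = gimple_code (s1) == GIMPLE_PHI;
    bool phi2 = gimple_code (s2) == GIMPLE_PHI;

    /* The fix: a PHI is never the later statement when compared with
       a non-PHI from the same block.  */
    if (phi1 != phi2)
      return phi1 ? s2 : s1;

    /* Same kind of statement: walk forward from S1; if S2 is found,
       it stands at or after S1 (the linear walk noted above).  */
    for (; !gsi_end_p (gsi1); gsi_next (&gsi1))
      if (gsi_stmt (gsi1) == s2)
	return s2;
    return s1;
  }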


Re: [patch] Add new hook to diagnose address space usage (take #2)

2016-07-20 Thread Bernd Schmidt



On 07/19/2016 10:16 AM, Georg-Johann Lay wrote:


Done.  Attached is the updated version of the change, log entry is the
same as before.

Johann


LGTM.


Bernd


Re: Merge switch statements in tree-cfgcleanup

2016-07-20 Thread Bernd Schmidt

On 07/19/2016 01:18 PM, Richard Biener wrote:

On Tue, Jul 19, 2016 at 1:07 PM, Bernd Schmidt  wrote:

On 07/19/2016 12:35 PM, Richard Biener wrote:


I think that start/end_recording_case_labels also merged adjacent labels
via group_case_labels_stmt.  Not sure why you need to stop recording
case labels during the transform.  Is this because you are building a new
switch stmt?



It's because the cached mapping gets invalidated. Look in tree-cfg, it has an
edge_to_cases map which I think cannot be maintained if you modify the
structure. I certainly got lots of internal errors until I added that pair
of calls.


Yeah, I see that.  OTOH cfgcleanup relies on this cache to be efficient and
you (repeatedly) clear it.  Clearing parts of it should be sufficient and if you
used redirect_edge_and_branch instead of redirect_edge_pred it would have
maintained the cache as far as I can see,


I don't think that would work, since we're modifying and/or discarding 
case labels as well and they can't remain part of the cache.



or you can make sure to maintain
it yourself or just clear the info associated with the edges you redirect from
one switch to another.


How's this? Tested as before.


Bernd
	* tree-cfgcleanup.c (try_merge_switches): New static function.
	(cleanup_tree_cfg_bb): Call it.
	* tree-cfg.c (discard_case_labels_for): New function.
	* tree-cfg.h (discard_case_labels_for): Declare it.

	* c-c++-common/merge-switch-1.c: New test.
	* c-c++-common/merge-switch-2.c: New test.
	* c-c++-common/merge-switch-3.c: New test.

Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c	(revision 237797)
+++ gcc/tree-cfg.c	(working copy)
@@ -1185,6 +1185,32 @@ end_recording_case_labels (void)
   BITMAP_FREE (touched_switch_bbs);
 }
 
+/* Discard edge information for a single switch.  */
+void
+discard_case_labels_for (gswitch *t)
+{
+  if (!recording_case_labels_p ())
+    return;
+
+  basic_block bb = gimple_bb (t);
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    {
+      tree *slot = edge_to_cases->get (e);
+      if (!slot)
+	continue;
+      edge_to_cases->remove (e);
+      tree t, next;
+      for (t = *slot; t; t = next)
+	{
+	  next = CASE_CHAIN (t);
+	  CASE_CHAIN (t) = NULL;
+	}
+      *slot = NULL;
+    }
+}
+
 /* If we are inside a {start,end}_recording_cases block, then return
a chain of CASE_LABEL_EXPRs from T which reference E.
 
Index: gcc/tree-cfg.h
===
--- gcc/tree-cfg.h	(revision 237797)
+++ gcc/tree-cfg.h	(working copy)
@@ -33,6 +33,7 @@ extern void init_empty_tree_cfg_for_func
 extern void init_empty_tree_cfg (void);
 extern void start_recording_case_labels (void);
 extern void end_recording_case_labels (void);
+extern void discard_case_labels_for (gswitch *);
 extern basic_block label_to_block_fn (struct function *, tree);
 #define label_to_block(t) (label_to_block_fn (cfun, t))
 extern void cleanup_dead_labels (void);
Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c	(revision 237797)
+++ gcc/tree-cfgcleanup.c	(working copy)
@@ -630,6 +630,233 @@ fixup_noreturn_call (gimple *stmt)
   return changed;
 }
 
+/* Look for situations where we have a switch inside the default case of
+   another, and they switch on the same condition.  We look for the
+   second switch in BB.  If we find such a situation, merge the two
+   switch statements.  */
+
+static bool
+try_merge_switches (basic_block bb)
+{
+  if (!single_pred_p (bb))
+    return false;
+  basic_block pred_bb = single_pred (bb);
+
+  /* Look for a structure with two switch statements on the same value.  */
+  gimple_stmt_iterator gsi1, gsi2;
+  gsi1 = gsi_last_nondebug_bb (pred_bb);
+  gimple *pred_end = last_stmt (pred_bb);
+  if (! pred_end || gimple_code (pred_end) != GIMPLE_SWITCH)
+    return false;
+
+  gsi2 = gsi_start_nondebug_after_labels_bb (bb);
+  if (gsi_end_p (gsi2))
+    return false;
+
+  gimple *stmt = gsi_stmt (gsi2);
+  if (gimple_code (stmt) != GIMPLE_SWITCH)
+    return false;
+
+  gswitch *sw1 = as_a <gswitch *> (pred_end);
+  gswitch *sw2 = as_a <gswitch *> (stmt);
+  tree idx1 = gimple_switch_index (sw1);
+  tree idx2 = gimple_switch_index (sw2);
+  if (TREE_CODE (idx1) != SSA_NAME || idx1 != idx2)
+    return false;
+  size_t n1 = gimple_switch_num_labels (sw1);
+  size_t n2 = gimple_switch_num_labels (sw2);
+  if (n1 <= 1 || n2 <= 1)
+    return false;
+  tree sw1_default = gimple_switch_default_label (sw1);
+  if (label_to_block (CASE_LABEL (sw1_default)) != bb)
+    return false;
+
+  /* We know we have the basic structure of what we are looking for.  Sort
+     out some special cases regarding phi nodes.  */
+  if (!gsi_end_p (gsi_start_phis (bb)))
+    return false;
+
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    {
+      basic_block dest = e->dest;
+      if (find_edge (pred_bb, dest))
+	{
+	  /
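
For reference, the kind of source code the new transform targets looks
like this (a hypothetical example in the spirit of the merge-switch-*.c
tests, which are not reproduced in this digest):

  int
  f (int x)
  {
    switch (x)
      {
      case 1:
	return 10;
      default:
	/* A second switch on the same value inside the default case of
	   the first one; the two can now be merged into one switch.  */
	switch (x)
	  {
	  case 2:
	    return 20;
	  default:
	    return 0;
	  }
      }
  }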

Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns

2016-07-20 Thread Bernd Schmidt



On 07/19/2016 05:35 PM, Segher Boessenkool wrote:

On Tue, Jul 19, 2016 at 04:49:26PM +0200, Bernd Schmidt wrote:

But you need the profile to make even reasonably good decisions.


I'm not worried about making cost decisions: as far as I'm concerned
it's perfectly fine for that. I'm worried about correctness - you can't
validly save registers inside a loop.


Of course you can.  It needs to be paired with a restore; and we do
that just fine.

> Pretty much *all* implementations in the literature do this, fwiw.

I, however, fail to see where this happens. A reference to somewhere this
algorithm is described would be helpful, because at this stage I really
don't understand what you're trying to achieve. The submission lacks
examples.


So I could see things could work if you place an epilogue part in the 
last block of a loop if the start of the loop contains a corresponding 
part of the prologue, but taking just the comment in the code:

   Prologue concerns are placed in such a way that they are executed as
   infrequently as possible.  Epilogue concerns are put everywhere where
   there is an edge from a bb dominated by such a prologue concern to a
   bb not dominated by one.

this describes no mechanism by which such a thing would happen. And I 
fail to see how moving parts of the prologue into a loop would be 
beneficial as an optimization.



Bernd



Re: [PATCH v3] S/390: Add splitter for "and" with complement.

2016-07-20 Thread Dominik Vogt
Version 3 of the patch.  See below for changes.  Regression tested
on s390x and s390.

On Tue, Jul 19, 2016 at 01:05:52PM +0200, Andreas Krebbel wrote:
> On 07/19/2016 11:37 AM, Dominik Vogt wrote:
> > +(define_insn_and_split "*andc_split"
> 
> Please append <mode> here to make the insn name unique.

Done.

> > +  if (reg_overlap_mentioned_p (operands[0], operands[2]))
> > +{
> > +  gcc_assert (can_create_pseudo_p ());
> 
> Is it really safe to assume we will never get here after reload? I don't see 
> where this is
> prevented. Btw. the very same assertion is in gen_reg_rtx anyway so no need 
> to duplicate it.

Added "! reload_completed" to the pattern condition as discussed
internally.

> > +(define_insn_and_split "*andc_split2"
> 
> <mode> missing

Done.

> Looks like these testcases could be merged by putting the lp64 conditions at 
> the scan-assembler
> directives.

As discussed internally, leave them in separate files but also run
the second one.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.md ("*andc_split", "*andc_split2"): New splitters
for and with complement.
gcc/testsuite/ChangeLog

* gcc.target/s390/md/andc-splitter-1.c: New test case.
* gcc.target/s390/md/andc-splitter-2.c: Likewise.
>From e6e6f187aaac1c9368c1c2168b74ee642168a095 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Mon, 14 Mar 2016 17:48:17 +0100
Subject: [PATCH] S/390: Add splitter for "and" with complement.

Force splitting of logical operator expressions ...  with three operands, a
register destination and a memory operand because there are no instructions for
that and combine results in inefficient code.
---
 gcc/config/s390/s390.md| 43 +++
 gcc/testsuite/gcc.target/s390/md/andc-splitter-1.c | 61 ++
 gcc/testsuite/gcc.target/s390/md/andc-splitter-2.c | 61 ++
 3 files changed, 165 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/andc-splitter-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/md/andc-splitter-2.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index f8c61a8..8f43dfa 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -7262,6 +7262,49 @@
(set_attr "z10prop" "z10_super_E1,z10_super,*")])
 
 ;
+; And with complement
+;
+; c = ~b & a = (b & a) ^ a
+
+(define_insn_and_split "*andc_split_<mode>"
+  [(set (match_operand:GPR 0 "nonimmediate_operand" "")
+	(and:GPR (not:GPR (match_operand:GPR 1 "nonimmediate_operand" ""))
+		 (match_operand:GPR 2 "general_operand" "")))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_ZARCH && ! reload_completed && s390_logical_operator_ok_p (operands)"
+  "#"
+  "&& 1"
+  [
+   (parallel
+    [(set (match_dup 3) (and:GPR (match_dup 1) (match_dup 2)))
+     (clobber (reg:CC CC_REGNUM))])
+   (parallel
+    [(set (match_dup 0) (xor:GPR (match_dup 3) (match_dup 2)))
+     (clobber (reg:CC CC_REGNUM))])]
+{
+  if (reg_overlap_mentioned_p (operands[0], operands[2]))
+    operands[3] = gen_reg_rtx (<MODE>mode);
+  else
+    operands[3] = operands[0];
+})
+
+; Convert "(xor (operand) (-1))" to "(not (operand))" for low optimization
+; levels so that "*andc_split" matches.
+(define_insn_and_split "*andc_split2_<mode>"
+  [(set (match_operand:GPR 0 "nonimmediate_operand" "")
+	(and:GPR (xor:GPR (match_operand:GPR 1 "nonimmediate_operand" "")
+			  (const_int -1))
+		 (match_operand:GPR 2 "general_operand" "")))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_ZARCH && s390_logical_operator_ok_p (operands)"
+  "#"
+  "&& 1"
+  [(parallel
+    [(set (match_dup 0) (and:GPR (not:GPR (match_dup 1)) (match_dup 2)))
+     (clobber (reg:CC CC_REGNUM))])]
+)
+
+;
 ; Block and (NC) patterns.
 ;
 
diff --git a/gcc/testsuite/gcc.target/s390/md/andc-splitter-1.c 
b/gcc/testsuite/gcc.target/s390/md/andc-splitter-1.c
new file mode 100644
index 000..ed78921
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/andc-splitter-1.c
@@ -0,0 +1,61 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do run { target { lp64 } } } */
+/* { dg-options "-mzarch -save-temps -dP" } */
+/* Skip test if -O0 is present on the command line:
+
+{ dg-skip-if "" { *-*-* } { "-O0" } { "" } }
+
+   Skip test if the -O option is missing from the command line
+{ dg-skip-if "" { *-*-* } { "*" } { "-O*" } }
+*/
+
+__attribute__ ((noinline))
+unsigned long andc_vv(unsigned long a, unsigned long b)
+{ return ~b & a; }
+/* { dg-final { scan-assembler ":15 .\* \{\\*anddi3\}" } } */
+/* { dg-final { scan-assembler ":15 .\* \{\\*xordi3\}" } } */
+
+__attribute__ ((noinline))
+unsigned long andc_pv(unsigned long *a, unsigned long b)
+{ return ~b & *a; }
+/* { dg-final { scan-assembler ":21 .\* \{\\*anddi3\}" } } */
+/* { dg-final { scan-assembler ":21 .\* \{\\*xordi3\}" } } */
+
+__attribute__ ((noinline))
+unsigned long andc_vp(unsigned long a, unsigned long *b)
+{ return ~*b & a; }
+/* { dg
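
As background for the "c = ~b & a = (b & a) ^ a" comment in the splitter:
the identity holds bit by bit, since where a bit of a is 0 both sides are
0, and where it is 1 the right-hand side reduces to b ^ 1 = ~b.  A
standalone self-check (illustrative only, not part of the patch):

  #include <assert.h>

  int
  main (void)
  {
    for (unsigned int a = 0; a < 16; a++)
      for (unsigned int b = 0; b < 16; b++)
	assert ((~b & a) == ((b & a) ^ a));
    return 0;
  }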

Re: [PATCH] Avoid invoking ranlib on libbackend.a

2016-07-20 Thread Bernd Schmidt

On 07/19/2016 10:20 AM, Richard Biener wrote:

I like it.  Improving re-build time in my dev tree is very much
welcome, and yes,
libbackend build time is a big part of it usually (plus of course cc1
link time).


Since that wasn't an entirely explicit ack, I'll add mine. Thank you for 
doing this.



Bernd



Re: [Re: RFC: Patch 1/2 v3] New target hook: max_noce_ifcvt_seq_cost

2016-07-20 Thread Bernd Schmidt

On 07/20/2016 11:51 AM, James Greenhalgh wrote:



2016-07-20  James Greenhalgh  

* target.def (max_noce_ifcvt_seq_cost): New.
* doc/tm.texi.in (TARGET_MAX_NOCE_IFCVT_SEQ_COST): Document it.
* doc/tm.texi: Regenerate.
* targhooks.h (default_max_noce_ifcvt_seq_cost): New.
* targhooks.c (default_max_noce_ifcvt_seq_cost): New.
* params.def (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST): New.
(PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST): Likewise.
* doc/invoke.texi: Document new params.


I think this is starting to look like a clear improvement, so I'll ack 
patches 1-3 with a few minor comments, and with the expectation that 
you'll address performance regressions on other targets if they occur. 
Number 4 I still need to figure out.


Minor details:


+  if (!speed_p)
+    {
+      return cost <= if_info->original_cost;
+    }


No braces around single statements in ifs. There's an instance of this 
in patch 4 as well.



+  if (global_options_set.x_param_values[param])
+    return PARAM_VALUE (param);


How about wrapping the param value into COSTS_N_INSNS, to make the value 
of the param less dependent on compiler internals?
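
That is, something along these lines (a sketch of the suggestion only,
untested):

  if (global_options_set.x_param_values[param])
    return COSTS_N_INSNS (PARAM_VALUE (param));

so that the user-visible knob counts instructions rather than raw
rtx-cost units.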


In patch 4:


+  /* Check that our new insn isn't going to clobber ORIG_OTHER_DEST.  */
+  bool modified_in_x = (set_tmp != NULL_RTX)
+   && modified_in_p (orig_other_dest, set_tmp);


Watch line wrapping. No parens around the first subexpression (there are 
other examples of unnecessary ones in invocations of noce_arith_helper), 
but around the full one.



Bernd


Re: [PATCH] Fix unsafe function attributes for special functions (PR 71876)

2016-07-20 Thread Bernd Edlinger
On 07/20/16 12:46, Richard Biener wrote:
> On Wed, 20 Jul 2016, Richard Biener wrote:
>
>> On Tue, 19 Jul 2016, Bernd Edlinger wrote:
>>
>>> Hi!
>>>
>>> As discussed at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71876,
>>> we have a _very_ old hack in gcc, that recognizes certain functions by
>>> name, and inserts in some cases unsafe attributes, that don't work for
>>> a freestanding environment.
>>>
>>> It is unsafe to return ECF_MAY_BE_ALLOCA, ECF_LEAF and ECF_NORETURN
>>> from special_function_p, just by the name of the function, especially
>>> for less well known functions, like "getcontext" or "savectx", which
>>> could easily be used for something completely different.
>>
>> Returning ECF_MAY_BE_ALLOCA is safe.  Just wanted to mention this,
>> regardless of the followups you already received.
>
> Oh, and maybe you can factor out the less controversial parts,
> namely ignoring the __builtin_ prefix.  I don't think that
> calling __builtin_setjmp in an environment where setjmp is not a
> builtin should behave like setjmp (it will call a function named
> '__builtin_setjmp').


I wonder how I manage to dig out such controversial things ;)

But you are right, that would at least be a start.

So this patch is what you requested:

Remove the handling of the __builtin_ prefix from special_function_p
and add the returns_twice attribute to the __builtin_setjmp declaration
instead.

Is it OK after boot-strap and regression-testing?


Thanks
Bernd.
2016-07-19  Bernd Edlinger  

	PR middle-end/71876
	* builtin-attrs.def (ATTR_RT_NOTHROW_LEAF_LIST): New return twice
	attribute.
	* builtins.def (BUILT_IN_SETJMP): Use ATTR_RT_NOTHROW_LEAF_LIST here.
	* calls.c (special_function_p): Remove the special handling of the 
	"__builtin_" prefix.

Index: gcc/builtin-attrs.def
===
--- gcc/builtin-attrs.def	(Revision 238382)
+++ gcc/builtin-attrs.def	(Arbeitskopie)
@@ -131,6 +131,8 @@ DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LIST, AT
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\
 			ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_RT_NOTHROW_LEAF_LIST, ATTR_RETURNS_TWICE,\
+			ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_COLD_NOTHROW_LEAF_LIST, ATTR_COLD,\
 			ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST, ATTR_COLD,\
Index: gcc/builtins.def
===
--- gcc/builtins.def	(Revision 238382)
+++ gcc/builtins.def	(Arbeitskopie)
@@ -837,7 +837,7 @@ DEF_LIB_BUILTIN(BUILT_IN_REALLOC, "realloc
 DEF_GCC_BUILTIN(BUILT_IN_RETURN, "return", BT_FN_VOID_PTR, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_RETURN_ADDRESS, "return_address", BT_FN_PTR_UINT, ATTR_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_SAVEREGS, "saveregs", BT_FN_PTR_VAR, ATTR_NULL)
-DEF_GCC_BUILTIN(BUILT_IN_SETJMP, "setjmp", BT_FN_INT_PTR, ATTR_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_SETJMP, "setjmp", BT_FN_INT_PTR, ATTR_RT_NOTHROW_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_STRFMON, "strfmon", BT_FN_SSIZE_STRING_SIZE_CONST_STRING_VAR, ATTR_FORMAT_STRFMON_NOTHROW_3_4)
 DEF_LIB_BUILTIN(BUILT_IN_STRFTIME, "strftime", BT_FN_SIZE_STRING_SIZE_CONST_STRING_CONST_PTR, ATTR_FORMAT_STRFTIME_NOTHROW_3_0)
 DEF_GCC_BUILTIN(BUILT_IN_TRAP, "trap", BT_FN_VOID, ATTR_NORETURN_NOTHROW_LEAF_LIST)
Index: gcc/calls.c
===
--- gcc/calls.c	(Revision 238382)
+++ gcc/calls.c	(Arbeitskopie)
@@ -514,14 +514,10 @@ special_function_p (const_tree fndecl, int flags)
 	  && ! strcmp (name, "alloca"))
 	flags |= ECF_MAY_BE_ALLOCA;
 
-  /* Disregard prefix _, __, __x or __builtin_.  */
+  /* Disregard prefix _, __ or __x.  */
   if (name[0] == '_')
 	{
-	  if (name[1] == '_'
-	  && name[2] == 'b'
-	  && !strncmp (name + 3, "uiltin_", 7))
-	tname += 10;
-	  else if (name[1] == '_' && name[2] == 'x')
+	  if (name[1] == '_' && name[2] == 'x')
 	tname += 3;
 	  else if (name[1] == '_')
 	tname += 2;
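
To illustrate the hazard described at the top of this thread, consider
this hypothetical freestanding code (not taken from the bug report):

  /* This has nothing to do with ucontext handling, but a purely
     name-based heuristic cannot tell the difference and would still
     treat calls to it specially.  */
  int
  getcontext (int machine_state)
  {
    return machine_state + 1;
  }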


Re: move increase_alignment from simple to regular ipa pass

2016-07-20 Thread Prathamesh Kulkarni
ping * 3 https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html

Thanks,
Prathamesh

On 5 July 2016 at 10:53, Prathamesh Kulkarni
 wrote:
> ping * 2 ping https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html
>
> Thanks,
> Prathamesh
>
> On 28 June 2016 at 14:49, Prathamesh Kulkarni
>  wrote:
>> ping https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html
>>
>> Thanks,
>> Prathamesh
>>
>> On 23 June 2016 at 22:51, Prathamesh Kulkarni
>>  wrote:
>>> On 17 June 2016 at 19:52, Prathamesh Kulkarni
>>>  wrote:
 On 14 June 2016 at 18:31, Prathamesh Kulkarni
  wrote:
> On 13 June 2016 at 16:13, Jan Hubicka  wrote:
>>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>>> index ecafe63..41ac408 100644
>>> --- a/gcc/cgraph.h
>>> +++ b/gcc/cgraph.h
>>> @@ -1874,6 +1874,9 @@ public:
>>>   if we did not do any inter-procedural code movement.  */
>>>unsigned used_by_single_function : 1;
>>>
>>> +  /* Set if -fsection-anchors is set.  */
>>> +  unsigned section_anchor : 1;
>>> +
>>>  private:
>>>/* Assemble thunks and aliases associated to varpool node.  */
>>>void assemble_aliases (void);
>>> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
>>> index 4bfcad7..e75d5c0 100644
>>> --- a/gcc/cgraphunit.c
>>> +++ b/gcc/cgraphunit.c
>>> @@ -800,6 +800,9 @@ varpool_node::finalize_decl (tree decl)
>>>   it is available to notice_global_symbol.  */
>>>node->definition = true;
>>>notice_global_symbol (decl);
>>> +
>>> +  node->section_anchor = flag_section_anchors;
>>> +
>>>if (TREE_THIS_VOLATILE (decl) || DECL_PRESERVE_P (decl)
>>>/* Traditionally we do not eliminate static variables when not
>>>optimizing and when not doing toplevel reoder.  */
>>> diff --git a/gcc/common.opt b/gcc/common.opt
>>> index f0d7196..e497795 100644
>>> --- a/gcc/common.opt
>>> +++ b/gcc/common.opt
>>> @@ -1590,6 +1590,10 @@ fira-algorithm=
>>>  Common Joined RejectNegative Enum(ira_algorithm) 
>>> Var(flag_ira_algorithm) Init(IRA_ALGORITHM_CB) Optimization
>>>  -fira-algorithm=[CB|priority] Set the used IRA algorithm.
>>>
>>> +fipa-increase_alignment
>>> +Common Report Var(flag_ipa_increase_alignment) Init(0) Optimization
>>> +Option to gate increase_alignment ipa pass.
>>> +
>>>  Enum
>>>  Name(ira_algorithm) Type(enum ira_algorithm) UnknownError(unknown IRA 
>>> algorithm %qs)
>>>
>>> @@ -2133,7 +2137,7 @@ Common Report Var(flag_sched_dep_count_heuristic) 
>>> Init(1) Optimization
>>>  Enable the dependent count heuristic in the scheduler.
>>>
>>>  fsection-anchors
>>> -Common Report Var(flag_section_anchors) Optimization
>>> +Common Report Var(flag_section_anchors)
>>>  Access data in the same section from shared anchor points.
>>>
>>>  fsee
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index a0db3a4..1482566 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -8252,6 +8252,8 @@ aarch64_override_options (void)
>>>
>>>aarch64_register_fma_steering ();
>>>
>>> +  /* Enable increase_alignment pass.  */
>>> +  flag_ipa_increase_alignment = 1;
>>
>> I would rather enable it always on targets that do support anchors.
> AFAIK aarch64 supports section anchors.
>>> diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c
>>> index ce9e146..7f09f3a 100644
>>> --- a/gcc/lto/lto-symtab.c
>>> +++ b/gcc/lto/lto-symtab.c
>>> @@ -342,6 +342,13 @@ lto_symtab_merge (symtab_node *prevailing, 
>>> symtab_node *entry)
>>>   The type compatibility checks or the completing of types has 
>>> properly
>>>   dealt with most issues.  */
>>>
>>> +  /* ??? is this assert necessary ?  */
>>> +  varpool_node *v_prevailing = dyn_cast <varpool_node *> (prevailing);
>>> +  varpool_node *v_entry = dyn_cast <varpool_node *> (entry);
>>> +  gcc_assert (v_prevailing && v_entry);
>>> +  /* section_anchor of prevailing_decl wins.  */
>>> +  v_entry->section_anchor = v_prevailing->section_anchor;
>>> +
>> Other flags are merged in lto_varpool_replace_node so please move this 
>> there.
> Ah indeed, thanks for the pointers.
> I wonder though if we need to set
> prevailing_node->section_anchor = vnode->section_anchor ?
> IIUC, the function merges flags from vnode into prevailing_node
> and removes vnode. However we want prevailing_node->section_anchor
> to always take precedence.
>>> +/* Return true if alignment should be increased for this vnode.
>>> +   This is done if every function that references/referring to vnode
>>> +   has flag_tree_loop_vectorize set.  */
>>> +
>>> +static bool
>>> +increase_alignment_p (varpool_node *vnode)
>>> +{
>>> +  ipa_ref *ref;
>>> +
>>> +  fo

Re: [RFC] [2/2] divmod transform: override expand_divmod_libfunc for ARM and add test-cases

2016-07-20 Thread Prathamesh Kulkarni
ping * 3 https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02008.html

Thanks,
Prathamesh

On 29 June 2016 at 22:09, Prathamesh Kulkarni
 wrote:
> ping * 2 https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02008.html
>
> Thanks,
> Prathamesh
>
> On 7 June 2016 at 13:56, Prathamesh Kulkarni
>  wrote:
>> ping https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02008.html
>>
>> Thanks,
>> Prathamesh
>>
>> On 25 May 2016 at 18:19, Prathamesh Kulkarni
>>  wrote:
>>> On 23 May 2016 at 14:28, Prathamesh Kulkarni
>>>  wrote:
 Hi,
 This patch overrides expand_divmod_libfunc for the ARM port and adds
 test-cases.
 I separated the SImode tests into a separate file from the DImode tests
 because certain arm configs (cortex-a15) have a hardware div insn for
 SImode but not for DImode,
 and for those configs we want the SImode tests to be disabled but not
 the DImode tests.
 The patch therefore has two target-effective checks: divmod and 
 divmod_simode.
 Cross-tested on arm*-*-*.
 Bootstrap+test on arm-linux-gnueabihf in progress.
 Does this patch look OK ?
>>> Hi,
>>> This version adds couple of more test-cases and fixes typo in
>>> divmod-3-simode.c, divmod-4-simode.c
>>>
>>> Thanks,
>>> Prathamesh

 Thanks,
 Prathamesh


[PING] C/C++: Simplify handling of location information for OpenACC routine directives

2016-07-20 Thread Thomas Schwinge
Hi!

Ping.
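
For context, the directive whose parsing these patches rework annotates a
function declaration or definition, as in this illustrative usage (not
taken from the patch):

  #pragma acc routine seq
  extern int foo (int x);

The clause list (here "seq") and the directive's location are the pieces
whose handling the patch untangles.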

On Wed, 13 Jul 2016 11:25:46 +0200, I wrote:
> Working on something else regarding the C/C++ OpenACC routine directive,
> I couldn't but untangle that arcane location_t handling, currently using
> a dummy OMP_CLAUSE_SEQ.  Along the way, I also updated some comments, and
> simplified some code.  OK for trunk?  (Another C/C++ OpenACC routine
> cleanup patch is emerging, depending on this one.)
> 
> commit 9ae5f6d868db42b585de8a1d5ec3c2746619
> Author: Thomas Schwinge 
> Date:   Fri Jul 8 18:30:45 2016 +0200
> 
> C/C++: Simplify handling of location information for OpenACC routine 
> directives
> 
>   gcc/c/
>   * c-parser.c (struct oacc_routine_data): New.
>   (c_parser_declaration_or_fndef, c_parser_oacc_routine): Use it.
>   Simplify code.
>   (c_finish_oacc_routine): Likewise.  Don't attach clauses to "omp
>   declare target" attribute.
>   gcc/cp/
>   * parser.h (struct cp_omp_declare_simd_data): New.
>   (struct cp_parser): Use it for oacc_routine member.
>   * parser.c (cp_ensure_no_oacc_routine, cp_parser_oacc_routine)
>   (cp_parser_late_parsing_oacc_routine, cp_finalize_oacc_routine):
>   Use it.  Simplify code.
>   (cp_parser_new): Initialize all members pointing to special
>   parsing data structures.
>   (cp_parser_cilk_simd_fn_vector_attrs): Initialize
>   parser->cilk_simd_fn_info->clauses.
>   (cp_parser_omp_declare_simd): Initialize
>   parser->omp_declare_simd->clauses.
>   (cp_parser_late_parsing_omp_declare_simd): Simplify code.
> ---
>  gcc/c/c-parser.c |  86 ++--
>  gcc/cp/parser.c  | 108 
> ---
>  gcc/cp/parser.h  |  21 ++-
>  3 files changed, 103 insertions(+), 112 deletions(-)
> 
> diff --git gcc/c/c-parser.c gcc/c/c-parser.c
> index 1a50dea..cc9ee59 100644
> --- gcc/c/c-parser.c
> +++ gcc/c/c-parser.c
> @@ -1271,11 +1271,17 @@ enum c_parser_prec {
>NUM_PRECS
>  };
>  
> +/* Helper data structure for parsing #pragma acc routine.  */
> +struct oacc_routine_data {
> +  tree clauses;
> +  location_t loc;
> +};
> +
>  static void c_parser_external_declaration (c_parser *);
>  static void c_parser_asm_definition (c_parser *);
>  static void c_parser_declaration_or_fndef (c_parser *, bool, bool, bool,
>  bool, bool, tree *, vec,
> -tree = NULL_TREE);
> +struct oacc_routine_data * = NULL);
>  static void c_parser_static_assert_declaration_no_semi (c_parser *);
>  static void c_parser_static_assert_declaration (c_parser *);
>  static void c_parser_declspecs (c_parser *, struct c_declspecs *, bool, bool,
> @@ -1367,7 +1373,7 @@ static bool c_parser_omp_target (c_parser *, enum 
> pragma_context, bool *);
>  static void c_parser_omp_end_declare_target (c_parser *);
>  static void c_parser_omp_declare (c_parser *, enum pragma_context);
>  static bool c_parser_omp_ordered (c_parser *, enum pragma_context, bool *);
> -static void c_parser_oacc_routine (c_parser *parser, enum pragma_context);
> +static void c_parser_oacc_routine (c_parser *, enum pragma_context);
>  
>  /* These Objective-C parser functions are only ever called when
> compiling Objective-C.  */
> @@ -1559,7 +1565,8 @@ c_parser_external_declaration (c_parser *parser)
>  }
>  
>  static void c_finish_omp_declare_simd (c_parser *, tree, tree, vec);
> -static void c_finish_oacc_routine (c_parser *, tree, tree, bool, bool, bool);
> +static void c_finish_oacc_routine (struct oacc_routine_data *, tree, bool,
> +bool, bool);
>  
>  /* Parse a declaration or function definition (C90 6.5, 6.7.1, C99
> 6.7, 6.9.1).  If FNDEF_OK is true, a function definition is
> @@ -1638,7 +1645,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
> fndef_ok,
>  bool nested, bool start_attr_ok,
>  tree *objc_foreach_object_declaration,
>  vec omp_declare_simd_clauses,
> -tree oacc_routine_clauses)
> +struct oacc_routine_data *oacc_routine_data)
>  {
>struct c_declspecs *specs;
>tree prefix_attrs;
> @@ -1743,9 +1750,9 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
> fndef_ok,
> pedwarn (here, 0, "empty declaration");
>   }
>c_parser_consume_token (parser);
> -  if (oacc_routine_clauses)
> - c_finish_oacc_routine (parser, NULL_TREE,
> -oacc_routine_clauses, false, true, false);
> +  if (oacc_routine_data)
> + c_finish_oacc_routine (oacc_routine_data, NULL_TREE, false, true,
> +false);
>return;
>  }
>  
> @@ -1862,9 +1869,8 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
> fndef_ok,
> || !vec_safe_is_empty (parser->ci

[PING] Rework C/C++ OpenACC routine parsing

2016-07-20 Thread Thomas Schwinge
Hi!

Ping.

On Wed, 13 Jul 2016 16:10:31 +0200, I wrote:
> On Wed, 13 Jul 2016 11:25:46 +0200, I wrote:
> > Working on something else regarding the C/C++ OpenACC routine directive,
> > I couldn't but untangle [...]
> 
> > (Another C/C++ OpenACC routine
> > cleanup patch is emerging, depending on this one.)
> 
> Here it is; likewise, OK for trunk?  (Further cleanup especially of C++
> OpenACC routine handling seems to be possible, but I want to synchronize
> my work at this point.)
> 
> commit 0bd30acaf4dd634499b1c695ddee555e7675aa18
> Author: Thomas Schwinge 
> Date:   Thu Jun 23 13:28:09 2016 +0200
> 
> Rework C/C++ OpenACC routine parsing
> 
>   gcc/c/
>   * c-parser.c (struct oacc_routine_data): Add error_seen and
>   fndecl_seen members.
>   (c_finish_oacc_routine): Use these.
>   (c_parser_declaration_or_fndef): Adjust.
>   (c_parser_oacc_routine): Likewise.  Support more C language
>   constructs, and improve diagnostics.  Move pragma context
>   checking...
>   (c_parser_pragma): ... here.
>   gcc/cp/
>   * parser.c (cp_ensure_no_oacc_routine): Improve diagnostics.
>   (cp_parser_late_parsing_cilk_simd_fn_info): Fix diagnostics.
>   (cp_parser_late_parsing_oacc_routine, cp_finalize_oacc_routine):
>   Simplify code, and improve diagnostics.
>   (cp_parser_oacc_routine): Likewise.  Move pragma context
>   checking...
>   (cp_parser_pragma): ... here.
>   gcc/testsuite/
>   * c-c++-common/goacc/routine-5.c: Update.
> ---
>  gcc/c/c-parser.c | 161 +++---
>  gcc/cp/parser.c  | 182 +++-
>  gcc/testsuite/c-c++-common/goacc/routine-5.c | 199 
> +++
>  3 files changed, 369 insertions(+), 173 deletions(-)
> 
> diff --git gcc/c/c-parser.c gcc/c/c-parser.c
> index 7f84ce9..809118a 100644
> --- gcc/c/c-parser.c
> +++ gcc/c/c-parser.c
> @@ -1273,6 +1273,8 @@ enum c_parser_prec {
>  
>  /* Helper data structure for parsing #pragma acc routine.  */
>  struct oacc_routine_data {
> +  bool error_seen; /* Set if error has been reported.  */
> +  bool fndecl_seen; /* Set if one fn decl/definition has been seen already.  
> */
>tree clauses;
>location_t loc;
>  };
> @@ -1565,8 +1567,7 @@ c_parser_external_declaration (c_parser *parser)
>  }
>  
>  static void c_finish_omp_declare_simd (c_parser *, tree, tree, vec);
> -static void c_finish_oacc_routine (struct oacc_routine_data *, tree, bool,
> -bool, bool);
> +static void c_finish_oacc_routine (struct oacc_routine_data *, tree, bool);
>  
>  /* Parse a declaration or function definition (C90 6.5, 6.7.1, C99
> 6.7, 6.9.1).  If FNDEF_OK is true, a function definition is
> @@ -1751,8 +1752,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
> fndef_ok,
>   }
>c_parser_consume_token (parser);
>if (oacc_routine_data)
> - c_finish_oacc_routine (oacc_routine_data, NULL_TREE, false, true,
> -false);
> + c_finish_oacc_routine (oacc_routine_data, NULL_TREE, false);
>return;
>  }
>  
> @@ -1850,7 +1850,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
> fndef_ok,
>prefix_attrs = specs->attrs;
>all_prefix_attrs = prefix_attrs;
>specs->attrs = NULL_TREE;
> -  for (bool first = true;; first = false)
> +  while (true)
>  {
>struct c_declarator *declarator;
>bool dummy = false;
> @@ -1870,8 +1870,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
> fndef_ok,
>   c_finish_omp_declare_simd (parser, NULL_TREE, NULL_TREE,
>  omp_declare_simd_clauses);
> if (oacc_routine_data)
> - c_finish_oacc_routine (oacc_routine_data, NULL_TREE,
> -false, first, false);
> + c_finish_oacc_routine (oacc_routine_data, NULL_TREE, false);
> c_parser_skip_to_end_of_block_or_statement (parser);
> return;
>   }
> @@ -1987,8 +1986,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
> fndef_ok,
> finish_init ();
>   }
> if (oacc_routine_data)
> - c_finish_oacc_routine (oacc_routine_data, d,
> -false, first, false);
> + c_finish_oacc_routine (oacc_routine_data, d, false);
> if (d != error_mark_node)
>   {
> maybe_warn_string_init (init_loc, TREE_TYPE (d), init);
> @@ -2033,8 +2031,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
> fndef_ok,
>   temp_pop_parm_decls ();
>   }
> if (oacc_routine_data)
> - c_finish_oacc_routine (oacc_routine_data, d,
> -false, first, false);
> + c_finish_oacc_routine (oacc_routine_data, d, false);
> if (d)
>   finish_decl (d,

[PING] libgomp: In OpenACC testing, cycle though $offload_targets, and by default only build for the offload target that we're actually going to test

2016-07-20 Thread Thomas Schwinge
Hi!

Ping.

On Wed, 13 Jul 2016 12:37:07 +0200, I wrote:
> As discussed before, "offloading compilation is slow; I suppose because
> of having to invoke several tools (LTO streaming -> mkoffload -> offload
> compilers, assemblers, linkers -> combine the resulting images; but I
> have not done a detailed analysis on that)".  For this reason it is
> beneficial (that is, it is measurable in libgomp testing wall time) to
> limit offload compilation to the one (in the OpenACC case) offload target
> that we're actually going to test (that is, execute).  Another reason is
> that -foffload=-fdump-tree-[...] produces clashes (that is,
> unpredictable outcomes) in the names of offload compilations' dump
> files.  Here is a patch to implement that, to specify
> -foffload=[...] during libgomp OpenACC testing.  As that has been
> challenged before:
> 
> | [...] there actually is a difference between offload_plugins and
> | offload_targets (for example, "intelmic"
> | vs. "x86_64-intelmicemul-linux-gnu"), and I'm using both variables --
> | to avoid having to translate the more specific
> | "x86_64-intelmicemul-linux-gnu" (which we required in the test harness)
> | into the less specific "intelmic" (for plugin loading) in
> | libgomp/target.c.  I can do that, so that we can continue to use just a
> | single offload_targets variable, but I consider that a less elegant
> | solution.
> 
> OK for trunk?
> 
> commit 5fdb515826769ebb36bc5c49a3ffac4d17a8a589
> Author: Thomas Schwinge 
> Date:   Wed Jul 13 11:37:16 2016 +0200
> 
> libgomp: In OpenACC testing, cycle though $offload_targets, and by 
> default only build for the offload target that we're actually going to test
> 
>   libgomp/
>   * plugin/configfrag.ac: Enumerate both offload plugins and offload
>   targets.
>   (OFFLOAD_PLUGINS): Renamed from OFFLOAD_TARGETS.
>   * target.c (gomp_target_init): Adjust to that.
>   * testsuite/lib/libgomp.exp: Likewise.
>   (offload_targets_s, offload_targets_s_openacc): Remove variables.
>   (offload_target_to_openacc_device_type): New proc.
>   (check_effective_target_openacc_nvidia_accel_selected)
>   (check_effective_target_openacc_host_selected): Examine
>   $openacc_device_type instead of $offload_target_openacc.
>   * Makefile.in: Regenerate.
>   * config.h.in: Likewise.
>   * configure: Likewise.
>   * testsuite/Makefile.in: Likewise.
>   * testsuite/libgomp.oacc-c++/c++.exp: Cycle through
>   $offload_targets (plus "disable") instead of
>   $offload_targets_s_openacc, and add "-foffload=$offload_target" to
>   tagopt.
>   * testsuite/libgomp.oacc-c/c.exp: Likewise.
>   * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
> ---
>  libgomp/Makefile.in|  1 +
>  libgomp/config.h.in|  4 +-
>  libgomp/configure  | 44 +++--
>  libgomp/plugin/configfrag.ac   | 39 +++-
>  libgomp/target.c   |  8 +--
>  libgomp/testsuite/Makefile.in  |  1 +
>  libgomp/testsuite/lib/libgomp.exp  | 72 
> ++
>  libgomp/testsuite/libgomp.oacc-c++/c++.exp | 30 +
>  libgomp/testsuite/libgomp.oacc-c/c.exp | 30 +
>  libgomp/testsuite/libgomp.oacc-fortran/fortran.exp | 22 ---
>  10 files changed, 142 insertions(+), 109 deletions(-)
> 
> diff --git libgomp/Makefile.in libgomp/Makefile.in
> index 88c8517..33be8c7 100644
> --- libgomp/Makefile.in
> +++ libgomp/Makefile.in
> @@ -380,6 +380,7 @@ mkdir_p = @mkdir_p@
>  multi_basedir = @multi_basedir@
>  offload_additional_lib_paths = @offload_additional_lib_paths@
>  offload_additional_options = @offload_additional_options@
> +offload_plugins = @offload_plugins@
>  offload_targets = @offload_targets@
>  oldincludedir = @oldincludedir@
>  pdfdir = @pdfdir@
> diff --git libgomp/config.h.in libgomp/config.h.in
> index 226ac53..28f7b2d 100644
> --- libgomp/config.h.in
> +++ libgomp/config.h.in
> @@ -98,8 +98,8 @@
> */
>  #undef LT_OBJDIR
>  
> -/* Define to offload targets, separated by commas. */
> -#undef OFFLOAD_TARGETS
> +/* Define to offload plugins, separated by commas. */
> +#undef OFFLOAD_PLUGINS
>  
>  /* Name of package */
>  #undef PACKAGE
> diff --git libgomp/configure libgomp/configure
> index 8d03eb6..4baab20 100755
> --- libgomp/configure
> +++ libgomp/configure
> @@ -633,6 +633,8 @@ PLUGIN_NVPTX_FALSE
>  PLUGIN_NVPTX_TRUE
>  offload_additional_lib_paths
>  offload_additional_options
> +offload_targets
> +offload_plugins
>  PLUGIN_HSA_LIBS
>  PLUGIN_HSA_LDFLAGS
>  PLUGIN_HSA_CPPFLAGS
> @@ -646,7 +648,6 @@ PLUGIN_NVPTX_CPPFLAGS
>  PLUGIN_NVPTX
>  CUDA_DRIVER_LIB
>  CUDA_DRIVER_INCLUDE
> -offload_targets
>  libtool_VERSION
>  ac_ct_FC
>  FCFLAGS
> @@ -11145,7 +11146,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_

Re: [PATCH] Fix unsafe function attributes for special functions (PR 71876)

2016-07-20 Thread Richard Biener
On Wed, 20 Jul 2016, Bernd Edlinger wrote:

> On 07/20/16 12:46, Richard Biener wrote:
> > On Wed, 20 Jul 2016, Richard Biener wrote:
> >
> >> On Tue, 19 Jul 2016, Bernd Edlinger wrote:
> >>
> >>> Hi!
> >>>
> >>> As discussed at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71876,
> >>> we have a _very_ old hack in gcc, that recognizes certain functions by
> >>> name, and inserts in some cases unsafe attributes, that don't work for
> >>> a freestanding environment.
> >>>
> >>> It is unsafe to return ECF_MAY_BE_ALLOCA, ECF_LEAF and ECF_NORETURN
> >>> from special_function_p, just by the name of the function, especially
> >>> for less well known functions, like "getcontext" or "savectx", which
> >>> could easily be used for something completely different.
> >>
> >> Returning ECF_MAY_BE_ALLOCA is safe.  Just wanted to mention this,
> >> regardless of the followups you already received.
> >
> > Oh, and maybe you can factor out the less controversial parts,
> > namely ignoring the __builtin_ prefix.  I don't think that
> > calling __builtin_setjmp in an environment where setjmp is not a
> > builtin should behave like setjmp (it will call a function named
> > '__builtin_setjmp').
> 
> 
> I wonder how I manage to dig out such controversial things ;)
> 
> But you are right, that would at least be a start.
> 
> So this patch is what you requested:
> 
> Remove the handling of the __builtin_ prefix from special_function_p
> and add the returns_twice attribute to the __builtin_setjmp declaration
> instead.
> 
> Is it OK after boot-strap and regression-testing?

I think the __builtin_setjmp change is wrong - __builtin_setjmp is
_not_ 'setjmp' it is part of the GCC internal machinery (using setjmp
and longjmp in the end) for SJLJ exception handing.

Am I correct Eric?

Thanks,
Richard.

> 
> Thanks
> Bernd.
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH] S/390: Fix pr67443.c.

2016-07-20 Thread Dominik Vogt
The attached patch rewrites the pr67443.c testcase in a different
way so that the test still works with the changed allocation of
globals pinned to registers.  The test is hopefully more robust
now.  Tested on s390 and s390x biarch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/testsuite/ChangeLog

* gcc.target/s390/pr67443.c: Fix test case.
>From fe5dd36da6cea172a5cebdbc33a8a60cb5e0e9ad Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 20 Jul 2016 12:50:52 +0100
Subject: [PATCH] S/390: Fix pr67443.c.

---
 gcc/testsuite/gcc.target/s390/pr67443.c | 42 ++---
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/pr67443.c 
b/gcc/testsuite/gcc.target/s390/pr67443.c
index e011a11..771b56f 100644
--- a/gcc/testsuite/gcc.target/s390/pr67443.c
+++ b/gcc/testsuite/gcc.target/s390/pr67443.c
@@ -2,21 +2,10 @@
 
 /* { dg-do run { target s390*-*-* } } */
 /* { dg-prune-output "call-clobbered register used for global register 
variable" } */
-/* { dg-options "-march=z900 -fPIC -fomit-frame-pointer -O3" } */
+/* { dg-options "-march=z900 -fPIC -fomit-frame-pointer -O3 -save-temps" } */
 
 #include 
 
-/* Block all registers except the first three argument registers.  */
-register long r0 asm ("r0");
-register long r1 asm ("r1");
-register long r5 asm ("r5");
-register long r6 asm ("r6");
-register long r7 asm ("r7");
-register long r8 asm ("r8");
-register long r9 asm ("r9");
-register long r10 asm ("r10");
-register long r11 asm ("r11");
-
 struct s_t
 {
   unsigned f1 : 8;
@@ -24,25 +13,40 @@ struct s_t
 };
 
 __attribute__ ((noinline))
-void foo (struct s_t *ps, int c, int i)
+int bar ()
 {
+  return 0;
+}
+
+__attribute__ ((noinline))
+void foo (struct s_t *ps, int c)
+{
+  int tmp;
+
   /* Uses r2 as address register.  */
   ps->f1 = c;
-  /* The calculation of the value is so expensive that it's cheaper to spill ps
- to the stack and reload it later (into a different register).
- ==> Uses r4 as address register.*/
-  ps->f2 = i + i % 3;
+  /* Clobber all registers that r2 could be stored into.  */
+  __asm__ __volatile__ ("" : : : "memory",
+   "r0","r1","r6","r7","r8","r9","r10","r11");
+  /* Force that the pointer is evicted from r2 and stored on the stack.  */
+  tmp = bar ();
+  /* Use the pointer again.  It gets reloaded to a different register because
+ r2 is already occupied.  */
+  ps->f2 = tmp;
   /* If dead store elimination fails to detect that the address in r2 during
- the first assignment is an alias of the address in r4 during the second
+ the first assignment is an alias of the address in rX during the second
  assignment, it eliminates the first assignment and the f1 field is not
  written (bug).  */
 }
+/* Make sure that r2 is used only once as an address register for storing.
+   If this check fails, the test case needs to be fixed.
+   { dg-final { scan-assembler-times "\tst.\?\t.*,0\\(%r2\\)" 1 } } */
 
 int main (void)
 {
   struct s_t s = { 0x01u, 0x020304u };
 
-  foo (&s, 0, 0);
+  foo (&s, 0);
   assert (s.f1 == 0 && s.f2 == 0);
 
   return 0;
-- 
2.3.0



Re: [PING 3, PATCH] Remove xfail from thread_local-order2.C.

2016-07-20 Thread Dominik Vogt
On Mon, Jun 20, 2016 at 02:41:21PM +0100, Dominik Vogt wrote:
> Patch:
> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01587.html
> 
> On Wed, Jan 27, 2016 at 10:39:44AM +0100, Dominik Vogt wrote:
> > g++.dg/tls/thread_local-order2.C no longer fail with Glibc-2.18 or
> > newer since this commit:
> > 
> >   2014-08-01  Zifei Tong  
> > 
> > * libsupc++/atexit_thread.cc (HAVE___CXA_THREAD_ATEXIT_IMPL): 
> > Add
> > _GLIBCXX_ prefix to macro.
> > 
> >   git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@213504 
> > 138bc75d-0d04-0410-96
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2014-07/msg02091.html
> > 
> > So, is it time to remove the xfail from the test case?

> > gcc/testsuite/ChangeLog
> > 
> > * g++.dg/tls/thread_local-order2.C: Remove xfail.
> 
> > >From 0b0abbd2e6d9d8b6857622065bdcbdde31b5ddb0 Mon Sep 17 00:00:00 2001
> > From: Dominik Vogt 
> > Date: Wed, 27 Jan 2016 09:54:07 +0100
> > Subject: [PATCH] Remove xfail from thread_local-order2.C.
> > 
> > This should work with Glibc-2.18 or newer.
> > ---
> >  gcc/testsuite/g++.dg/tls/thread_local-order2.C | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/gcc/testsuite/g++.dg/tls/thread_local-order2.C 
> > b/gcc/testsuite/g++.dg/tls/thread_local-order2.C
> > index f8df917..d3351e6 100644
> > --- a/gcc/testsuite/g++.dg/tls/thread_local-order2.C
> > +++ b/gcc/testsuite/g++.dg/tls/thread_local-order2.C
> > @@ -2,7 +2,6 @@
> >  // that isn't reverse order of construction.  We need to move
> >  // __cxa_thread_atexit into glibc to get this right.
> >  
> > -// { dg-do run { xfail *-*-* } }
> >  // { dg-require-effective-target c++11 }
> >  // { dg-add-options tls }
> >  // { dg-require-effective-target tls_runtime }
> > -- 
> > 2.3.0

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH] Fix unsafe function attributes for special functions (PR 71876)

2016-07-20 Thread Bernd Edlinger
On 07/20/16 13:53, Richard Biener wrote:
> On Wed, 20 Jul 2016, Bernd Edlinger wrote:
>
>> On 07/20/16 12:46, Richard Biener wrote:
>>> On Wed, 20 Jul 2016, Richard Biener wrote:
>>>
 On Tue, 19 Jul 2016, Bernd Edlinger wrote:

> Hi!
>
> As discussed at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71876,
> we have a _very_ old hack in gcc, that recognizes certain functions by
> name, and inserts in some cases unsafe attributes, that don't work for
> a freestanding environment.
>
> It is unsafe to return ECF_MAY_BE_ALLOCA, ECF_LEAF and ECF_NORETURN
> from special_function_p, just by the name of the function, especially
> for less well known functions, like "getcontext" or "savectx", which
> could easily be used for something completely different.

 Returning ECF_MAY_BE_ALLOCA is safe.  Just wanted to mention this,
 regardless of the followups you already received.
>>>
>>> Oh, and maybe you can factor out the less controversial parts,
>>> namely ignoring the __builtin_ prefix.  I don't think that
>>> calling __builtin_setjmp in an environment where setjmp is not a
>>> builtin should behave like setjmp (it will call a function named
>>> '__builtin_setjmp').
>>
>>
>> I wonder how I manage to dig out such controversial things ;)
>>
>> But you are right, that would at least be a start.
>>
>> So this patch is what you requested:
>>
>> Remove the handling of the __builtin_ prefix from special_function_p
>> and add the returns_twice attribute to the __builtin_setjmp declaration
>> instead.
>>
>> Is it OK after boot-strap and regression-testing?
>
> I think the __builtin_setjmp change is wrong - __builtin_setjmp is
> _not_ 'setjmp' it is part of the GCC internal machinery (using setjmp
> and longjmp in the end) for SJLJ exception handing.
>

I do think that part is correct.

DEF_GCC_BUILTIN  adds the __builtin_ prefix, but does not overload the
standard setjmp function.

/* A GCC builtin (like __builtin_saveregs) is provided by the
compiler, but does not correspond to a function in the standard
library.  */
#undef DEF_GCC_BUILTIN
#define DEF_GCC_BUILTIN(ENUM, NAME, TYPE, ATTRS)\
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, BT_LAST, \
		false, false, false, ATTRS, true, true)

And to define a builtin without __builtin_, there is a DEF_SYNC_BUILTIN
macro.  And DEF_LIB_BUILTIN defines both ways.

However, there is no simple way to define setjmp and __builtin_setjmp
to different builtin functions.


Thanks
Bernd.


> Am I correct Eric?
>
> Thanks,
> Richard.
>
>>
>> Thanks
>> Bernd.
>>
>
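
For reference, the attribute being attached here marks a function whose
calls may return a second time (via longjmp or a similar mechanism), so
the optimizers must not assume values cached across such a call are
still valid.  An illustrative declaration (not from the patch):

  __attribute__ ((returns_twice)) int my_setjmp (void *env);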


Re: Fix PR44281 (bad RA with global regs)

2016-07-20 Thread Dominik Vogt
On Thu, Jul 14, 2016 at 10:24:38AM +0100, Dominik Vogt wrote:
> On Wed, Jul 13, 2016 at 07:43:13PM +0200, Bernd Schmidt wrote:
> > On 07/13/2016 05:29 PM, Dominik Vogt wrote:
> > 
> > >Unfortunately this patch (or whatever got actually committed) has
> > >broken the gcc.target/s390/pr67443.c test case, which is a bit
> > >fishy (see code snippet below).  I assign most registers to global
> > >variables and then use some complicated arithmetics with the goal
> > >that the pointer stored in the first argument gets saved on the
> > >stack and reloaded to a different register.  Before this patch the
> > >test case just needed three registers to do its work (r2, r3, r4).
> > >With the patch it currently causes an error in the reload pass
> > >
> > >  error: unable to find a register to spill
> > 
> > Might be useful to see the dump_reload output.
> 
> Attached.
> 
> > >If a fourth register is available, the ICE goes away, but the
> > >pointer remains in r2, rendering the test case useless.
> > 
> > I don't think I quite understand what you're trying to do here,
> 
> Alias detection of the memory pointed to by the first register.
> There was some hard to trigger bug where writing a bitfield in a
> struct would also overwrite the unselected bits of the
> corresponding word.  See here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

I've made a patch for the testcase:
  https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01232.html
That fixes the problem for s390/s390x, but I cannot tell wether
the patch to global register variable allocation has a problem or
not.  If you need any more information just give me a shout.
Otherwise I'll not track this issue any further.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH]: Use HOST_WIDE_INT_{,M}1{,U} some more

2016-07-20 Thread Uros Bizjak
2016-07-19 14:46 GMT+02:00 Uros Bizjak :
> The result of exercises with sed in the gcc/ directory.

Some more conversions:

2016-07-20  Uros Bizjak  

* cse.c: Use HOST_WIDE_INT_M1 instead of ~(HOST_WIDE_INT) 0.
* combine.c: Use HOST_WIDE_INT_M1U instead of
~(unsigned HOST_WIDE_INT) 0.
* double-int.h: Ditto.
* dse.c: Ditto.
* dwarf2asm.c: Ditto.
* expmed.c: Ditto.
* genmodes.c: Ditto.
* match.pd: Ditto.
* read-rtl.c: Ditto.
* tree-ssa-loop-ivopts.c: Ditto.
* tree-ssa-loop-prefetch.c: Ditto.
* tree-vect-generic.c: Ditto.
* tree-vect-patterns.c: Ditto.
* tree.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline?

Uros.
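
For reference, the target macros are defined in gcc/hwint.h essentially
as (paraphrased; see hwint.h for the authoritative definitions):

  #define HOST_WIDE_INT_M1  HOST_WIDE_INT_C (-1)
  #define HOST_WIDE_INT_M1U HOST_WIDE_INT_UC (-1)

i.e. all-ones values of signed and unsigned HOST_WIDE_INT, which is
exactly what the open-coded ~(HOST_WIDE_INT) 0 spellings produce.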
diff --git a/gcc/combine.c b/gcc/combine.c
index 1e5ee8e..1becc3c 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -1660,7 +1660,7 @@ update_rsp_from_reg_equal (reg_stat_type *rsp, rtx_insn 
*insn, const_rtx set,
 }
 
   /* Don't call nonzero_bits if it cannot change anything.  */
-  if (rsp->nonzero_bits != ~(unsigned HOST_WIDE_INT) 0)
+  if (rsp->nonzero_bits != HOST_WIDE_INT_M1U)
 {
   bits = nonzero_bits (src, nonzero_bits_mode);
   if (reg_equal && bits)
@@ -6541,7 +6541,7 @@ simplify_set (rtx x)
 
   if (GET_MODE_CLASS (mode) == MODE_INT && HWI_COMPUTABLE_MODE_P (mode))
 {
-  src = force_to_mode (src, mode, ~(unsigned HOST_WIDE_INT) 0, 0);
+  src = force_to_mode (src, mode, HOST_WIDE_INT_M1U, 0);
   SUBST (SET_SRC (x), src);
 }
 
@@ -7446,7 +7446,7 @@ make_extraction (machine_mode mode, rtx inner, 
HOST_WIDE_INT pos,
   else
new_rtx = force_to_mode (inner, tmode,
 len >= HOST_BITS_PER_WIDE_INT
-? ~(unsigned HOST_WIDE_INT) 0
+? HOST_WIDE_INT_M1U
 : (HOST_WIDE_INT_1U << len) - 1,
 0);
 
@@ -7635,7 +7635,7 @@ make_extraction (machine_mode mode, rtx inner, 
HOST_WIDE_INT pos,
   inner = force_to_mode (inner, wanted_inner_mode,
 pos_rtx
 || len + orig_pos >= HOST_BITS_PER_WIDE_INT
-? ~(unsigned HOST_WIDE_INT) 0
+? HOST_WIDE_INT_M1U
 : (((HOST_WIDE_INT_1U << len) - 1)
<< orig_pos),
 0);
@@ -8110,7 +8110,7 @@ make_compound_operation (rtx x, enum rtx_code in_code)
&& subreg_lowpart_p (x))
  {
rtx newer
- = force_to_mode (tem, mode, ~(unsigned HOST_WIDE_INT) 0, 0);
+ = force_to_mode (tem, mode, HOST_WIDE_INT_M1U, 0);
 
/* If we have something other than a SUBREG, we might have
   done an expansion, so rerun ourselves.  */
@@ -8390,7 +8390,7 @@ force_to_mode (rtx x, machine_mode mode, unsigned 
HOST_WIDE_INT mask,
  do not know, we need to assume that all bits up to the highest-order
  bit in MASK will be needed.  This is how we form such a mask.  */
   if (mask & (HOST_WIDE_INT_1U << (HOST_BITS_PER_WIDE_INT - 1)))
-fuller_mask = ~(unsigned HOST_WIDE_INT) 0;
+fuller_mask = HOST_WIDE_INT_M1U;
   else
 fuller_mask = ((HOST_WIDE_INT_1U << (floor_log2 (mask) + 1))
   - 1);
@@ -8733,7 +8733,7 @@ force_to_mode (rtx x, machine_mode mode, unsigned 
HOST_WIDE_INT mask,
 
  if (GET_MODE_PRECISION (GET_MODE (x)) > HOST_BITS_PER_WIDE_INT)
{
- nonzero = ~(unsigned HOST_WIDE_INT) 0;
+ nonzero = HOST_WIDE_INT_M1U;
 
  /* GET_MODE_PRECISION (GET_MODE (x)) - INTVAL (XEXP (x, 1))
 is the number of bits a full-width mask would have set.
@@ -9496,7 +9496,7 @@ make_field_assignment (rtx x)
   dest);
   src = force_to_mode (src, mode,
   GET_MODE_PRECISION (mode) >= HOST_BITS_PER_WIDE_INT
-  ? ~(unsigned HOST_WIDE_INT) 0
+  ? HOST_WIDE_INT_M1U
   : (HOST_WIDE_INT_1U << len) - 1,
   0);
 
diff --git a/gcc/cse.c b/gcc/cse.c
index 6a5ccb5..61d2d7e 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4565,7 +4565,7 @@ cse_insn (rtx_insn *insn)
  else
shift = INTVAL (pos);
  if (INTVAL (width) == HOST_BITS_PER_WIDE_INT)
-   mask = ~(HOST_WIDE_INT) 0;
+   mask = HOST_WIDE_INT_M1;
  else
mask = (HOST_WIDE_INT_1 << INTVAL (width)) - 1;
  val = (val >> shift) & mask;
@@ -5233,7 +5233,7 @@ cse_insn (rtx_insn *insn)
  else
shift = INTVAL (pos);
  if (INTVAL (width) == HOST_BITS_PER_WIDE_INT)
-   mask = ~(HOST_WIDE_INT) 0;
+   mask = HOST_WIDE_INT_M1;
  else
mask = (HOST_WIDE_INT_1 << INTVAL (width)) - 1;
  val &= ~(mask << shift);
diff --git a/gcc/

Re: [PATCH] nvptx: do not implicitly enable -ftoplevel-reorder

2016-07-20 Thread Nathan Sidwell

On 07/19/16 14:34, Alexander Monakov wrote:

Hi,

I've recently committed a middle-end patch that adds handling of undefined
variables (that the nvptx backend needs) under -fno-toplevel-reorder (svn rev.
238371).  With that change, it's no longer necessary to implicitly enable
-ftoplevel-reorder in the backend, and the following patch removes that.

Tested with nvptx-none-run, OK for trunk?


ok thanks


Re: [PATCH, PR59833]: Fix sNaN handling in ARM float to double conversion

2016-07-20 Thread Aurelien Jarno
On 2016-07-20 11:22, Ramana Radhakrishnan wrote:
> On Wed, Jul 20, 2016 at 11:14 AM, Aurelien Jarno  wrote:
> > On 2016-07-20 11:04, Ramana Radhakrishnan wrote:
> >> On Wed, Jul 20, 2016 at 10:56 AM, Aurelien Jarno  
> >> wrote:
> >> > On 2016-07-20 10:10, Ramana Radhakrishnan wrote:
> >> >> On Wed, Jul 20, 2016 at 8:48 AM, Aurelien Jarno  
> >> >> wrote:
> >> >> > On ARM soft-float, the float to double conversion doesn't convert a 
> >> >> > sNaN
> >> >> > to qNaN as the IEEE Std 754 standard mandates:
> >> >> >
> >> >> > "Under default exception handling, any operation signaling an invalid
> >> >> > operation exception and for which a floating-point result is to be
> >> >> > delivered shall deliver a quiet NaN."
> >> >> >
> >> >> > Given the soft float ARM code ignores exceptions and always provides a
> >> >> > result, a float to double conversion of a signaling NaN should return 
> >> >> > a
> >> >> > quiet NaN. Fix this in extendsfdf2.
> >> >> >
> >> >> > gcc/ChangeLog:
> >> >> >
> >> >> > PR target/59833
> >> >> > * config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN.
> >> >> >
> >> >> > gcc/testsuite/ChangeLog:
> >> >> >
> >> >> > * gcc.dg/pr59833.c: New testcase.
> >> >>
> >> >>
> >> >> Ok - assuming this was tested appropriately with no regressions.
> >> >
> >> > Given it only touches arm code, I only tested it on arm and I have seen
> >> > no regression. That said I wouldn't be surprised if the new testcase
> >> > fails on some other architectures.
> >>
> >> I was assuming you tested it on ARM :)  In this case given the change
> >> is only in the backend I would have expected this patch to have been
> >> tested for soft-float ARM or an appropriate multilib. Saying what
> >> configuration the patch was tested on is useful for the audit trail.
> >> For e.g. it's no use testing this patch on armhf ( i.e.
> >> --with-float=hard --with-fpu=vfpv3/neon --with-arch=armv7-a) as by
> >> default the test would never generate the call to the library function
> >> but I'm sure you know all that anyway.
> >
> > Indeed I should have given more details. I tested it on a Debian armel
> > machine, and I configured GCC the same way as the Debian package, that
> > is using --with-arch=armv4t --with-float=soft.
> >
> > I built it once with the new test but without the fix and a second time
> > with both the test and the fix. I have verified that the test fails in
> > the first case and pass in the second case.
> 
> Thanks for the info - what about all the other regression tests ? Did
> you do a full make check and ensure that no other tests regressed in
> comparison ?  Patches need to be tested against the entire regression
> testsuite and not just what was added.

Yes, I compared the testsuite results between the two runs, and they are
identical except for this new test (hence my "I have seen no regression" in
my first answer).

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


Re: fold x ^ y to 0 if x == y

2016-07-20 Thread Prathamesh Kulkarni
On 8 July 2016 at 12:29, Richard Biener  wrote:
> On Fri, 8 Jul 2016, Richard Biener wrote:
>
>> On Fri, 8 Jul 2016, Prathamesh Kulkarni wrote:
>>
>> > Hi Richard,
>> > For the following test-case:
>> >
>> > int f(int x, int y)
>> > {
>> >int ret;
>> >
>> >if (x == y)
>> >  ret = x ^ y;
>> >else
>> >  ret = 1;
>> >
>> >return ret;
>> > }
>> >
>> > I was wondering if x ^ y should be folded to 0 since
>> > it's guarded by condition x == y ?
>> >
>> > optimized dump shows:
>> > f (int x, int y)
>> > {
>> >   int iftmp.0_1;
>> >   int iftmp.0_4;
>> >
>> >   <bb 2>:
>> >   if (x_2(D) == y_3(D))
>> > goto <bb 3>;
>> >   else
>> > goto <bb 4>;
>> >
>> >   <bb 3>:
>> >   iftmp.0_4 = x_2(D) ^ y_3(D);
>> >
>> >   <bb 4>:
>> >   # iftmp.0_1 = PHI <iftmp.0_4(3), 1(2)>
>> >   return iftmp.0_1;
>> >
>> > }
>> >
>> > The attached patch tries to fold for above case.
>> > I am checking if op0 and op1 are equal using:
>> > if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
>> >&& operand_equal_p (vr1->min, vr1->max)
>> >&& operand_equal_p (vr2->min, vr2->max))
> >   { /* equal */ }
>> >
>> > I suppose intersection would check if op0 and op1 have equivalent ranges,
> >and I added the operand_equal_p check to ensure that there is only one
>> > element within the range. Does that look correct ?
>> > Bootstrap+test in progress on x86_64-unknown-linux-gnu.
>>
>> I think VRP is the wrong place to catch this and DOM should have but it
>> does
>>
>> Optimizing block #3
>>
>> 1>>> STMT 1 = x_2(D) le_expr y_3(D)
>> 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
>> 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
>> 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
>> 0>>> COPY x_2(D) = y_3(D)
>> 0>>> COPY y_3(D) = x_2(D)
>> Optimizing statement ret_4 = x_2(D) ^ y_3(D);
>>   Replaced 'x_2(D)' with variable 'y_3(D)'
>>   Replaced 'y_3(D)' with variable 'x_2(D)'
>>   Folded to: ret_4 = x_2(D) ^ y_3(D);
>> LKUP STMT ret_4 = x_2(D) bit_xor_expr y_3(D)
>>
>> heh, registering both equivalences is obviously not going to help...
>>
>> The 2nd equivalence is from doing
>>
>>   /* We already recorded that LHS = RHS, with canonicalization,
>>  value chain following, etc.
>>
>>  We also want to record RHS = LHS, but without any
>> canonicalization
>>  or value chain following.  */
>>   if (TREE_CODE (rhs) == SSA_NAME)
>> const_and_copies->record_const_or_copy_raw (rhs, lhs,
>> SSA_NAME_VALUE (rhs));
>>
>> generally recording both is not helpful.  Jeff?  This seems to be
>> r233207 (fix for PR65917) which must have regressed this testcase.
>
> Just verified it works fine on the GCC 5 branch:
>
> Optimizing block #3
>
> 0>>> COPY y_3(D) = x_2(D)
> 1>>> STMT 1 = x_2(D) le_expr y_3(D)
> 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
> 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
> 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
> Optimizing statement ret_4 = x_2(D) ^ y_3(D);
>   Replaced 'y_3(D)' with variable 'x_2(D)'
> Applying pattern match.pd:240, gimple-match.c:11346
> gimple_simplified to ret_4 = 0;
>   Folded to: ret_4 = 0;
I have reported it as PR71947.
Could you help point out how to fix this?

Thanks,
Prathamesh
>
> Richard.


[patch,avr] More insns to handle (inverted) bit-bit moves

2016-07-20 Thread Georg-Johann Lay
This adds some insns that set a destination bit expressed as a zero_extract to
a source bit expressed as an extract, a right shift, or simple combinations
thereof.


The purpose is smaller code and avoiding costly extracts or shifts.  This
applies mostly to bitfields; for open-coded bit insertions the patterns that
I'm seeing are sometimes too complicated, i.e. IOR of AND and SHIFTRT and XOR
and SUBREGs and all sorts of arithmetic that are not canonicalized in any way
by the middle end (insn combiner, if conversion, ...).
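A minimal sketch (hypothetical type and field names, not from the patch) of
the kind of bitfield source these patterns can arise from:

  struct bits { unsigned char b0:1, b1:1, b2:1, b3:1; };

  void copy_bit (struct bits *d, const struct bits *s)
  {
    d->b3 = s->b0;   /* destination bit = source bit */
    d->b1 = !s->b2;  /* destination bit = inverted source bit */
  }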


Ok for trunk?

Johann

* config/avr/avr.md (any_extract, any_shiftrt): New code iterators.
(*insv.extract, *insv.shiftrt, *insv.not-bit.0, *insv.not-bit.7)
(*insv.xor-extract, *insv.xor1-bit.0): New insns.
(adjust_len) [insv_notbit, insv_notbit_0, insv_notbit_7]: New
values for insn attribute.
* config/avr/avr.c (avr_out_insert_notbit): New function.
(avr_adjust_insn_length): Handle ADJUST_LEN_INSV_NOTBIT,
ADJUST_LEN_INSV_NOTBIT_0/_7.
* config/avr/avr-protos.h (avr_out_insert_notbit): New proto.

Index: config/avr/avr-protos.h
===
--- config/avr/avr-protos.h	(revision 238425)
+++ config/avr/avr-protos.h	(working copy)
@@ -57,6 +57,7 @@ extern const char *avr_out_compare64 (rt
 extern const char *ret_cond_branch (rtx x, int len, int reverse);
 extern const char *avr_out_movpsi (rtx_insn *, rtx*, int*);
 extern const char *avr_out_sign_extend (rtx_insn *, rtx*, int*);
+extern const char *avr_out_insert_notbit (rtx_insn *, rtx*, rtx, int*);
 
 extern const char *ashlqi3_out (rtx_insn *insn, rtx operands[], int *len);
 extern const char *ashlhi3_out (rtx_insn *insn, rtx operands[], int *len);
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 238425)
+++ config/avr/avr.c	(working copy)
@@ -7928,6 +7928,76 @@ avr_out_addto_sp (rtx *op, int *plen)
 }
 
 
+/* Output instructions to insert an inverted bit into OPERANDS[0]:
+   $0.$1 = ~$2.$3  if XBITNO = NULL
+   $0.$1 = ~$2.XBITNO  if XBITNO != NULL.
+   If PLEN = NULL then output the respective instruction sequence which
+   is a combination of BST / BLD and some instruction(s) to invert the bit.
+   If PLEN != NULL then store the length of the sequence (in words) in *PLEN.
+   Return "".  */
+
+const char*
+avr_out_insert_notbit (rtx_insn *insn, rtx operands[], rtx xbitno, int *plen)
+{
+  rtx op[4] = { operands[0], operands[1], operands[2],
+xbitno == NULL_RTX ? operands [3] : xbitno };
+
+  if (INTVAL (op[1]) == 7
+  && test_hard_reg_class (LD_REGS, op[0]))
+{
+  /* If the inserted bit number is 7 and we have a d-reg, then invert
+ the bit after the insertion by means of SUBI *,0x80.  */
+
+  if (INTVAL (op[3]) == 7
+  && REGNO (op[0]) == REGNO (op[2]))
+{
+  avr_asm_len ("subi %0,0x80", op, plen, -1);
+}
+  else
+{
+  avr_asm_len ("bst %2,%3" CR_TAB
+   "bld %0,%1" CR_TAB
+   "subi %0,0x80", op, plen, -3);
+}
+}
+  else if (test_hard_reg_class (LD_REGS, op[0])
+   && (INTVAL (op[1]) != INTVAL (op[3])
+   || !reg_overlap_mentioned_p (op[0], op[2])))
+{
+  /* If the destination bit is in a d-reg we can jump depending
+ on the source bit and use ANDI / ORI.  This just applies if we
+ have not an early-clobber situation with the bit.  */
+
+  avr_asm_len ("andi %0,~(1<<%1)" CR_TAB
+   "sbrs %2,%3"   CR_TAB
+   "ori %0,1<<%1", op, plen, -3);
+}
+  else
+{
+  /* Otherwise, invert the bit by means of COM before we store it with
+ BST and then undo the COM if needed.  */
+
+  avr_asm_len ("com %2" CR_TAB
+   "bst %2,%3", op, plen, -2);
+
+  if (!reg_unused_after (insn, op[2])
+  // A simple 'reg_unused_after' is not enough because that function
+  // assumes that the destination register is overwritten completely
+  // and hence is in order for our purpose.  This is not the case
+  // with BLD which just changes one bit of the destination.
+  || reg_overlap_mentioned_p (op[0], op[2]))
+{
+  /* Undo the COM from above.  */
+  avr_asm_len ("com %2", op, plen, 1);
+}
+
+  avr_asm_len ("bld %0,%1", op, plen, 1);
+}
+  
+  return "";
+}
+
+
 /* Outputs instructions needed for fixed point type conversion.
This includes converting between any fixed point type, as well
as converting to any integer type.  Conversion between integer
@@ -8765,6 +8835,16 @@ avr_adjust_insn_length (rtx_insn *insn,
 
 case ADJUST_LEN_INSERT_BITS: avr_out_insert_bits (op, &len); break;
 
+case ADJUST_LEN_INSV_NOTBIT:
+  avr_out_insert_notbit (insn, op, NULL_RTX, &len);
+  break;
+case ADJUST

[AArch64][1/3] Migrate aarch64_add_constant to new interface & kill aarch64_build_constant

2016-07-20 Thread Jiong Wang

Currently aarch64_add_constant is using aarch64_build_constant to move
an immediate into the destination register.

It considers the following situations:

  * the immediate fits a bitmask pattern that needs only a single
instruction.
  * the immediate fits a single movz/movn.
  * the immediate needs a single movz/movn plus multiple movks.

Hypothetical examples of each class are sketched below.
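For illustration only (these constants are not from the patch, and the exact
encodings depend on the mode):

  unsigned long long a = 0x5555555555555555ULL; /* bitmask pattern: one ORR  */
  unsigned long long b = 0x12340000ULL;         /* one (shifted) MOVZ        */
  unsigned long long c = 0x1234567890abULL;     /* one MOVZ plus two MOVKs   */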


Actually we have another constant-building helper function,
"aarch64_internal_mov_immediate", which covers all these situations and
more.

This patch thus migrates aarch64_add_constant to
aarch64_internal_mov_immediate so that we can kill the old
aarch64_build_constant.

OK for trunk?

gcc/
2016-07-20  Jiong Wang  

* config/aarch64/aarch64.c (aarch64_add_constant): New
parameter "mode".  Use aarch64_internal_mov_immediate
instead of aarch64_build_constant.
(aarch64_build_constant): Delete.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 512ef10d158d2eaa1384d28c43b9a8f90387099d..aeea3b3ebc514663043ac8d7cd13361f06f78502 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3337,98 +3337,20 @@ aarch64_final_eh_return_addr (void)
    - 2 * UNITS_PER_WORD));
 }
 
-/* Possibly output code to build up a constant in a register.  For
-   the benefit of the costs infrastructure, returns the number of
-   instructions which would be emitted.  GENERATE inhibits or
-   enables code generation.  */
-
-static int
-aarch64_build_constant (int regnum, HOST_WIDE_INT val, bool generate)
-{
-  int insns = 0;
-
-  if (aarch64_bitmask_imm (val, DImode))
-{
-  if (generate)
-	emit_move_insn (gen_rtx_REG (Pmode, regnum), GEN_INT (val));
-  insns = 1;
-}
-  else
-{
-  int i;
-  int ncount = 0;
-  int zcount = 0;
-  HOST_WIDE_INT valp = val >> 16;
-  HOST_WIDE_INT valm;
-  HOST_WIDE_INT tval;
-
-  for (i = 16; i < 64; i += 16)
-	{
-	  valm = (valp & 0xffff);
-
-	  if (valm != 0)
-	++ zcount;
-
-	  if (valm != 0xffff)
-	++ ncount;
-
-	  valp >>= 16;
-	}
-
-  /* zcount contains the number of additional MOVK instructions
-	 required if the constant is built up with an initial MOVZ instruction,
-	 while ncount is the number of MOVK instructions required if starting
-	 with a MOVN instruction.  Choose the sequence that yields the fewest
-	 number of instructions, preferring MOVZ instructions when they are both
-	 the same.  */
-  if (ncount < zcount)
-	{
-	  if (generate)
-	emit_move_insn (gen_rtx_REG (Pmode, regnum),
-			GEN_INT (val | ~(HOST_WIDE_INT) 0xffff));
-	  tval = 0xffff;
-	  insns++;
-	}
-  else
-	{
-	  if (generate)
-	emit_move_insn (gen_rtx_REG (Pmode, regnum),
-			GEN_INT (val & 0xffff));
-	  tval = 0;
-	  insns++;
-	}
-
-  val >>= 16;
-
-  for (i = 16; i < 64; i += 16)
-	{
-	  if ((val & 0xffff) != tval)
-	{
-	  if (generate)
-		emit_insn (gen_insv_immdi (gen_rtx_REG (Pmode, regnum),
-	   GEN_INT (i),
-	   GEN_INT (val & 0xffff)));
-	  insns++;
-	}
-	  val >>= 16;
-	}
-}
-  return insns;
-}
-
 static void
-aarch64_add_constant (int regnum, int scratchreg, HOST_WIDE_INT delta)
+aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
+		  HOST_WIDE_INT delta)
 {
   HOST_WIDE_INT mdelta = delta;
-  rtx this_rtx = gen_rtx_REG (Pmode, regnum);
-  rtx scratch_rtx = gen_rtx_REG (Pmode, scratchreg);
+  rtx this_rtx = gen_rtx_REG (mode, regnum);
+  rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
 
   if (mdelta < 0)
 mdelta = -mdelta;
 
   if (mdelta >= 4096 * 4096)
 {
-  (void) aarch64_build_constant (scratchreg, delta, true);
+  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
   emit_insn (gen_add3_insn (this_rtx, this_rtx, scratch_rtx));
 }
   else if (mdelta > 0)
@@ -3436,19 +3358,19 @@ aarch64_add_constant (int regnum, int scratchreg, HOST_WIDE_INT delta)
   if (mdelta >= 4096)
 	{
 	  emit_insn (gen_rtx_SET (scratch_rtx, GEN_INT (mdelta / 4096)));
-	  rtx shift = gen_rtx_ASHIFT (Pmode, scratch_rtx, GEN_INT (12));
+	  rtx shift = gen_rtx_ASHIFT (mode, scratch_rtx, GEN_INT (12));
 	  if (delta < 0)
 	emit_insn (gen_rtx_SET (this_rtx,
-gen_rtx_MINUS (Pmode, this_rtx, shift)));
+gen_rtx_MINUS (mode, this_rtx, shift)));
 	  else
 	emit_insn (gen_rtx_SET (this_rtx,
-gen_rtx_PLUS (Pmode, this_rtx, shift)));
+gen_rtx_PLUS (mode, this_rtx, shift)));
 	}
   if (mdelta % 4096 != 0)
 	{
 	  scratch_rtx = GEN_INT ((delta < 0 ? -1 : 1) * (mdelta % 4096));
 	  emit_insn (gen_rtx_SET (this_rtx,
-  gen_rtx_PLUS (Pmode, this_rtx, scratch_rtx)));
+  gen_rtx_PLUS (mode, this_rtx, scratch_rtx)));
 	}
 }
 }
@@ -3473,7 +3395,7 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
   emit_note (NOTE_INSN_PROLOGUE_END);
 
   if (vcall_offset == 0)
-aarch64_add_constant (this_regno, IP1_REGNUM, delta);
+aarch64_add_constant (Pmode

[AArch64][2/3] Optimize aarch64_add_constant to generate better addition sequences

2016-07-20 Thread Jiong Wang

This patch optimizes the immediate addition sequences generated by
aarch64_add_constant.

The current addition sequences generated are:

  * If the immediate fits into the unsigned 12-bit range, generate a
single add/sub.

  * Otherwise, if it fits into the unsigned 24-bit range, generate two
add/sub instructions.

  * Otherwise invoke the general constant build function.

This hasn't considered the situation where the immediate can't fit into
the unsigned 12-bit range but can fit into a single mov instruction, in
which case we can generate one move plus one addition.  The move doesn't
touch the destination register, thus the sequence is better than two
additions which both touch the destination register.

This patch thus optimizes the addition sequences into:

  * If the immediate fits into the unsigned 12-bit range, generate a
single add/sub.

  * Otherwise, if it fits into the unsigned 24-bit range, generate two
add/sub instructions; but don't do this if it fits into a single mov
instruction, in which case move the immediate to the scratch register
first, then generate one addition to add the scratch register to the
destination register.

  * Otherwise invoke the general constant build function.

The resulting sequences are illustrated below.
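As a hedged illustration of the sequences the new code would pick (the deltas,
register names and operand syntax below are illustrative, not from the patch):

  /* delta = 0x888    ->  add  x0, x0, #0x888             (fits uimm12)
     delta = 0x123456 ->  add  x0, x0, #0x456
                          add  x0, x0, #0x123, lsl #12    (two adds)
     delta = 0xffff   ->  mov  x1, #0xffff                (fits one mov,
                          add  x0, x0, x1                  so prefer mov+add) */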



OK for trunk?

gcc/
2016-07-20  Jiong Wang  

* config/aarch64/aarch64.c (aarch64_add_constant): Optimize
instruction sequences.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aeea3b3ebc514663043ac8d7cd13361f06f78502..41844a101247c939ecb31f8a8c17cf79759255aa 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1865,6 +1865,47 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
   aarch64_internal_mov_immediate (dest, imm, true, GET_MODE (dest));
 }
 
+/* Add DELTA onto REGNUM in MODE, using SCRATCHREG to hold an intermediate
+   value if necessary.  */
+
+static void
+aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
+		  HOST_WIDE_INT delta)
+{
+  HOST_WIDE_INT mdelta = abs_hwi (delta);
+  rtx this_rtx = gen_rtx_REG (mode, regnum);
+
+  /* Do nothing if mdelta is zero.  */
+  if (!mdelta)
+return;
+
+  /* We only need a single instruction if the offset fits into add/sub.  */
+  if (aarch64_uimm12_shift (mdelta))
+{
+  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+  return;
+}
+
+  /* We need two add/sub instructions, each one performing part of the
+ addition/subtraction, but don't do this if the addend can be loaded into
+ a register by a single instruction; in that case we prefer a move to a
+ scratch register followed by an addition.  */
+  if (mdelta < 0x1000000 && !aarch64_move_imm (delta, mode))
+{
+  HOST_WIDE_INT low_off = mdelta & 0xfff;
+
+  low_off = delta < 0 ? -low_off : low_off;
+  emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
+  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+  return;
+}
+
+  /* Otherwise use generic function to handle all other situations.  */
+  rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
+  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
+  emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+}
+
 static bool
 aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
  tree exp ATTRIBUTE_UNUSED)
@@ -3337,44 +3378,6 @@ aarch64_final_eh_return_addr (void)
    - 2 * UNITS_PER_WORD));
 }
 
-static void
-aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
-		  HOST_WIDE_INT delta)
-{
-  HOST_WIDE_INT mdelta = delta;
-  rtx this_rtx = gen_rtx_REG (mode, regnum);
-  rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
-
-  if (mdelta < 0)
-mdelta = -mdelta;
-
-  if (mdelta >= 4096 * 4096)
-{
-  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
-  emit_insn (gen_add3_insn (this_rtx, this_rtx, scratch_rtx));
-}
-  else if (mdelta > 0)
-{
-  if (mdelta >= 4096)
-	{
-	  emit_insn (gen_rtx_SET (scratch_rtx, GEN_INT (mdelta / 4096)));
-	  rtx shift = gen_rtx_ASHIFT (mode, scratch_rtx, GEN_INT (12));
-	  if (delta < 0)
-	emit_insn (gen_rtx_SET (this_rtx,
-gen_rtx_MINUS (mode, this_rtx, shift)));
-	  else
-	emit_insn (gen_rtx_SET (this_rtx,
-gen_rtx_PLUS (mode, this_rtx, shift)));
-	}
-  if (mdelta % 4096 != 0)
-	{
-	  scratch_rtx = GEN_INT ((delta < 0 ? -1 : 1) * (mdelta % 4096));
-	  emit_insn (gen_rtx_SET (this_rtx,
-  gen_rtx_PLUS (mode, this_rtx, scratch_rtx)));
-	}
-}
-}
-
 /* Output code to add DELTA to the first argument, and then jump
to FUNCTION.  Used for C++ multiple inheritance.  */
 static void


[AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant

2016-07-20 Thread Jiong Wang

The stack adjustment sequences inside aarch64_expand_prologue/epilogue
are doing exactly what aarch64_add_constant offers, except they also
need to be aware of dwarf generation.

This patch teaches the existing aarch64_add_constant about dwarf
generation; currently the SP register is supported.  Whenever SP is
updated there should be a CFA update; we then mark these instructions
as frame related, and if the update is too complex for gcc to guess
the adjustment, we attach an explicit annotation.

Both dwarf frame info size and pro/epilogue scheduling are improved
after this patch, as aarch64_add_constant makes better use of the
scratch register.

OK for trunk?

gcc/
2016-07-20  Jiong Wang  

* config/aarch64/aarch64.c (aarch64_add_constant): Mark
instruction as frame related when it is.  Generate CFA
annotation when it's necessary.
(aarch64_expand_prologue): Use aarch64_add_constant.
(aarch64_expand_epilogue): Likewise.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 41844a101247c939ecb31f8a8c17cf79759255aa..b38f3f1e8f85a5f3191d0c96080327dac7b2eaed 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1874,6 +1874,8 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
 {
   HOST_WIDE_INT mdelta = abs_hwi (delta);
   rtx this_rtx = gen_rtx_REG (mode, regnum);
+  bool frame_related_p = (regnum == SP_REGNUM);
+  rtx_insn *insn;
 
   /* Do nothing if mdelta is zero.  */
   if (!mdelta)
@@ -1882,7 +1884,8 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
   /* We only need a single instruction if the offset fits into add/sub.  */
   if (aarch64_uimm12_shift (mdelta))
 {
-  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
   return;
 }
 
@@ -1895,15 +1898,23 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
   HOST_WIDE_INT low_off = mdelta & 0xfff;
 
   low_off = delta < 0 ? -low_off : low_off;
-  emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
-  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
   return;
 }
 
   /* Otherwise use generic function to handle all other situations.  */
   rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
   aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
-  emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+  if (frame_related_p)
+{
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
+  rtx adj = plus_constant (mode, this_rtx, delta);
+  add_reg_note (insn, REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj));
+}
 }
 
 static bool
@@ -3038,36 +3049,7 @@ aarch64_expand_prologue (void)
   frame_size -= (offset + crtl->outgoing_args_size);
   fp_offset = 0;
 
-  if (frame_size >= 0x1000000)
-	{
-	  rtx op0 = gen_rtx_REG (Pmode, IP0_REGNUM);
-	  emit_move_insn (op0, GEN_INT (-frame_size));
-	  insn = emit_insn (gen_add2_insn (stack_pointer_rtx, op0));
-
-	  add_reg_note (insn, REG_CFA_ADJUST_CFA,
-			gen_rtx_SET (stack_pointer_rtx,
- plus_constant (Pmode, stack_pointer_rtx,
-		-frame_size)));
-	  RTX_FRAME_RELATED_P (insn) = 1;
-	}
-  else if (frame_size > 0)
-	{
-	  int hi_ofs = frame_size & 0xfff000;
-	  int lo_ofs = frame_size & 0x000fff;
-
-	  if (hi_ofs)
-	{
-	  insn = emit_insn (gen_add2_insn
-(stack_pointer_rtx, GEN_INT (-hi_ofs)));
-	  RTX_FRAME_RELATED_P (insn) = 1;
-	}
-	  if (lo_ofs)
-	{
-	  insn = emit_insn (gen_add2_insn
-(stack_pointer_rtx, GEN_INT (-lo_ofs)));
-	  RTX_FRAME_RELATED_P (insn) = 1;
-	}
-	}
+  aarch64_add_constant (Pmode, SP_REGNUM, IP0_REGNUM, -frame_size);
 }
   else
 frame_size = -1;
@@ -3287,31 +3269,7 @@ aarch64_expand_epilogue (bool for_sibcall)
   if (need_barrier_p)
 	emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
 
-  if (frame_size >= 0x1000000)
-	{
-	  rtx op0 = gen_rtx_REG (Pmode, IP0_REGNUM);
-	  emit_move_insn (op0, GEN_INT (frame_size));
-	  insn = emit_insn (gen_add2_insn (stack_pointer_rtx, op0));
-	}
-  else
-	{
-  int hi_ofs = frame_size & 0xfff000;
-  int lo_ofs = frame_size & 0x000fff;
-
-	  if (hi_ofs && lo_ofs)
-	{
-	  insn = emit_insn (gen_add2_insn
-(stack_pointer_rtx, GEN_INT (hi_ofs)));
-	  RTX_FRAME_RELATED_P (insn) = 1;
-	  frame_size = lo_ofs;
-	}
-	  insn = emit_insn (gen_add2_insn
-			(stack_pointer_rtx, GEN_INT (frame_size)));
-	}
-
-  /* Reset the CFA t

Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Wilco Dijkstra
Richard Earnshaw wrote:
> I'm not sure about this, while rtx_cost is called recursively as it
> walks the RTL, I'd normally expect the outer levels of the recursion to
> catch the cases where zero-extend is folded into a more complex
> operation.  Hitting a case like this suggests that something isn't doing
> that correctly.

As mentioned, the query is about a non-existent instruction, so the existing
rtx_cost code won't handle it. In fact there is no other check for "outer"
anywhere in aarch64_rtx_cost. We either assume outer == SET or know that if
it isn't, the expression will be split.

> So what was the top-level RTX passed into rtx_cost?  I'd like to get a
> better understanding about the use case before acking this patch.

An example would be:

long f(unsigned x) { return (long)x * 20; }

Combine tries to merge the constant into the multiply, so we get this cost 
query:

(mult:DI (zero_extend:DI (reg/v:SI 74 [ x ]))
(const_int 20 [0x14]))

Given this is not a legal multiply, rtx_mult_cost recurses, assuming both the
zero_extend and the immediate are going to be split off. But then the
zero_extend is a SET, i.e. a zero-cost operation. So not checking outer is
correct.

Wilco



Re: [PATCH]: Use HOST_WIDE_INT_{,M}1{,U} some more

2016-07-20 Thread Bernd Schmidt



On 07/20/2016 02:25 PM, Uros Bizjak wrote:

2016-07-19 14:46 GMT+02:00 Uros Bizjak :

The result of exercises with sed in gcc/ directory.


Some more conversions:

2016-07-20  Uros Bizjak  

* cse.c: Use HOST_WIDE_INT_M1 instead of ~(HOST_WIDE_INT) 0.
* combine.c: Use HOST_WIDE_INT_M1U instead of
~(unsigned HOST_WIDE_INT) 0.
* double-int.h: Ditto.
* dse.c: Ditto.
* dwarf2asm.c:Ditto.
* expmed.c: Ditto.
* genmodes.c: Ditto.
* match.pd: Ditto.
* read-rtl.c: Ditto.
* tree-ssa-loop-ivopts.c: Ditto.
* tree-ssa-loop-prefetch.c: Ditto.
* tree-vect-generic.c: Ditto.
* tree-vect-patterns.c: Ditto.
* tree.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline?


I think this is a good set of changes which makes the code easier to 
read. Can I impose one additional requirement, building before/after and 
verifying that all the object files are identical? If you do this, these 
and all other similar changes are preapproved.



Bernd



Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Richard Earnshaw (lists)
On 20/07/16 14:08, Wilco Dijkstra wrote:
> Richard Earnshaw wrote:
>> I'm not sure about this, while rtx_cost is called recursively as it
>> walks the RTL, I'd normally expect the outer levels of the recursion to
>> catch the cases where zero-extend is folded into a more complex
>> operation.  Hitting a case like this suggests that something isn't doing
>> that correctly.
> 
> As mentioned, the query is about a non-existent instruction, so the existing
> rtx_cost code won't handle it. In fact there is no other check for "outer"
> anywhere in aarch64_rtx_cost. We either assume outer == SET or know that if
> it isn't, the expression will be split.
> 
>> So what was the top-level RTX passed into rtx_cost?  I'd like to get a
>> better understanding about the use case before acking this patch.
> 
> An example would be:
> 
> long f(unsigned x) { return (long)x * 20; }
> 
> Combine tries to merge the constant into the multiply, so we get this cost 
> query:
> 
> (mult:DI (zero_extend:DI (reg/v:SI 74 [ x ]))
> (const_int 20 [0x14]))
> 
> Given this is not a legal multiply, rtx_mult_cost recurses, assuming both the
> zero_extend and the immediate are going to be split off. But then the
> zero_extend is a SET, i.e. a zero-cost operation. So not checking outer is
> correct.
> 
> Wilco
> 

Why does combine care what the cost is if the instruction isn't valid?

R.


Re: [PATCH] c++/58796 Make nullptr match exception handlers of pointer type

2016-07-20 Thread Jonathan Wakely

On 19/07/16 10:32 +0100, Jonathan Wakely wrote:

On 18/07/16 12:49 -0400, Jason Merrill wrote:

Perhaps the right answer is to drop support for catching nullptr as a
pointers to member from the language.


Yes, I've been drafting a ballot comment along those lines.


On the CWG reflector Richard Smith suggested using static objects as
the result for pointer to member handlers. I had tried that
unsuccessfully, but must have done something wrong because it works
fine, and avoids any races.

Tested x86_64-linux. I'll commit this to trunk later today.

commit 6cc1a2bca8dddb8ff5994849fcd3ee22de8776ed
Author: Jonathan Wakely 
Date:   Wed Jul 20 12:49:50 2016 +0100

Use static pointer to member when catching nullptr

libstdc++-v3:

	* libsupc++/pbase_type_info.cc (__pbase_type_info::__do_catch): Use
	static objects for catching nullptr as pointer to member types.

gcc/testsuite:

	* g++.dg/cpp0x/nullptr35.C: Change expected result for catching as
	pointer to member function and also test catching by reference.

diff --git a/gcc/testsuite/g++.dg/cpp0x/nullptr35.C b/gcc/testsuite/g++.dg/cpp0x/nullptr35.C
index c84966f..d932114 100644
--- a/gcc/testsuite/g++.dg/cpp0x/nullptr35.C
+++ b/gcc/testsuite/g++.dg/cpp0x/nullptr35.C
@@ -39,7 +39,7 @@ int main()
   caught(4);
 throw;
   }
-} catch (int (A::*pmf)()) {  // FIXME: currently unsupported
+} catch (int (A::*pmf)()) {
   if (pmf == nullptr)
 caught(8);
   throw;
@@ -47,6 +47,35 @@ int main()
   } catch (nullptr_t) {
   }
 
-  if (result != 7) // should be 15
+  try {
+try {
+  try {
+try {
+  try {
+throw nullptr;
+  } catch (void* const& p) {
+if (p == nullptr)
+  caught(16);
+throw;
+  }
+} catch (void(* const& pf)()) {
+  if (pf == nullptr)
+caught(32);
+  throw;
+}
+  } catch (int A::* const& pm) {
+if (pm == nullptr)
+  caught(64);
+throw;
+  }
+} catch (int (A::* const& pmf)()) {
+  if (pmf == nullptr)
+caught(128);
+  throw;
+}
+  } catch (nullptr_t) {
+  }
+
+  if (result != 255)
 abort ();
 }
diff --git a/libstdc++-v3/libsupc++/pbase_type_info.cc b/libstdc++-v3/libsupc++/pbase_type_info.cc
index a2993e4..ff6b756 100644
--- a/libstdc++-v3/libsupc++/pbase_type_info.cc
+++ b/libstdc++-v3/libsupc++/pbase_type_info.cc
@@ -50,14 +50,16 @@ __do_catch (const type_info *thr_type,
 {
   if (__pointee->__is_function_p ())
 {
-  // A pointer-to-member-function is two words  but the
-  // nullptr_t exception object at *(nullptr_t*)*thr_obj is only
-  // one word, so we can't safely return it as a PMF. FIXME.
-  return false;
+  using pmf_type = void (__pbase_type_info::*)();
+  static const pmf_type pmf = nullptr;
+  *thr_obj = const_cast(&pmf);
+  return true;
 }
   else
 {
-  *(ptrdiff_t*)*thr_obj = -1; // null pointer to data member
+  using pm_type = int __pbase_type_info::*;
+  static const pm_type pm = nullptr;
+  *thr_obj = const_cast(&pm);
   return true;
 }
 }


[PATCH] Consider functions with xloc.file == NULL (PR, gcov-profile/69028)

2016-07-20 Thread Martin Liška
Hi.

The following patch addresses an ICE which happens when coverage.c computes
the checksum of a function w/o xloc.file.  My patch assumes that having a
function w/o xloc.file is a valid state, a situation exposed by cilkplus
functions.

The patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From ac1ba622f394d9914c5f8250719780595f54b571 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 20 Jul 2016 09:54:12 +0200
Subject: [PATCH] Consider functions with xloc.file == NULL (PR
 gcov-profile/69028)

gcc/testsuite/ChangeLog:

2016-07-20  Martin Liska  

	PR gcov-profile/69028
	PR gcov-profile/62047
	* g++.dg/cilk-plus/pr69028.C: New test.

gcc/ChangeLog:

2016-07-20  Martin Liska  

	* coverage.c (coverage_compute_lineno_checksum): Do not
	calculate checksum for fns w/o xloc.file.
	(coverage_compute_profile_id): Likewise.
---
 gcc/coverage.c   |  6 --
 gcc/testsuite/g++.dg/cilk-plus/pr69028.C | 13 +
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cilk-plus/pr69028.C

diff --git a/gcc/coverage.c b/gcc/coverage.c
index 67cc908..d4d371e 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -553,7 +553,8 @@ coverage_compute_lineno_checksum (void)
 = expand_location (DECL_SOURCE_LOCATION (current_function_decl));
   unsigned chksum = xloc.line;
 
-  chksum = coverage_checksum_string (chksum, xloc.file);
+  if (xloc.file)
+chksum = coverage_checksum_string (chksum, xloc.file);
   chksum = coverage_checksum_string
 (chksum, IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (current_function_decl)));
 
@@ -580,7 +581,8 @@ coverage_compute_profile_id (struct cgraph_node *n)
   bool use_name_only = (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID) == 0);
 
   chksum = (use_name_only ? 0 : xloc.line);
-  chksum = coverage_checksum_string (chksum, xloc.file);
+  if (xloc.file)
+	chksum = coverage_checksum_string (chksum, xloc.file);
   chksum = coverage_checksum_string
 	(chksum, IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (n->decl)));
   if (!use_name_only && first_global_object_name)
diff --git a/gcc/testsuite/g++.dg/cilk-plus/pr69028.C b/gcc/testsuite/g++.dg/cilk-plus/pr69028.C
new file mode 100644
index 000..31542f3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/pr69028.C
@@ -0,0 +1,13 @@
+// PR c++/69028
+// { dg-require-effective-target c++11 }
+// { dg-options "-fcilkplus -fprofile-arcs" }
+
+void parallel()
+{
+}
+
+int main()
+{
+   _Cilk_spawn parallel();
+   _Cilk_sync;
+}
-- 
2.9.0



[PATCH] report supported function classes correctly on *-musl

2016-07-20 Thread Szabolcs Nagy
All function classes listed in gcc/coretypes.h are supported by musl.

Most of the optimizations based on these function classes are not
relevant for standard-conforming C code, but this is required to get
rid of some test system noise.
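
As one hedged example of what such a class enables (not from this patch):
when function_sincos is reported as available, GCC may contract separate sin
and cos calls of the same argument into a single sincos call:

  #include <math.h>

  /* Both calls take the same argument, so the compiler may emit one
     sincos libcall instead of two separate calls.  */
  double f (double x) { return sin (x) + cos (x); }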

gcc/
2016-07-20  Szabolcs Nagy  

* config/linux.c (linux_libc_has_function): Return true on musl.
From 294b908f9a7577bcfe8036a601262ca0bc7c2ca2 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy 
Date: Fri, 6 Nov 2015 23:59:20 +
Subject: [PATCH 2/7] linux_libc_has_function

---
 gcc/config/linux.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/linux.c b/gcc/config/linux.c
index 2081e34..37515bf 100644
--- a/gcc/config/linux.c
+++ b/gcc/config/linux.c
@@ -26,7 +26,7 @@ along with GCC; see the file COPYING3.  If not see
 bool
 linux_libc_has_function (enum function_class fn_class)
 {
-  if (OPTION_GLIBC)
+  if (OPTION_GLIBC || OPTION_MUSL)
 return true;
   if (OPTION_BIONIC)
 if (fn_class == function_c94
-- 
2.4.1



Re: [C++ PATCH] Allow frexp etc. builtins in c++14 constexpr (PR c++/50060)

2016-07-20 Thread Jason Merrill
OK.

On Mon, Jul 18, 2016 at 5:07 PM, Jakub Jelinek  wrote:
> On Mon, Jul 18, 2016 at 02:42:43PM -0400, Jason Merrill wrote:
>> Ah, I guess we need to check cxx_dialect in cxx_eval_store_expression,
>> not just in potential_constant_expression.
>
> Here is an updated version, bootstrapped/regtested on x86_64-linux and
> i686-linux, ok for trunk?
>
> 2016-07-18  Jakub Jelinek  
>
> PR c++/50060
> * constexpr.c (cxx_eval_builtin_function_call): Pass false as lval
> when evaluating call arguments.  Use fold_builtin_call_array instead
> of fold_build_call_array_loc, return t if it returns NULL.  Otherwise
> check the result with potential_constant_expression and call
> cxx_eval_constant_expression on it.
>
> * g++.dg/cpp0x/constexpr-50060.C: New test.
> * g++.dg/cpp1y/constexpr-50060.C: New test.
>
> --- gcc/cp/constexpr.c.jj   2016-07-18 20:42:51.163955883 +0200
> +++ gcc/cp/constexpr.c  2016-07-18 20:55:47.246152938 +0200
> @@ -1105,7 +1105,7 @@ cxx_eval_builtin_function_call (const co
>for (i = 0; i < nargs; ++i)
>  {
>args[i] = cxx_eval_constant_expression (&new_ctx, CALL_EXPR_ARG (t, i),
> - lval, &dummy1, &dummy2);
> + false, &dummy1, &dummy2);
>if (bi_const_p)
> /* For __built_in_constant_p, fold all expressions with constant 
> values
>even if they aren't C++ constant-expressions.  */
> @@ -1114,13 +1114,31 @@ cxx_eval_builtin_function_call (const co
>
>bool save_ffbcp = force_folding_builtin_constant_p;
>force_folding_builtin_constant_p = true;
> -  new_call = fold_build_call_array_loc (EXPR_LOCATION (t), TREE_TYPE (t),
> -   CALL_EXPR_FN (t), nargs, args);
> -  /* Fold away the NOP_EXPR from fold_builtin_n.  */
> -  new_call = fold (new_call);
> +  new_call = fold_builtin_call_array (EXPR_LOCATION (t), TREE_TYPE (t),
> + CALL_EXPR_FN (t), nargs, args);
>force_folding_builtin_constant_p = save_ffbcp;
> -  VERIFY_CONSTANT (new_call);
> -  return new_call;
> +  if (new_call == NULL)
> +{
> +  if (!*non_constant_p && !ctx->quiet)
> +   {
> + new_call = build_call_array_loc (EXPR_LOCATION (t), TREE_TYPE (t),
> +  CALL_EXPR_FN (t), nargs, args);
> + error ("%q+E is not a constant expression", new_call);
> +   }
> +  *non_constant_p = true;
> +  return t;
> +}
> +
> +  if (!potential_constant_expression (new_call))
> +{
> +  if (!*non_constant_p && !ctx->quiet)
> +   error ("%q+E is not a constant expression", new_call);
> +  *non_constant_p = true;
> +  return t;
> +}
> +
> +  return cxx_eval_constant_expression (&new_ctx, new_call, lval,
> +  non_constant_p, overflow_p);
>  }
>
>  /* TEMP is the constant value of a temporary object of type TYPE.  Adjust
> --- gcc/testsuite/g++.dg/cpp0x/constexpr-50060.C.jj 2016-07-18 
> 21:03:12.505532831 +0200
> +++ gcc/testsuite/g++.dg/cpp0x/constexpr-50060.C2016-07-18 
> 21:05:41.306655422 +0200
> @@ -0,0 +1,21 @@
> +// PR c++/50060
> +// { dg-do compile { target c++11 } }
> +
> +extern "C" double frexp (double, int *);
> +
> +struct S
> +{
> +  constexpr S (double a) : y {}, x (frexp (a, &y)) {}  // { dg-error "is not 
> a constant expression" "S" { target { ! c++14 } } }
> +  double x;
> +  int y;
> +};
> +
> +struct T
> +{
> +  constexpr T (double a) : y {}, x ((y = 1, 0.8125)) {}// { dg-error 
> "is not a constant-expression" "T" { target { ! c++14 } } }
> +  double x;
> +  int y;
> +};
> +
> +static_assert (S (6.5).x == 0.8125, "");   // { dg-error "non-constant 
> condition for static assertion|in constexpr expansion" "" { target { ! c++14 
> } } }
> +static_assert (T (6.5).x == 0.8125, "");   // { dg-error "non-constant 
> condition for static assertion|called in a constant expression" "" { target { 
> ! c++14 } } }
> --- gcc/testsuite/g++.dg/cpp1y/constexpr-50060.C.jj 2016-07-18 
> 20:46:00.992553765 +0200
> +++ gcc/testsuite/g++.dg/cpp1y/constexpr-50060.C2016-07-18 
> 20:46:00.992553765 +0200
> @@ -0,0 +1,100 @@
> +// PR c++/50060
> +// { dg-do compile { target c++14 } }
> +
> +// sincos and lgamma_r aren't available in -std=c++14,
> +// only in -std=gnu++14.  Use __builtin_* in that case.
> +extern "C" void sincos (double, double *, double *);
> +extern "C" double frexp (double, int *);
> +extern "C" double modf (double, double *);
> +extern "C" double remquo (double, double, int *);
> +extern "C" double lgamma_r (double, int *);
> +
> +constexpr double
> +f0 (double x)
> +{
> +  double y {};
> +  double z {};
> +  __builtin_sincos (x, &y, &z);
> +  return y;
> +}
> +
> +constexpr double
> +f1 (double x)
> +{
> +  double y {};
> +  double z {};
> +  __builtin_sincos (x, &y, &z);
> +  return z;
> +}
>

Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Wilco Dijkstra
Richard Earnshaw wrote:
> Why does combine care what the cost is if the instruction isn't valid?

No idea. Combine does lots of odd things that don't make sense to me. 
Unfortunately the costs we give for cases like this need to be accurate or
they negatively affect code quality. The reason for this patch was to fix
some unexpected slowdowns caused by the cost for zero_extend being
too high.

Wilco



Re: [C++ PATCH] cp_parser_save_member_function_body fix (PR c++/71909)

2016-07-20 Thread Jason Merrill
OK.

On Mon, Jul 18, 2016 at 5:14 PM, Jakub Jelinek  wrote:
> Hi!
>
> This patch fixes two issues:
> 1) as shown in the first testcase, cp_parser_save_member_function_body
>adds the catch () { ... } tokens into the saved token range
>even when there is no function try block (missing try keyword)
> 2) if the method starts with __transaction_{atomic,relaxed}, and
>e.g. contains {}s somewhere in the mem-initializers, then
>cp_parser_save_member_function_body stops saving the tokens early
>instead of late
>
> The following patch attempts to handle the same cases
> cp_parser_function_definition_after_declarator handles (ok, ignores
> the already unsupported return extension) - note that
> cp_parser_txn_attribute_opt handles only a small subset of C++11 attributes
> (and only once, not multiple times).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-07-18  Jakub Jelinek  
>
> PR c++/71909
> * parser.c (cp_parser_save_member_function_body): Consume
> __transaction_relaxed or __transaction_atomic with optional
> attribute.  Only skip catch with block if try keyword is seen.
>
> * g++.dg/parse/pr71909.C: New test.
> * g++.dg/tm/pr71909.C: New test.
>
> --- gcc/cp/parser.c.jj  2016-07-16 10:41:04.0 +0200
> +++ gcc/cp/parser.c 2016-07-18 11:47:49.487748010 +0200
> @@ -26044,6 +26044,7 @@ cp_parser_save_member_function_body (cp_
>cp_token *first;
>cp_token *last;
>tree fn;
> +  bool function_try_block = false;
>
>/* Create the FUNCTION_DECL.  */
>fn = grokmethod (decl_specifiers, declarator, attributes);
> @@ -26065,9 +26066,43 @@ cp_parser_save_member_function_body (cp_
>/* Save away the tokens that make up the body of the
>   function.  */
>first = parser->lexer->next_token;
> +
> +  if (cp_lexer_next_token_is_keyword (parser->lexer, 
> RID_TRANSACTION_RELAXED))
> +cp_lexer_consume_token (parser->lexer);
> +  else if (cp_lexer_next_token_is_keyword (parser->lexer,
> +  RID_TRANSACTION_ATOMIC))
> +{
> +  cp_lexer_consume_token (parser->lexer);
> +  /* Match cp_parser_txn_attribute_opt [[ identifier ]].  */
> +  if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_SQUARE)
> + && cp_lexer_nth_token_is (parser->lexer, 2, CPP_OPEN_SQUARE)
> + && (cp_lexer_nth_token_is (parser->lexer, 3, CPP_NAME)
> + || cp_lexer_nth_token_is (parser->lexer, 3, CPP_KEYWORD))
> + && cp_lexer_nth_token_is (parser->lexer, 4, CPP_CLOSE_SQUARE)
> + && cp_lexer_nth_token_is (parser->lexer, 5, CPP_CLOSE_SQUARE))
> +   {
> + cp_lexer_consume_token (parser->lexer);
> + cp_lexer_consume_token (parser->lexer);
> + cp_lexer_consume_token (parser->lexer);
> + cp_lexer_consume_token (parser->lexer);
> + cp_lexer_consume_token (parser->lexer);
> +   }
> +  else
> +   while (cp_next_tokens_can_be_gnu_attribute_p (parser)
> +  && cp_lexer_nth_token_is (parser->lexer, 2, CPP_OPEN_PAREN))
> + {
> +   cp_lexer_consume_token (parser->lexer);
> +   if (cp_parser_cache_group (parser, CPP_CLOSE_PAREN, /*depth=*/0))
> + break;
> + }
> +}
> +
>/* Handle function try blocks.  */
>if (cp_lexer_next_token_is_keyword (parser->lexer, RID_TRY))
> -cp_lexer_consume_token (parser->lexer);
> +{
> +  cp_lexer_consume_token (parser->lexer);
> +  function_try_block = true;
> +}
>/* We can have braced-init-list mem-initializers before the fn body.  */
>if (cp_lexer_next_token_is (parser->lexer, CPP_COLON))
>  {
> @@ -26085,8 +26120,9 @@ cp_parser_save_member_function_body (cp_
>  }
>cp_parser_cache_group (parser, CPP_CLOSE_BRACE, /*depth=*/0);
>/* Handle function try blocks.  */
> -  while (cp_lexer_next_token_is_keyword (parser->lexer, RID_CATCH))
> -cp_parser_cache_group (parser, CPP_CLOSE_BRACE, /*depth=*/0);
> +  if (function_try_block)
> +while (cp_lexer_next_token_is_keyword (parser->lexer, RID_CATCH))
> +  cp_parser_cache_group (parser, CPP_CLOSE_BRACE, /*depth=*/0);
>last = parser->lexer->next_token;
>
>/* Save away the inline definition; we will process it when the
> --- gcc/testsuite/g++.dg/parse/pr71909.C.jj 2016-07-18 11:55:51.169600236 
> +0200
> +++ gcc/testsuite/g++.dg/parse/pr71909.C2016-07-18 11:57:09.99364 
> +0200
> @@ -0,0 +1,22 @@
> +// PR c++/71909
> +// { dg-do compile }
> +
> +struct S
> +{
> +  S () try : m (0) {}
> +  catch (...) {}
> +  void foo () try {}
> +  catch (int) {}
> +  catch (...) {}
> +  int m;
> +};
> +
> +struct T
> +{
> +  T () : m (0) {}
> +  catch (...) {}   // { dg-error "expected unqualified-id before" }
> +  void foo () {}
> +  catch (int) {}   // { dg-error "expected unqualified-id before" }
> +  catch (...) {}   // { dg-error "expected unqualified-id before" }
> +  int

Re: [PATCH] disable ifunc on *-musl by default

2016-07-20 Thread David Edelsohn
> Musl libc does not support gnu ifunc, so disable it by default.
> (not disabled on s390-* since that has no musl support yet.)

Musl libc now supports PPC64. Support for s390 is in progress.

- David


[PATCH] disable ifunc on *-musl by default

2016-07-20 Thread Szabolcs Nagy
Musl libc does not support gnu ifunc, so disable it by default.
(not disabled on s390-* since that has no musl support yet.)

gcc/
2016-07-20  Szabolcs Nagy  

* config.gcc (*-*-*musl*): Disable gnu-indirect-function.
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1f75f17..f3f6e14 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1465,7 +1465,7 @@ i[34567]86-*-linux* | i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-gnu* | i[34567]8
 		extra_options="${extra_options} linux-android.opt"
 		# Assume modern glibc if not targeting Android nor uclibc.
 		case ${target} in
-		*-*-*android*|*-*-*uclibc*)
+		*-*-*android*|*-*-*uclibc*|*-*-*musl*)
 		  ;;
 		*)
 		  default_gnu_indirect_function=yes
@@ -1531,7 +1531,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
 		extra_options="${extra_options} linux-android.opt"
 		# Assume modern glibc if not targeting Android nor uclibc.
 		case ${target} in
-		*-*-*android*|*-*-*uclibc*)
+		*-*-*android*|*-*-*uclibc*|*-*-*musl*)
 		  ;;
 		*)
 		  default_gnu_indirect_function=yes


Re: [PATCH] c++/60760 - arithmetic on null pointers should not be allowed in constant expressions

2016-07-20 Thread Jason Merrill
On Mon, Jul 18, 2016 at 6:15 PM, Martin Sebor  wrote:
> On 07/18/2016 11:51 AM, Jason Merrill wrote:
>>
>> On 07/06/2016 06:20 PM, Martin Sebor wrote:
>>>
>>> @@ -2911,6 +2923,14 @@ cxx_eval_indirect_ref (const constexpr_ctx
>>> *ctx, tree t,
>>>if (*non_constant_p)
>>>  return t;
>>>
>>> +  if (integer_zerop (op0))
>>> +{
>>> +  if (!ctx->quiet)
>>> +error ("dereferencing a null pointer");
>>> +  *non_constant_p = true;
>>> +  return t;
>>> +}
>>
>> I'm skeptical of checking this here, since *p is valid for null p; &*p
>> is even a constant expression.  And removing this hunk doesn't seem to
>> break any of your tests.
>>
>> OK with that hunk removed.
>
> With it removed the constexpr-nullptr-2.C test fails on line 64:
>
>   constexpr const int *pi0 = &pa2->pa1->pa0->i;   // { dg-error "null
> pointer|not a constant" }
>
> Here, pa2 and pa1 are non-null but pa0 is null.

It doesn't fail for me; that line hits the error in
cxx_eval_component_reference.  I'm only talking about removing the
cxx_eval_indirect_ref hunk.
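
For reference, a minimal illustration of that point (hypothetical code, not
from the testsuite):

  constexpr int *p = nullptr;
  constexpr int *q = &*p;   // OK: no lvalue-to-rvalue conversion occurs,
                            // so &*p is still a constant expression
  // constexpr int i = *p;  // ill-formed: reads through a null pointer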

Jason


Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Richard Earnshaw (lists)
On 20/07/16 14:40, Wilco Dijkstra wrote:
> Richard Earnshaw wrote:
>> Why does combine care what the cost is if the instruction isn't valid?
> 
> No idea. Combine does lots of odd things that don't make sense to me. 
> Unfortunately the costs we give for cases like this need to be accurate or
> they negatively affect code quality. The reason for this patch was to fix
> some unexpected slowdowns caused by the cost for zero_extend being
> too high.
> 
> Wilco
> 

Well if I take your testcase and plug it into a fairly recent gcc I get:

x:
mov w1, 20
umull   x0, w0, w1
ret

If I change the constant to 33, I then get:

x:
uxtwx0, w0
add x0, x0, x0, lsl 5
ret

Both of which look reasonable to me.


[PATCH] check -nopie in configure

2016-07-20 Thread Szabolcs Nagy
Since gcc can be built with --enable-default-pie, there
is a -no-pie flag to turn off PIE.

gcc cannot be built as PIE (PR 71934), so the gcc build
system has to detect the -no-pie flag to disable PIE.

Historically, default-PIE toolchains used the -nopie flag
(e.g. gentoo hardened); those toolchains cannot build
gcc anymore, so detect -nopie too.

gcc/
2016-07-20  Szabolcs Nagy  

* configure.ac: Detect -nopie flag just like -no-pie.
* configure: Regenerate.
diff --git a/gcc/configure b/gcc/configure
index ed44472..ca16e66 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -29566,6 +29566,33 @@ fi
 $as_echo "$gcc_cv_no_pie" >&6; }
 if test "$gcc_cv_no_pie" = "yes"; then
   NO_PIE_FLAG="-no-pie"
+else
+  # Check if -nopie works.
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for -nopie option" >&5
+$as_echo_n "checking for -nopie option... " >&6; }
+if test "${gcc_cv_nopie+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  saved_LDFLAGS="$LDFLAGS"
+ LDFLAGS="$LDFLAGS -nopie"
+ cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+int main(void) {return 0;}
+_ACEOF
+if ac_fn_cxx_try_link "$LINENO"; then :
+  gcc_cv_nopie=yes
+else
+  gcc_cv_nopie=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+ LDFLAGS="$saved_LDFLAGS"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_nopie" >&5
+$as_echo "$gcc_cv_nopie" >&6; }
+  if test "$gcc_cv_nopie" = "yes"; then
+NO_PIE_FLAG="-nopie"
+  fi
 fi
 
 
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 086d0fc..98ab5cb 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -6200,6 +6200,19 @@ AC_CACHE_CHECK([for -no-pie option],
LDFLAGS="$saved_LDFLAGS"])
 if test "$gcc_cv_no_pie" = "yes"; then
   NO_PIE_FLAG="-no-pie"
+else
+  # Check if -nopie works.
+  AC_CACHE_CHECK([for -nopie option],
+[gcc_cv_nopie],
+[saved_LDFLAGS="$LDFLAGS"
+ LDFLAGS="$LDFLAGS -nopie"
+ AC_LINK_IFELSE([int main(void) {return 0;}],
+   [gcc_cv_nopie=yes],
+   [gcc_cv_nopie=no])
+ LDFLAGS="$saved_LDFLAGS"])
+  if test "$gcc_cv_nopie" = "yes"; then
+NO_PIE_FLAG="-nopie"
+  fi
 fi
 AC_SUBST([NO_PIE_FLAG])
 


[PATCH] target lib tests with build sysroot PR testsuite/71931

2016-07-20 Thread Szabolcs Nagy
Fix target library tests when gcc is built using --with-build-sysroot.

The dejagnu find_gcc function cannot handle the case where CC needs extra
flags like --sysroot. So for testing target libraries use the same CC that
was used for building the target libs. This change assumes the tests are
run from make.

Another approach would be to pass down the sysroot flags
separately and add
set TEST_ALWAYS_FLAGS "$(SYSROOT_CFLAGS_FOR_TARGET)"
to site.exp like the gcc site.exp does, but that's more
changes.

libatomic/
2016-07-20  Szabolcs Nagy  

PR testsuite/71931
* testsuite/lib/libatomic.exp (libatomic_init): Use CC.
* testsuite/Makefile.am: Export CC.
* testsuite/Makefile.in: Regenerated.

libgomp/
2016-07-20  Szabolcs Nagy  

PR testsuite/71931
* testsuite/lib/libgomp.exp (libgomp_init): Use CC.
* testsuite/Makefile.am: Export CC.
* testsuite/Makefile.in: Regenerated.

libitm/
2016-07-20  Szabolcs Nagy  

PR testsuite/71931
* testsuite/lib/libitm.exp (libitm_init): Use CC.
* testsuite/Makefile.am: Export CC.
* testsuite/Makefile.in: Regenerated.

libvtv/
2016-07-20  Szabolcs Nagy  

PR testsuite/71931
* testsuite/lib/libvtv.exp (libvtv_init): Use CC.
* testsuite/Makefile.am: Export CC.
* testsuite/Makefile.in: Regenerated.
diff --git a/libatomic/testsuite/Makefile.am b/libatomic/testsuite/Makefile.am
index 561b7e2..d9af02a 100644
--- a/libatomic/testsuite/Makefile.am
+++ b/libatomic/testsuite/Makefile.am
@@ -11,3 +11,5 @@ EXPECT = $(shell if test -f $(top_builddir)/../expect/expect; then \
 _RUNTEST = $(shell if test -f $(top_srcdir)/../dejagnu/runtest; then \
 	 echo $(top_srcdir)/../dejagnu/runtest; else echo runtest; fi)
 RUNTEST = "$(_RUNTEST) $(AM_RUNTESTFLAGS)"
+
+export CC
diff --git a/libatomic/testsuite/Makefile.in b/libatomic/testsuite/Makefile.in
index 34f83e0..8392e01 100644
--- a/libatomic/testsuite/Makefile.in
+++ b/libatomic/testsuite/Makefile.in
@@ -428,6 +428,8 @@ uninstall-am:
 	uninstall uninstall-am
 
 
+export CC
+
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
 .NOEXPORT:
diff --git a/libatomic/testsuite/lib/libatomic.exp b/libatomic/testsuite/lib/libatomic.exp
index cafab54..6ba67e8 100644
--- a/libatomic/testsuite/lib/libatomic.exp
+++ b/libatomic/testsuite/lib/libatomic.exp
@@ -90,7 +90,7 @@ proc libatomic_init { args } {
 	if [info exists TOOL_EXECUTABLE] {
 	set GCC_UNDER_TEST $TOOL_EXECUTABLE
 	} else {
-	set GCC_UNDER_TEST "[find_gcc]"
+	set GCC_UNDER_TEST "[getenv CC]"
 	}
 }
 
diff --git a/libgomp/testsuite/Makefile.am b/libgomp/testsuite/Makefile.am
index 66a9d94..821ab31 100644
--- a/libgomp/testsuite/Makefile.am
+++ b/libgomp/testsuite/Makefile.am
@@ -25,3 +25,5 @@ libgomp-test-support.exp: libgomp-test-support.pt.exp Makefile
 	mv $@.tmp $@
 
 all-local: libgomp-test-support.exp
+
+export CC
diff --git a/libgomp/testsuite/Makefile.in b/libgomp/testsuite/Makefile.in
index 4dbb406..7bd8b86 100644
--- a/libgomp/testsuite/Makefile.in
+++ b/libgomp/testsuite/Makefile.in
@@ -475,6 +475,8 @@ libgomp-test-support.exp: libgomp-test-support.pt.exp Makefile
 
 all-local: libgomp-test-support.exp
 
+export CC
+
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
 .NOEXPORT:
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 1cb4991..063 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -108,7 +108,7 @@ proc libgomp_init { args } {
 	if [info exists TOOL_EXECUTABLE] {
 	set GCC_UNDER_TEST $TOOL_EXECUTABLE
 	} else {
-	set GCC_UNDER_TEST "[find_gcc]"
+	set GCC_UNDER_TEST "[getenv CC]"
 	}
 }
 
diff --git a/libitm/testsuite/Makefile.am b/libitm/testsuite/Makefile.am
index 561b7e2..d9af02a 100644
--- a/libitm/testsuite/Makefile.am
+++ b/libitm/testsuite/Makefile.am
@@ -11,3 +11,5 @@ EXPECT = $(shell if test -f $(top_builddir)/../expect/expect; then \
 _RUNTEST = $(shell if test -f $(top_srcdir)/../dejagnu/runtest; then \
 	 echo $(top_srcdir)/../dejagnu/runtest; else echo runtest; fi)
 RUNTEST = "$(_RUNTEST) $(AM_RUNTESTFLAGS)"
+
+export CC
diff --git a/libitm/testsuite/Makefile.in b/libitm/testsuite/Makefile.in
index 4d79781..49a333a 100644
--- a/libitm/testsuite/Makefile.in
+++ b/libitm/testsuite/Makefile.in
@@ -438,6 +438,8 @@ uninstall-am:
 	uninstall uninstall-am
 
 
+export CC
+
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
 .NOEXPORT:
diff --git a/libitm/testsuite/lib/libitm.exp b/libitm/testsuite/lib/libitm.exp
index 0416296..8679e92 100644
--- a/libitm/testsuite/lib/libitm.exp
+++ b/libitm/testsuite/lib/libitm.exp
@@ -90,7 +90,7 @@ proc libitm_init { args } {
 	if [info exists TOOL_EXECUTABLE] {
 	set G

Re: [AArch64][1/3] Migrate aarch64_add_constant to new interface & kill aarch64_build_constant

2016-07-20 Thread Richard Earnshaw (lists)
On 20/07/16 14:02, Jiong Wang wrote:
> Currently aarch64_add_constant is using aarch64_build_constant to move
> an immediate into the destination register.
> 
> It considers the following situations:
> 
>   * the immediate fits a bitmask pattern that needs only a single
> instruction.
>   * the immediate fits a single movz/movn.
>   * the immediate needs a single movz/movn plus multiple movks.
> 
> 
> Actually we have another constant-building helper function,
> "aarch64_internal_mov_immediate", which covers all these situations and
> more.
> 
> This patch thus migrates aarch64_add_constant to
> aarch64_internal_mov_immediate so that we can kill the old
> aarch64_build_constant.
> 
> OK for trunk?
> 
> gcc/
> 2016-07-20  Jiong Wang  
> 
> * config/aarch64/aarch64.c (aarch64_add_constant): New
> parameter "mode".  Use aarch64_internal_mov_immediate
> instead of aarch64_build_constant.
> (aarch64_build_constant): Delete.
> 

Really you should also list the callers of aarch64_add_constant that
have been updated as well (there aren't that many).

OK with that change.

R.

> 
> build-const-1.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 512ef10d158d2eaa1384d28c43b9a8f90387099d..aeea3b3ebc514663043ac8d7cd13361f06f78502
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3337,98 +3337,20 @@ aarch64_final_eh_return_addr (void)
>  - 2 * UNITS_PER_WORD));
>  }
>  
> -/* Possibly output code to build up a constant in a register.  For
> -   the benefit of the costs infrastructure, returns the number of
> -   instructions which would be emitted.  GENERATE inhibits or
> -   enables code generation.  */
> -
> -static int
> -aarch64_build_constant (int regnum, HOST_WIDE_INT val, bool generate)
> -{
> -  int insns = 0;
> -
> -  if (aarch64_bitmask_imm (val, DImode))
> -{
> -  if (generate)
> - emit_move_insn (gen_rtx_REG (Pmode, regnum), GEN_INT (val));
> -  insns = 1;
> -}
> -  else
> -{
> -  int i;
> -  int ncount = 0;
> -  int zcount = 0;
> -  HOST_WIDE_INT valp = val >> 16;
> -  HOST_WIDE_INT valm;
> -  HOST_WIDE_INT tval;
> -
> -  for (i = 16; i < 64; i += 16)
> - {
> -   valm = (valp & 0x);
> -
> -   if (valm != 0)
> - ++ zcount;
> -
> -   if (valm != 0x)
> - ++ ncount;
> -
> -   valp >>= 16;
> - }
> -
> -  /* zcount contains the number of additional MOVK instructions
> -  required if the constant is built up with an initial MOVZ instruction,
> -  while ncount is the number of MOVK instructions required if starting
> -  with a MOVN instruction.  Choose the sequence that yields the fewest
> -  number of instructions, preferring MOVZ instructions when they are both
> -  the same.  */
> -  if (ncount < zcount)
> - {
> -   if (generate)
> - emit_move_insn (gen_rtx_REG (Pmode, regnum),
> - GEN_INT (val | ~(HOST_WIDE_INT) 0x));
> -   tval = 0x;
> -   insns++;
> - }
> -  else
> - {
> -   if (generate)
> - emit_move_insn (gen_rtx_REG (Pmode, regnum),
> - GEN_INT (val & 0x));
> -   tval = 0;
> -   insns++;
> - }
> -
> -  val >>= 16;
> -
> -  for (i = 16; i < 64; i += 16)
> - {
> -   if ((val & 0x) != tval)
> - {
> -   if (generate)
> - emit_insn (gen_insv_immdi (gen_rtx_REG (Pmode, regnum),
> -GEN_INT (i),
> -GEN_INT (val & 0x)));
> -   insns++;
> - }
> -   val >>= 16;
> - }
> -}
> -  return insns;
> -}
> -
>  static void
> -aarch64_add_constant (int regnum, int scratchreg, HOST_WIDE_INT delta)
> +aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
> +   HOST_WIDE_INT delta)
>  {
>HOST_WIDE_INT mdelta = delta;
> -  rtx this_rtx = gen_rtx_REG (Pmode, regnum);
> -  rtx scratch_rtx = gen_rtx_REG (Pmode, scratchreg);
> +  rtx this_rtx = gen_rtx_REG (mode, regnum);
> +  rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
>  
>if (mdelta < 0)
>  mdelta = -mdelta;
>  
>if (mdelta >= 4096 * 4096)
>  {
> -  (void) aarch64_build_constant (scratchreg, delta, true);
> +  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, 
> mode);
>emit_insn (gen_add3_insn (this_rtx, this_rtx, scratch_rtx));
>  }
>else if (mdelta > 0)
> @@ -3436,19 +3358,19 @@ aarch64_add_constant (int regnum, int scratchreg, 
> HOST_WIDE_INT delta)
>if (mdelta >= 4096)
>   {
> emit_insn (gen_rtx_SET (scratch_rtx, GEN_INT (mdelta / 4096)));
> -   rtx shift = gen_rtx_ASHIFT (Pmode, scratch_rtx, GEN_INT (12));
> +   rtx shift = gen_rtx_ASHIFT (mode, scratch_rtx, GEN_INT (12));
>  

Re: [PATCH] disable ifunc on *-musl by default

2016-07-20 Thread Szabolcs Nagy
On 20/07/16 14:45, David Edelsohn wrote:
>> Musl libc does not support gnu ifunc, so disable it by default.
>> (not disabled on s390-* since that has no musl support yet.)
> 
> Musl libc now supports PPC64. Support for s390 is in progress.
> 

it seemed to me that on ppc64 ifunc is disabled by default.
(at least it is not enabled in config.gcc)



Re: [PATCH] Avoid invoking ranlib on libbackend.a

2016-07-20 Thread Patrick Palka
On Wed, 20 Jul 2016, Bernd Schmidt wrote:

> On 07/19/2016 10:20 AM, Richard Biener wrote:
> > I like it.  Improving re-build time in my dev tree is very much
> > welcome, and yes,
> > libbackend build time is a big part of it usually (plus of course cc1
> > link time).
> 
> Since that wasn't an entirely explicit ack, I'll add mine. Thank you for doing
> this.
> 
> 
> Bernd
> 
> 

Committed as r238524 with the following minor change to the configure
test to use $CFLAGS and $LDFLAGS consistently:

diff --git a/gcc/configure.ac b/gcc/configure.ac
index 63052ba..241e82d 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4905,7 +4905,7 @@ echo 'int main (void) { return 0; }' > conftest.c
 if ($AR --version | sed 1q | grep "GNU ar" \
 && $CC $CFLAGS -c conftest.c \
 && $AR rcT conftest.a conftest.o \
-&& $CC -o conftest conftest.a) >/dev/null 2>&1; then
+&& $CC $CFLAGS $LDFLAGS -o conftest conftest.a) >/dev/null 2>&1; then
   thin_archive_support=yes
 fi
 rm -f conftest.c conftest.o conftest.a conftest


[PATCH v2] C++ FE: handle misspelled identifiers and typenames

2016-07-20 Thread David Malcolm
Changes in v2:
 - split out the non-C++ parts already approved by Jeff (I've committed
   these as r238522).
 - updated to mirror the fixes for PR c/71858 Jakub made to the
   corresponding C implementation in r238352, skipping anticipated decls
   of builtin functions
 - rewritten to more closely resemble the C FE's implementation

This is a port of the C frontend's r237714 [1] to the C++ frontend:
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01052.html
offering spelling suggestions for misspelled identifiers, macro names,
and some keywords (e.g. "singed" vs "signed" aka PR c/70339).

Examples of suggestions can be seen in the test case.
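
For instance, inputs of the following shape (illustrative only; the
real cases are in the new tests) now get suggestions:

  singed char ch;      /* "singed" vs "signed", aka PR c/70339.  */

  void test (void)
  {
    int my_counter = 0;
    my_countre++;      /* misspelled identifier: suggests "my_counter".  */
  }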

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; adds
267 PASS results to g++.sum.

OK for trunk?

[1] aka 8469aece13814deddf2cd80538d33c2d0a8d60d9 in the git mirror

gcc/cp/ChangeLog:
PR c/70339
PR c/71858
* name-lookup.c: Include gcc-rich-location.h, spellcheck-tree.h,
and parser.h.
(suggest_alternatives_for): If no candidates are found, try
lookup_name_fuzzy and report if it finds a suggestion.
(consider_binding_level): New function.
(lookup_name_fuzzy): New function.
* parser.c: Include gcc-rich-location.h.
(cp_lexer_next_token_is_decl_specifier_keyword): Move most of
logic into...
(cp_keyword_starts_decl_specifier_p): ...this new function.
(cp_parser_diagnose_invalid_type_name): When issuing
"does not name a type" errors, attempt to make a suggestion using
lookup_name_fuzzy.
* parser.h (cp_keyword_starts_decl_specifier_p): New prototype.
* search.c (lookup_field_fuzzy_info::fuzzy_lookup_field): Don't
attempt to access TYPE_FIELDS within a TYPE_PACK_EXPANSION.

gcc/testsuite/ChangeLog:
PR c/70339
PR c/71858
* g++.dg/spellcheck-identifiers.C: New test case, based on
gcc.dg/spellcheck-identifiers.c.
* g++.dg/spellcheck-identifiers-2.C: New test case, based on
gcc.dg/spellcheck-identifiers-2.c.
* g++.dg/spellcheck-typenames.C: New test case, based on
gcc.dg/spellcheck-typenames.c.
---
 gcc/cp/name-lookup.c| 116 ++-
 gcc/cp/parser.c |  43 +++-
 gcc/cp/parser.h |   1 +
 gcc/cp/search.c |   4 +
 gcc/testsuite/g++.dg/spellcheck-identifiers-2.C |  43 
 gcc/testsuite/g++.dg/spellcheck-identifiers.C   | 255 
 gcc/testsuite/g++.dg/spellcheck-typenames.C |  84 
 7 files changed, 533 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-identifiers-2.C
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-identifiers.C
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-typenames.C

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index cbd5209..561bf71 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -29,6 +29,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "debug.h"
 #include "c-family/c-pragma.h"
 #include "params.h"
+#include "gcc-rich-location.h"
+#include "spellcheck-tree.h"
+#include "parser.h"
 
 /* The bindings for a particular name in a particular scope.  */
 
@@ -4435,9 +4438,20 @@ suggest_alternatives_for (location_t location, tree name)
 
   namespaces_to_search.release ();
 
-  /* Nothing useful to report.  */
+  /* Nothing useful to report for NAME.  Report on likely misspellings,
+ or do nothing.  */
   if (candidates.is_empty ())
-return;
+{
+  const char *fuzzy_name = lookup_name_fuzzy (name, FUZZY_LOOKUP_NAME);
+  if (fuzzy_name)
+   {
+ gcc_rich_location richloc (location);
+ richloc.add_fixit_misspelled_id (location, fuzzy_name);
+ inform_at_rich_loc (&richloc, "suggested alternative: %qs",
+ fuzzy_name);
+   }
+  return;
+}
 
   inform_n (location, candidates.length (),
"suggested alternative:",
@@ -4672,6 +4686,104 @@ qualified_lookup_using_namespace (tree name, tree scope,
   return result->value != error_mark_node;
 }
 
+/* Helper function for lookup_name_fuzzy.
+   Traverse binding level LVL, looking for good name matches for NAME
+   (and BM).  */
+static void
+consider_binding_level (tree name, best_match  &bm,
+   cp_binding_level *lvl, bool look_within_fields,
+   enum lookup_name_fuzzy_kind kind)
+{
+  if (look_within_fields)
+if (lvl->this_entity && TREE_CODE (lvl->this_entity) == RECORD_TYPE)
+  {
+   tree type = lvl->this_entity;
+   bool want_type_p = (kind == FUZZY_LOOKUP_TYPENAME);
+   tree best_matching_field
+ = lookup_member_fuzzy (type, name, want_type_p);
+   if (best_matching_field)
+ bm.consider (best_matching_field);
+  }
+
+  for (tree t = lvl->names; t; t = TREE_CHAIN (t))
+{
+  /* Don't use bindin

Re: [AArch64][2/3] Optimize aarch64_add_constant to generate better addition sequences

2016-07-20 Thread Richard Earnshaw (lists)
On 20/07/16 14:02, Jiong Wang wrote:
> This patch optimize immediate addition sequences generated by
> aarch64_add_constant.
> 
> The current addition sequences generated are:
> 
>   * If the immediate fits into the unsigned 12-bit range, generate a single add/sub.
>   * Otherwise, if it fits into the unsigned 24-bit range, generate two
> add/subs.
> 
>   * Otherwise invoke the general constant-building function.
> 
> 
> This hasn't considered the situation where the immediate can't fit into
> the unsigned 12-bit range but can fit into a single mov instruction, in
> which case we generate one move and one addition.  The move won't touch
> the destination register, thus the sequence is better than two additions
> which both touch the destination register.
> 
> 
> This patch thus optimize the addition sequences into:
> 
>   * If the immediate fits into the unsigned 12-bit range, generate a single add/sub.
>  
>   * Otherwise, if it fits into the unsigned 24-bit range, generate two add/subs.
> But don't do this if it fits into a single move instruction, in which case
> move the immediate to the scratch register first, then generate one
> addition to add the scratch register to the destination register.
>   * Otherwise invoke the general constant-building function.
> 
> 
> OK for trunk?
> 
> gcc/
> 2016-07-20  Jiong Wang  
> 
> * config/aarch64/aarch64.c (aarch64_add_constant): Optimize
> instruction sequences.
> 
> 

OK with the updates to the comments as mentioned below.
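
For reference, a standalone sketch (plain C, not GCC code; the two
predicates are simplified stand-ins for aarch64_uimm12_shift and
aarch64_move_imm, ignoring bitmask immediates) of the sequence-shape
decision under review:

  #include <stdbool.h>
  #include <stdint.h>

  /* 12-bit unsigned immediate, optionally shifted left by 12.  */
  static bool
  uimm12_shift (uint64_t v)
  {
    return (v & ~(uint64_t) 0xfff) == 0 || (v & ~(uint64_t) 0xfff000) == 0;
  }

  /* Immediate loadable by a single MOVZ-style instruction.  */
  static bool
  move_imm (uint64_t v)
  {
    for (int i = 0; i < 64; i += 16)
      if ((v & ~((uint64_t) 0xffff << i)) == 0)
        return true;
    return false;
  }

  /* 0: no insn; 1: one add/sub; 2: two add/subs;
     3: materialize in the scratch register, then one add.  */
  static int
  add_constant_shape (int64_t delta)
  {
    uint64_t mdelta = delta < 0 ? -(uint64_t) delta : (uint64_t) delta;
    if (mdelta == 0)
      return 0;
    if (uimm12_shift (mdelta))
      return 1;
    if (mdelta < 0x1000000 && !move_imm (mdelta))
      return 2;
    return 3;
  }

E.g. delta = 0xffff takes the mov-plus-add path (mov to the scratch
register, then one add), so the destination register is written only
once instead of by two back-to-back additions.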

> build-const-2.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> aeea3b3ebc514663043ac8d7cd13361f06f78502..41844a101247c939ecb31f8a8c17cf79759255aa
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1865,6 +1865,47 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
>aarch64_internal_mov_immediate (dest, imm, true, GET_MODE (dest));
>  }
>  
> +/* Add DELTA onto REGNUM in MODE, using SCRATCHREG to held intermediate 
> value if
> +   it is necessary.  */

Add DELTA to REGNUM in mode MODE.  SCRATCHREG can be used to hold an
intermediate value if necessary.


> +
> +static void
> +aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
> +   HOST_WIDE_INT delta)
> +{
> +  HOST_WIDE_INT mdelta = abs_hwi (delta);
> +  rtx this_rtx = gen_rtx_REG (mode, regnum);
> +
> +  /* Do nothing if mdelta is zero.  */
> +  if (!mdelta)
> +return;
> +
> +  /* We only need single instruction if the offset fit into add/sub.  */
> +  if (aarch64_uimm12_shift (mdelta))
> +{
> +  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
> +  return;
> +}
> +
> +  /* We need two add/sub instructions, each one perform part of the
> + addition/subtraction, but don't this if the addend can be loaded into
> + register by single instruction, in that case we prefer a move to scratch
> + register following by addition.  */

We need two add/sub instructions, each one performing part of the
calculation.  Don't do this if the addend can be loaded into a
register with a single instruction, in which case we prefer a move to a
scratch register followed by an addition.



> +  if (mdelta < 0x100 && !aarch64_move_imm (delta, mode))
> +{
> +  HOST_WIDE_INT low_off = mdelta & 0xfff;
> +
> +  low_off = delta < 0 ? -low_off : low_off;
> +  emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
> +  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
> +  return;
> +}
> +
> +  /* Otherwise use generic function to handle all other situations.  */
> +  rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
> +  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
> +  emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
> +}
> +
>  static bool
>  aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
>tree exp ATTRIBUTE_UNUSED)
> @@ -3337,44 +3378,6 @@ aarch64_final_eh_return_addr (void)
>  - 2 * UNITS_PER_WORD));
>  }
>  
> -static void
> -aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
> -   HOST_WIDE_INT delta)
> -{
> -  HOST_WIDE_INT mdelta = delta;
> -  rtx this_rtx = gen_rtx_REG (mode, regnum);
> -  rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
> -
> -  if (mdelta < 0)
> -mdelta = -mdelta;
> -
> -  if (mdelta >= 4096 * 4096)
> -{
> -  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, 
> mode);
> -  emit_insn (gen_add3_insn (this_rtx, this_rtx, scratch_rtx));
> -}
> -  else if (mdelta > 0)
> -{
> -  if (mdelta >= 4096)
> - {
> -   emit_insn (gen_rtx_SET (scratch_rtx, GEN_INT (mdelta / 4096)));
> -   rtx shift = gen_rtx_ASHIFT (mode, scratch_rtx, GEN_INT (12));
> -   if (delta < 0)
> - emit_insn (gen_rtx_SET (this_rtx,
> - gen_rtx_MINUS (mode, this_rtx, shift)));
> -   else
> -   

Re: [PATCH] disable ifunc on *-musl by default

2016-07-20 Thread David Edelsohn
On Wed, Jul 20, 2016 at 7:09 AM, Szabolcs Nagy  wrote:
> On 20/07/16 14:45, David Edelsohn wrote:
>>> Musl libc does not support gnu ifunc, so disable it by default.
>>> (not disabled on s390-* since that has no musl support yet.)
>>
>> Musl libc now supports PPC64. Support for s390 is in progress.
>>
>
> it seemed to me that on ppc64 ifunc is disabled by default.
> (at least it is not enabled in config.gcc)

Ifunc is used on PPC64.

- David


Re: [AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant

2016-07-20 Thread Richard Earnshaw (lists)
On 20/07/16 14:03, Jiong Wang wrote:
> Those stack adjustment sequences inside aarch64_expand_prologue/epilogue
> are doing exactly what aarch64_add_constant offers, except they also
> need to be aware of dwarf generation.
> 
> This patch teaches the existing aarch64_add_constant about dwarf
> generation; currently the SP register is supported.  Whenever SP is
> updated, there should be a CFA update; we then mark these instructions
> as frame related, and if the update is too complex for gcc to guess the
> adjustment, we attach an explicit annotation.
> 
> Both dwarf frame info size and pro/epilogue scheduling are improved after
> this patch as aarch64_add_constant has better utilization of the scratch
> register.
> 
> OK for trunk?
> 
> gcc/
> 2016-07-20  Jiong Wang  
> 
> * config/aarch64/aarch64.c (aarch64_add_constant): Mark
> instruction as frame related when it is.  Generate CFA
> annotation when it's necessary.
> (aarch64_expand_prologue): Use aarch64_add_constant.
> (aarch64_expand_epilogue): Likewise.
> 

Are you sure using aarch64_add_constant is unconditionally safe?  Stack
adjustments need to be done very carefully to ensure that we never
transiently deallocate part of the stack.

R.


> 
> build-const-3.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 41844a101247c939ecb31f8a8c17cf79759255aa..b38f3f1e8f85a5f3191d0c96080327dac7b2eaed
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1874,6 +1874,8 @@ aarch64_add_constant (machine_mode mode, int regnum, 
> int scratchreg,
>  {
>HOST_WIDE_INT mdelta = abs_hwi (delta);
>rtx this_rtx = gen_rtx_REG (mode, regnum);
> +  bool frame_related_p = (regnum == SP_REGNUM);
> +  rtx_insn *insn;
>  
>/* Do nothing if mdelta is zero.  */
>if (!mdelta)
> @@ -1882,7 +1884,8 @@ aarch64_add_constant (machine_mode mode, int regnum, 
> int scratchreg,
>/* We only need single instruction if the offset fit into add/sub.  */
>if (aarch64_uimm12_shift (mdelta))
>  {
> -  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
> +  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
> +  RTX_FRAME_RELATED_P (insn) = frame_related_p;
>return;
>  }
>  
> @@ -1895,15 +1898,23 @@ aarch64_add_constant (machine_mode mode, int regnum, 
> int scratchreg,
>HOST_WIDE_INT low_off = mdelta & 0xfff;
>  
>low_off = delta < 0 ? -low_off : low_off;
> -  emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
> -  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
> +  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
> +  RTX_FRAME_RELATED_P (insn) = frame_related_p;
> +  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
> +  RTX_FRAME_RELATED_P (insn) = frame_related_p;
>return;
>  }
>  
>/* Otherwise use generic function to handle all other situations.  */
>rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
>aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
> -  emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
> +  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
> +  if (frame_related_p)
> +{
> +  RTX_FRAME_RELATED_P (insn) = frame_related_p;
> +  rtx adj = plus_constant (mode, this_rtx, delta);
> +  add_reg_note (insn , REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj));
> +}
>  }
>  
>  static bool
> @@ -3038,36 +3049,7 @@ aarch64_expand_prologue (void)
>frame_size -= (offset + crtl->outgoing_args_size);
>fp_offset = 0;
>  
> -  if (frame_size >= 0x100)
> - {
> -   rtx op0 = gen_rtx_REG (Pmode, IP0_REGNUM);
> -   emit_move_insn (op0, GEN_INT (-frame_size));
> -   insn = emit_insn (gen_add2_insn (stack_pointer_rtx, op0));
> -
> -   add_reg_note (insn, REG_CFA_ADJUST_CFA,
> - gen_rtx_SET (stack_pointer_rtx,
> -  plus_constant (Pmode, stack_pointer_rtx,
> - -frame_size)));
> -   RTX_FRAME_RELATED_P (insn) = 1;
> - }
> -  else if (frame_size > 0)
> - {
> -   int hi_ofs = frame_size & 0xfff000;
> -   int lo_ofs = frame_size & 0x000fff;
> -
> -   if (hi_ofs)
> - {
> -   insn = emit_insn (gen_add2_insn
> - (stack_pointer_rtx, GEN_INT (-hi_ofs)));
> -   RTX_FRAME_RELATED_P (insn) = 1;
> - }
> -   if (lo_ofs)
> - {
> -   insn = emit_insn (gen_add2_insn
> - (stack_pointer_rtx, GEN_INT (-lo_ofs)));
> -   RTX_FRAME_RELATED_P (insn) = 1;
> - }
> - }
> +  aarch64_add_constant (Pmode, SP_REGNUM, IP0_REGNUM, -frame_size);
>  }
>else
>  frame_size = -1;
> @@ -3287,31 +3269,7 @@ aarch64_expand_epilogue (bool for_sibcall)
>if 

Re: [PATCH v2] C++ FE: handle misspelled identifiers and typenames

2016-07-20 Thread Jakub Jelinek
On Wed, Jul 20, 2016 at 10:46:58AM -0400, David Malcolm wrote:
> +  /* Skip anticipated decls of builtin functions.  */
> +  if (TREE_CODE (t) == FUNCTION_DECL)
> + if (DECL_BUILT_IN (t))
> +   if (DECL_ANTICIPATED (t))

Just a style comment, wouldn't
  if (TREE_CODE (t) == FUNCTION_DECL
  && DECL_BUILT_IN (t)
  && DECL_ANTICIPATED (t))
continue;
be better?

Jakub


Re: [patch,avr] make progmem work on AVR_TINY, use TARGET_ADDR_SPACE_DIAGNOSE_USAGE

2016-07-20 Thread Georg-Johann Lay

On 18.07.2016 08:58, Denis Chertykov wrote:

2016-07-15 18:26 GMT+03:00 Georg-Johann Lay :

This patch needs new hook TARGET_ADDR_SPACE_DIAGNOSE_USAGE:
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00839.html

This patch turns attribute progmem into a working feature for AVR_TINY
cores.

It boils down to adding 0x4000 to all symbols with progmem:  Flash memory
can be seen in the RAM address space starting at 0x4000, i.e. data in flash
can be read by means of the LD instruction if we add an offset of 0x4000.
There is no need for special access macros like pgm_read_* or special
address spaces as there is nothing like an LPM instruction.

This is simply achieved by setting the respective symbol_ref_flag, and when
such a symbol has to be printed, plus_constant with 0x4000 is used.
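
As an illustration (assumed example, not from the testsuite), user code
like the following now works on AVR_TINY without pgm_read_* macros:

  /* TABLE is placed in flash; its address is emitted as symbol + 0x4000,
     so the read below is an ordinary LD from the flash window that
     appears in the RAM address space.  */
  const unsigned char table[4] __attribute__ ((__progmem__)) =
    { 1, 2, 4, 8 };

  unsigned char
  get (unsigned char i)
  {
    return table[i];
  }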

Diagnosing of unsupported address spaces is now performed by
TARGET_ADDR_SPACE_DIAGNOSE_USAGE which has exact location information.
Hence there is no need to scan all decls for invalid address spaces.

For AVR_TINY, all address spaces have been disabled.  They are of no use.
Supporting __flash would just make the backend more complicated without any
gains.


Ok for trunk?

Johann


gcc/
* doc/extend.texi (AVR Variable Attributes) [progmem]: Add
documentation how it works on reduced Tiny cores.
(AVR Named Address Spaces): No support for reduced Tiny.
* avr-protos.h (avr_addr_space_supported_p): New prototype.
* avr.c (AVR_SYMBOL_FLAG_TINY_PM): New macro.
(avr_address_tiny_pm_p): New static function.
(avr_print_operand_address) [AVR_TINY]: Add AVR_TINY_PM_OFFSET
if the address is in progmem.
(avr_assemble_integer): Same.
(avr_encode_section_info) [AVR_TINY]: Set AVR_SYMBOL_FLAG_TINY_PM
for symbol_ref in progmem.
(TARGET_ADDR_SPACE_DIAGNOSE_USAGE): New hook define...
(avr_addr_space_diagnose_usage): ...and implementation.
(avr_addr_space_supported_p): New function.
(avr_nonconst_pointer_addrspace, avr_pgm_check_var_decl): Only
report bad address space usage if that space is supported.
(avr_insert_attributes): Same.  No more complain about unsupported
address spaces.
* avr.h (AVR_TINY_PM_OFFSET): New macro.
* avr-c.c (tm_p.h): Include it.
(avr_cpu_cpp_builtins) [__AVR_TINY_PM_BASE_ADDRESS__]: Use
AVR_TINY_PM_OFFSET instead of magic 0x4000 when built-in def'ing.
Only define addr-space related built-in macro if
avr_addr_space_supported_p.
gcc/testsuite/
* gcc.target/avr/torture/tiny-progmem.c: New test.



Approved.


Committed, but I split it into 2 change-sets.  The only effective change is 
that the hook has a different prototype (returns void instead of bool).



Part1: Implement new target hook TARGET_ADDR_SPACE_DIAGNOSE_USAGE.

https://gcc.gnu.org/r238519

gcc/
* avr-protos.h (avr_addr_space_supported_p): New prototype.
* avr.c (TARGET_ADDR_SPACE_DIAGNOSE_USAGE): New hook define...
(avr_addr_space_diagnose_usage): ...and implementation.
(avr_addr_space_supported_p): New function.
(avr_nonconst_pointer_addrspace, avr_pgm_check_var_decl): Only
report bad address space usage if that space is supported.
(avr_insert_attributes): Same.  No more complain about unsupported
address spaces.
* avr-c.c (tm_p.h): Include it.
(avr_cpu_cpp_builtins): Only define addr-space related built-in
macro if avr_addr_space_supported_p.

Part2: Make progmem work for reduced Tiny cores

https://gcc.gnu.org/r238525

gcc/
Implement attribute progmem on reduced Tiny cores by adding
flash offset 0x4000 to respective symbols.

PR target/71948
* doc/extend.texi (AVR Variable Attributes) [progmem]: Add
documentation how it works on reduced Tiny cores.
(AVR Named Address Spaces): No support for reduced Tiny.
* config/avr/avr.c (AVR_SYMBOL_FLAG_TINY_PM): New macro.
(avr_address_tiny_pm_p): New static function.
(avr_print_operand_address) [AVR_TINY]: Add AVR_TINY_PM_OFFSET
if the address is in progmem.
(avr_assemble_integer): Same.
(avr_encode_section_info) [AVR_TINY]: Set AVR_SYMBOL_FLAG_TINY_PM
for symbol_ref in progmem.
* config/avr/avr.h (AVR_TINY_PM_OFFSET): New macro.
* config/avr/avr-c.c (avr_cpu_cpp_builtins): Use it instead of
magic 0x4000 when built-in def'ing __AVR_TINY_PM_BASE_ADDRESS__.
gcc/testsuite/
PR target/71948
* gcc.target/avr/torture/tiny-progmem.c: New test.

Index: config/avr/avr-c.c
===
--- config/avr/avr-c.c	(revision 238518)
+++ config/avr/avr-c.c	(revision 238519)
@@ -26,7 +26,7 @@
 #include "c-family/c-common.h"
 #include "stor-layout.h"
 #include "langhooks.h"
-
+#include "tm_p.h"
 
 /* IDs for all the AVR builtins.  */
 
@@ -253,7 +253,10 @@ avr_register

Re: [PATCH] disable ifunc on *-musl by default

2016-07-20 Thread Szabolcs Nagy
On 20/07/16 15:13, David Edelsohn wrote:
> On Wed, Jul 20, 2016 at 7:09 AM, Szabolcs Nagy  wrote:
>> On 20/07/16 14:45, David Edelsohn wrote:
 Musl libc does not support gnu ifunc, so disable it by default.
 (not disabled on s390-* since that has no musl support yet.)
>>>
>>> Musl libc now supports PPC64. Support for s390 is in progress.
>>>
>>
>> it seemed to me that on ppc64 ifunc is disabled by default.
>> (at least it is not enabled in config.gcc)
> 
> Ifunc is used on PPC64.
> 

only if you build gcc with --enable-gnu-indirect-function

otherwise the ifunc attribute does not work, and target
libs (e.g. libatomic on x86_64) don't use ifunc.

in glibc I think you need --enable-multiarch to use ifunc,
but it handles ifunc in user code independently of that.

I just want to make sure that --enable-gnu-indirect-function
is not the default with *-musl (since musl has no ifunc support).
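
For reference, this is the feature in question -- a minimal ifunc
example (illustrative): it needs both the compiler-side attribute
support and libc-side IRELATIVE relocation handling, the latter being
what musl lacks.

  static int
  add_generic (int a, int b)
  {
    return a + b;
  }

  /* The resolver runs at load time and returns the implementation to
     bind; a real resolver would inspect CPU features.  */
  static int (*resolve_add (void)) (int, int)
  {
    return add_generic;
  }

  int add (int, int) __attribute__ ((ifunc ("resolve_add")));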



Re: [PATCH, vec-tails 07/10] Support loop epilogue combining

2016-07-20 Thread Ilya Enkovich
On 14 Jul 16:04, Jeff Law wrote:
> On 06/28/2016 06:24 AM, Ilya Enkovich wrote:
> 
> >
> >Here is an updated patch version.
> >
> >Thanks,
> >Ilya
> >--
> >gcc/
> >
> >+/* Function vect_gen_loop_masks.
> >+
> >+   Create masks to mask a loop described by LOOP_VINFO.  Masks
> >+   are created according to LOOP_VINFO_REQUIRED_MASKS and are stored
> >+   into MASKS vector.
> >+
> >+   Index of a mask in a vector is computed according to the number
> >+   of the mask's elements.  Masks are sorted by the number of their
> >+   elements in descending order.  Index 0 is used to access a mask with
> >+   current_vector_size elements.  Among masks with the same number
> >+   of elements the one with lower index is used to mask iterations
> >+   with smaller iteration counter.  Note that you may get NULL elements
> >+   for masks which are not required.  Use vect_get_mask_index_for_elems
> >+   or vect_get_mask_index_for_type to access resulting vector.  */
> >+
> >+static void
> >+vect_gen_loop_masks (loop_vec_info loop_vinfo, vec *masks)
> I find myself wondering if this ought to be broken down a bit (without
> changing the underlying semantics).
> 
> >+
> >+  /* Create narrowed masks.  */
> >+  cur_mask_elems = iv_elems;
> >+  nmasks = ivs.length ();
> >+  while (cur_mask_elems < max_mask_elems)
> >+{
> >+  prev_mask = vect_get_mask_index_for_elems (cur_mask_elems);
> >+
> >+  cur_mask_elems <<= 1;
> >+  nmasks >>= 1;
> >+
> >+  cur_mask = vect_get_mask_index_for_elems (cur_mask_elems);
> >+
> >+  mask_type = build_truth_vector_type (cur_mask_elems, vec_size);
> >+
> >+  for (unsigned i = 0; i < nmasks; i++)
> >+{
> >+  tree mask_low = (*masks)[prev_mask++];
> >+  tree mask_hi = (*masks)[prev_mask++];
> >+  mask = vect_get_new_ssa_name (mask_type, vect_mask_var);
> >+  stmt = gimple_build_assign (mask, VEC_PACK_TRUNC_EXPR,
> >+  mask_low, mask_hi);
> >+  gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
> >+  (*masks)[cur_mask++] = mask;
> >+}
> >+}
> For example, pull this into its own function as well as the code to create
> widened masks.  In fact, didn't I see those functions in one of the other
> patches as their own separate subroutines?

There were functions which check whether we may generate such masks; here
we actually generate them.  I moved the code into separate functions.
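
For readers following the thread, the effect of combining can be
modelled in scalar C (a conceptual sketch only, not vectorizer output):
the tail becomes one more masked vector iteration instead of a scalar
remainder loop, with the bounds check playing the role of the loop mask.

  #define VF 8   /* vectorization factor */

  void
  add_arrays (int *a, const int *b, int n)
  {
    for (int i = 0; i < n; i += VF)
      for (int lane = 0; lane < VF; lane++)
        if (i + lane < n)   /* the loop mask: lanes past N do nothing.  */
          a[i + lane] += b[i + lane];
  }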

> 
> It's not a huge deal and I don't think it requires another round of review.
> I just found myself scrolling through multiple pages of this function and
> thought it'd be slightly easier to grok if were simply smaller.
> 
> 
> >+
> >+/* Function vect_mask_reduction_stmt.
> >+
> >+   Mask given vectorized reduction statement STMT using
> >+   MASK.  In case scalar reduction statement is vectorized
> >+   into several vector statements then PREV holds a
> >+   preceding vetor statement copy for STMT.
> s/vetor/vector/
> 
> With the one function split up and the typo fix I think this is OK for the
> trunk when the set as a whole is ready.
> 
> jeff
> 
> 

Here is an updated version.

Thanks,
Ilya
--
gcc/

2016-07-20  Ilya Enkovich  

* dbgcnt.def (vect_tail_combine): New.
* params.def (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD): New.
* tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
* tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Support
epilogue combined with loop body.
(vect_do_peeling_for_loop_bound): Likewise.
(vect_do_peeling_for_alignment): ???
* tree-vect-loop.c Include alias.h and dbgcnt.h.
(vect_estimate_min_profitable_iters): Add 
ret_min_profitable_combine_niters
arg, compute number of iterations for which loop epilogue combining is
profitable.
(vect_generate_tmps_on_preheader): Support combined epilogue.
(vect_gen_ivs_for_masking): New.
(vect_get_mask_index_for_elems): New.
(vect_get_mask_index_for_type): New.
(vect_create_narrowed_masks): New.
(vect_create_widened_masks): New.
(vect_gen_loop_masks): New.
(vect_mask_reduction_stmt): New.
(vect_mask_mask_load_store_stmt): New.
(vect_mask_load_store_stmt): New.
(vect_combine_loop_epilogue): New.
(vect_transform_loop): Support combined epilogue.


diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 78ddcc2..73c2966 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -192,4 +192,5 @@ DEBUG_COUNTER (treepre_insert)
 DEBUG_COUNTER (tree_sra)
 DEBUG_COUNTER (vect_loop)
 DEBUG_COUNTER (vect_slp)
+DEBUG_COUNTER (vect_tail_combine)
 DEBUG_COUNTER (dom_unreachable_edges)
diff --git a/gcc/params.def b/gcc/params.def
index b86d592..745da4c 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1232,6 +1232,11 @@ DEFPARAM (PARAM_MAX_SPECULATIVE_DEVIRT_MAYDEFS,
  "Maximum number of may-defs visited when devirtualizing "
  "speculatively", 50, 0, 0)
 
+DEFPARAM (PARAM_VECT_COST_INCREA

Re: [PATCH v2] C++ FE: handle misspelled identifiers and typenames

2016-07-20 Thread Jason Merrill
On Wed, Jul 20, 2016 at 10:46 AM, David Malcolm  wrote:
> @@ -1407,6 +1407,10 @@ lookup_field_fuzzy_info::fuzzy_lookup_field (tree type)
> The TYPE_FIELDS of TYPENAME_TYPE is its TYPENAME_TYPE_FULLNAME.  */
>  return;
>
> +  /* TYPE_FIELDS is not valid for a TYPE_PACK_EXPANSION.  */
> +  if (TREE_CODE (type) == TYPE_PACK_EXPANSION)
> +return;

Instead of checking for various invalid codes, why don't we just check
CLASS_TYPE_P at the top, like fuzzy_lookup_fnfields?

OK with that change.

Jason


Re: [AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant

2016-07-20 Thread Jiong Wang

On 20/07/16 15:18, Richard Earnshaw (lists) wrote:

On 20/07/16 14:03, Jiong Wang wrote:

Those stack adjustment sequences inside aarch64_expand_prologue/epilogue
are doing exactly what's aarch64_add_constant offered, except they also
need to be aware of dwarf generation.

This patch teach existed aarch64_add_constant about dwarf generation and
currently SP register is supported.  Whenever SP is updated, there
should be CFA update, we then mark these instructions as frame related,
and if the update is too complex for gcc to guess the adjustment, we
attach explicit annotation.

Both dwarf frame info size and pro/epilogue scheduling are improved after
this patch as aarch64_add_constant has better utilization of scratch
register.

OK for trunk?

gcc/
2016-07-20  Jiong Wang  

 * config/aarch64/aarch64.c (aarch64_add_constant): Mark
 instruction as frame related when it is.  Generate CFA
 annotation when it's necessary.
 (aarch64_expand_prologue): Use aarch64_add_constant.
 (aarch64_expand_epilogue): Likewise.


Are you sure using aarch64_add_constant is unconditionally safe?  Stack
adjustments need to be done very carefully to ensure that we never
transiently deallocate part of the stack.


Richard,

  Thanks for the review.  Yes, I believe using aarch64_add_constant is
unconditionally safe here, because we have generated a stack tie to
clobber the whole memory, thus preventing any instruction which accesses
the stack from being scheduled after that.

  The access-to-deallocated-stack issue was there before and was fixed by

  https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02292.html.

 aarch64_add_constant itself generates the same instruction sequences as
the original code, except that for a few cases it will prefer

  move scratch_reg, #imm
  add sp, sp, scratch_reg

rather than:
  add sp, sp, #imm_part1
  add sp, sp, #imm_part2






Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns

2016-07-20 Thread Segher Boessenkool
On Wed, Jul 20, 2016 at 01:23:44PM +0200, Bernd Schmidt wrote:
> >>>But you need the profile to make even reasonably good decisions.
> >>
> >>I'm not worried about making cost decisions: as far as I'm concerned
> >>it's perfectly fine for that. I'm worried about correctness - you can't
> >>validly save registers inside a loop.
> >
> >Of course you can.  It needs to be paired with a restore; and we do
> >that just fine.
> > Pretty much *all* implementations in the literature do this, fwiw.

> I, however, fail to see where this happens.

See for example one of the better papers on shrink-wrapping, "Post Register
Allocation Spill Code Optimization", by Lupo and Wilken.

See the problem definition (section 2); figure 1 clearly shows multiple
saves/restores (executed more than once); and see section 4.2 for why we
don't need to look at loops.

[ In this paper prologue/epilogue pairs are only placed around SESE
regions, of which we do not have many in GCC that late in RTL (often the
CFG isn't even reducible); there is no reason to restrict to SESE regions
though ].

> If you have references to 
> somewhere where this algorithm is described, that would be helpful, 

No, of course not, because I just made this up, as should be clear.

The problem definition is simple: we have a CFG, and some of the blocks
in that CFG need some things done by the prologue done before they
execute.  We don't want to run that prologue code more often than
necessary, because it can be expensive (compared to the parts of the
function that are executed at all).  Considering all possible combinations
of blocks (or edges) where we can place a prologue is not computationally
feasible.  But there is a shortcut: if a block X gets a prologue, all
blocks dominated by it will for zero cost have that prologue established
as well (by simply not doing an epilogue until they are reached).  So
this algo does the obvious thing, simply walking the dom tree (which is
O(n)).  Then, from the prologue placement, we compute which blocks will
execute with that concern "active"; and we insert prologue/epilogue code
to make that assignment true (a prologue or epilogue for every edge that
crosses from "does not have" to "does have", or the other way around; and
then there is the head/tail thing because cross-jumping fails to unify
many of those *logues, so we take care of it manually).
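
In C-like terms the dominator-tree walk amounts to something like this
(a rough sketch assumed from the description above; the real pass works
from profile frequencies and also handles the head/tail placement):

  #include <stdbool.h>

  struct bb
  {
    struct bb **dom_kids;   /* children in the dominator tree */
    int n_dom_kids;
    double freq;            /* estimated execution frequency */
    bool needs_concern;     /* block requires the prologue's effects */
  };

  /* Cheapest total frequency for covering BB's dominator subtree:
     either one prologue at BB (everything it dominates is then covered
     for free) or prologues pushed further down.  */
  static double
  best_cost (struct bb *bb)
  {
    double below = 0.0;
    for (int i = 0; i < bb->n_dom_kids; i++)
      below += best_cost (bb->dom_kids[i]);
    if (bb->needs_concern || bb->freq <= below)
      return bb->freq;
    return below;
  }

A second, top-down walk then records which blocks execute with the
concern active, and a prologue or epilogue is inserted on every edge
where that property flips.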

> because at this stage I think I really don't understand what you're 
> trying to achieve. The submission lacks examples.

It says what it does right at the start of the head comment:

"""
   Instead of putting all of the prologue and epilogue in one spot, we
   can put parts of it in places that are executed less frequently.  The
   following code does this, for concerns that can have more than one
   prologue and epilogue, and where those pro- and epilogues can be
   executed more than once.
"""

followed by a bunch of detail.

> So I could see things could work if you place an epilogue part in the 
> last block of a loop if the start of the loop contains a corresponding 
> part of the prologue, but taking just the comment in the code:
>Prologue concerns are placed in such a way that they are executed as
>infrequently as possible.  Epilogue concerns are put everywhere where
>there is an edge from a bb dominated by such a prologue concern to a
>bb not dominated by one.
> 
> this describes no mechanism by which such a thing would happen.

Sure it does.  The edge leaving the loop, for example.

You can have a prologue/epilogue pair within a loop (which is unusual,
but *can* happen, e.g. as part of a conditional that executes almost
never -- this is quite frequent btw, assertions, on-the-run initialisation,
etc.)

The situation you describe has all the blocks in the loop needing the
prologue (but, say, nothing outside the loop).  Then of course the prologue
is placed on the entry (edge) into the loop, and the epilogue on the exit
edge(s).

> And I 
> fail to see how moving parts of the prologue into a loop would be 
> beneficial as an optimization.

for (i = 0; i < 10; i++)
if (less_than_one_in_ten_times)
do_something_that_needs_a_prologue;

or

for (i = 0; i < 10; i++)
if (whatever)
do_something_that_needs_a_prologue_and_does_not_return;

or whatever other situation.  Often we do not even have natural loops.  The
algorithm places prologues so that their dynamic execution frequency is as
low as possible, which makes the dynamic execution cost optimal, whatever
the CFG looks like.


Segher


Re: [AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant

2016-07-20 Thread Richard Earnshaw (lists)
On 20/07/16 16:02, Jiong Wang wrote:
> On 20/07/16 15:18, Richard Earnshaw (lists) wrote:
>> On 20/07/16 14:03, Jiong Wang wrote:
>>> Those stack adjustment sequences inside aarch64_expand_prologue/epilogue
>>> are doing exactly what's aarch64_add_constant offered, except they also
>>> need to be aware of dwarf generation.
>>>
>>> This patch teach existed aarch64_add_constant about dwarf generation and
>>> currently SP register is supported.  Whenever SP is updated, there
>>> should be CFA update, we then mark these instructions as frame related,
>>> and if the update is too complex for gcc to guess the adjustment, we
>>> attach explicit annotation.
>>>
>>> Both dwarf frame info size and pro/epilogue scheduling are improved
>>> after
>>> this patch as aarch64_add_constant has better utilization of scratch
>>> register.
>>>
>>> OK for trunk?
>>>
>>> gcc/
>>> 2016-07-20  Jiong Wang  
>>>
>>>  * config/aarch64/aarch64.c (aarch64_add_constant): Mark
>>>  instruction as frame related when it is.  Generate CFA
>>>  annotation when it's necessary.
>>>  (aarch64_expand_prologue): Use aarch64_add_constant.
>>>  (aarch64_expand_epilogue): Likewise.
>>>
>> Are you sure using aarch64_add_constant is unconditionally safe?  Stack
>> adjustments need to be done very carefully to ensure that we never
>> transiently deallocate part of the stack.
> 
> Richard,
> 
>   Thanks for the review, yes, I believe using aarch64_add_constant is
> unconditionally
> safe here.  Because we have generated a stack tie to clobber the whole
> memory thus
> prevent any instruction which access stack be scheduled after that.
> 
>   The access to deallocated stack issue was there and fixed by
> 
>   https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02292.html.
> 
>  aarch64_add_constant itself is generating the same instruction
> sequences as the
> original code, except for a few cases, it will prefer
> 
>   move scratch_reg, #imm
>   add sp, sp, scratch_reg
> 
> than:
>   add sp, sp, #imm_part1
>   add sp, sp, #imm_part2
> 
> 
> 
> 

But can you guarantee we will never get an add and a sub in a single
adjustment?  If not, then we need to ensure the two instructions appear
in the right order.

R.


Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Wilco Dijkstra
Richard Earnshaw wrote:
> Both of which look reasonable to me.

Yes, the code we generate for these examples is fine; I don't believe this
example ever went bad. It's just the cost calculation that is incorrect with
the outer check.

Wilco




Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Richard Earnshaw (lists)
On 20/07/16 16:28, Wilco Dijkstra wrote:
> Richard Earnshaw wrote:
>> Both of which look reasonable to me.
> 
> Yes the code we generate for these examples is fine, I don't believe this
> example ever went bad. It's just the cost calculation that is incorrect with
> the outer check.
> 
> Wilco
> 
> 

So under what circumstances does it lead to sub-optimal code?

R.


Re: fold x ^ y to 0 if x == y

2016-07-20 Thread Richard Biener
On Wed, 20 Jul 2016, Prathamesh Kulkarni wrote:

> On 8 July 2016 at 12:29, Richard Biener  wrote:
> > On Fri, 8 Jul 2016, Richard Biener wrote:
> >
> >> On Fri, 8 Jul 2016, Prathamesh Kulkarni wrote:
> >>
> >> > Hi Richard,
> >> > For the following test-case:
> >> >
> >> > int f(int x, int y)
> >> > {
> >> >int ret;
> >> >
> >> >if (x == y)
> >> >  ret = x ^ y;
> >> >else
> >> >  ret = 1;
> >> >
> >> >return ret;
> >> > }
> >> >
> >> > I was wondering if x ^ y should be folded to 0 since
> >> > it's guarded by condition x == y ?
> >> >
> >> > optimized dump shows:
> >> > f (int x, int y)
> >> > {
> >> >   int iftmp.0_1;
> >> >   int iftmp.0_4;
> >> >
> >> >   :
> >> >   if (x_2(D) == y_3(D))
> >> > goto ;
> >> >   else
> >> > goto ;
> >> >
> >> >   :
> >> >   iftmp.0_4 = x_2(D) ^ y_3(D);
> >> >
> >> >   :
> >> >   # iftmp.0_1 = PHI 
> >> >   return iftmp.0_1;
> >> >
> >> > }
> >> >
> >> > The attached patch tries to fold for above case.
> >> > I am checking if op0 and op1 are equal using:
> >> > if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
> >> >&& operand_equal_p (vr1->min, vr1->max)
> >> >&& operand_equal_p (vr2->min, vr2->max))
> >> >   { /* equal /* }
> >> >
> >> > I suppose intersection would check if op0 and op1 have equivalent ranges,
> >> > and added operand_equal_p check to ensure that there is only one
> >> > element within the range. Does that look correct ?
> >> > Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> >>
> >> I think VRP is the wrong place to catch this and DOM should have but it
> >> does
> >>
> >> Optimizing block #3
> >>
> >> 1>>> STMT 1 = x_2(D) le_expr y_3(D)
> >> 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
> >> 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
> >> 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
> >> 0>>> COPY x_2(D) = y_3(D)
> >> 0>>> COPY y_3(D) = x_2(D)
> >> Optimizing statement ret_4 = x_2(D) ^ y_3(D);
> >>   Replaced 'x_2(D)' with variable 'y_3(D)'
> >>   Replaced 'y_3(D)' with variable 'x_2(D)'
> >>   Folded to: ret_4 = x_2(D) ^ y_3(D);
> >> LKUP STMT ret_4 = x_2(D) bit_xor_expr y_3(D)
> >>
>> heh, registering both equivalences is obviously not going to help...
> >>
> >> The 2nd equivalence is from doing
> >>
> >>   /* We already recorded that LHS = RHS, with canonicalization,
> >>  value chain following, etc.
> >>
> >>  We also want to record RHS = LHS, but without any
> >> canonicalization
> >>  or value chain following.  */
> >>   if (TREE_CODE (rhs) == SSA_NAME)
> >> const_and_copies->record_const_or_copy_raw (rhs, lhs,
> >> SSA_NAME_VALUE (rhs));
> >>
> >> generally recording both is not helpful.  Jeff?  This seems to be
> >> r233207 (fix for PR65917) which must have regressed this testcase.
> >
> > Just verified it works fine on the GCC 5 branch:
> >
> > Optimizing block #3
> >
> > 0>>> COPY y_3(D) = x_2(D)
> > 1>>> STMT 1 = x_2(D) le_expr y_3(D)
> > 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
> > 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
> > 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
> > Optimizing statement ret_4 = x_2(D) ^ y_3(D);
> >   Replaced 'y_3(D)' with variable 'x_2(D)'
> > Applying pattern match.pd:240, gimple-match.c:11346
> > gimple_simplified to ret_4 = 0;
> >   Folded to: ret_4 = 0;
> I have reported it as PR71947.
> Could you help me point out how to fix this ?

By not recording both equivalences.  This might break the testcase that
change was introduced for (obviously), which is why I CCed Jeff for his
opinion.

Richard.

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Wilco Dijkstra
Richard Earnshaw wrote:
> So under what circumstances does it lead to sub-optimal code?

If the cost is incorrect, Combine can make the wrong decision, for example
whether to emit a multiply-add or not. I'm not sure whether this still happens
as Kyrill fixed several issues in Combine since this patch was written.

Wilco





Re: fold x ^ y to 0 if x == y

2016-07-20 Thread Jeff Law

On 07/20/2016 09:35 AM, Richard Biener wrote:

I have reported it as PR71947.
Could you help me point out how to fix this ?


Not record both equivalences.  This might break the testcase it was
introduced for (obviously).  Which is why I CCed Jeff for his opinion.

It's on my todo list.  I'm still catching up from my PTO last month.

It'll certainly regress the testcase that was introduced when we 
recorded both equivalences.


jeff



Re: [PATCH]: Use HOST_WIDE_INT_{,M}1{,U} some more

2016-07-20 Thread Uros Bizjak
On Wed, Jul 20, 2016 at 3:15 PM, Bernd Schmidt  wrote:
>
>
> On 07/20/2016 02:25 PM, Uros Bizjak wrote:
>>
>> 2016-07-19 14:46 GMT+02:00 Uros Bizjak :
>>>
>>> The result of exercises with sed in gcc/ directory.
>>
>>
>> Some more conversions:
>>
>> 2016-07-20  Uros Bizjak  
>>
>> * cse.c: Use HOST_WIDE_INT_M1 instead of ~(HOST_WIDE_INT) 0.
>> * combine.c: Use HOST_WIDE_INT_M1U instead of
>> ~(unsigned HOST_WIDE_INT) 0.
>> * double-int.h: Ditto.
>> * dse.c: Ditto.
* dwarf2asm.c: Ditto.
>> * expmed.c: Ditto.
>> * genmodes.c: Ditto.
>> * match.pd: Ditto.
>> * read-rtl.c: Ditto.
>> * tree-ssa-loop-ivopts.c: Ditto.
>> * tree-ssa-loop-prefetch.c: Ditto.
>> * tree-vect-generic.c: Ditto.
>> * tree-vect-patterns.c: Ditto.
>> * tree.c: Ditto.
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>>
>> OK for mainline?
>
>
> I think this is a good set of changes which makes the code easier to read.
> Can I impose one additional requirement, building before/after and verifying
> that all the object files are identical? If you do this, these and all other
> similar changes are preapproved.

I did check for differences of object files in stage1 and stage3
(final) directory when the compiler was bootstrapped w/ and w/o the
patch. As expected, objdump -dr didn't show any, so I'm pretty
confident that the sources are functionally the same.

I have committed the patch to mainline.

Uros.


Re: Merge switch statements in tree-cfgcleanup

2016-07-20 Thread Jeff Law

On 07/20/2016 05:14 AM, Bernd Schmidt wrote:

On 07/19/2016 01:18 PM, Richard Biener wrote:

On Tue, Jul 19, 2016 at 1:07 PM, Bernd Schmidt
 wrote:

On 07/19/2016 12:35 PM, Richard Biener wrote:


I think that start/end_recording_case_labels also merged
adjacent labels via group_case_labels_stmt.  Not sure why you
need to stop recording case labels during the transform.  Is
this because you are building a new switch stmt?



It's because the cached mapping gets invalidated. Look in
>> tree-cfg, it has an edge_to_cases map which I think cannot be
maintained if you modify the structure. I certainly got lots of
internal errors until I added that pair of calls.


Yeah, I see that.  OTOH cfgcleanup relies on this cache to be
efficient and you (repeatedly) clear it.  Clearing parts of it
should be sufficient and if you used redirect_edge_and_branch
instead of redirect_edge_pred it would have maintained the cache as
far as I can see,


I don't think that would work, since we're modifying and/or
discarding case labels as well and they can't remain part of the
cache.


or you can make sure to maintain it yourself or just clear the info
associated with the edges you redirect from one switch to another.


How's this? Tested as before.
So I'm going to let Richi run with the review on this one since the two 
of you are already iterating.  But I did have one comment on the 
placement of the pass.


I believe one of the key things to consider for whether or not something 
like this belongs in the cfgcleanup code is whether or not the 
optimization is likely exposed repeatedly through the optimization 
pipeline.  If it's mostly a source level issue or only usually exposed 
by a limited set of optimizers, then a separate pass might be better.




jeff


Re: [PATCH] Fix unsafe function attributes for special functions (PR 71876)

2016-07-20 Thread Jeff Law

On 07/20/2016 05:53 AM, Richard Biener wrote:

Is it OK after boot-strap and regression-testing?


I think the __builtin_setjmp change is wrong - __builtin_setjmp is
_not_ 'setjmp' it is part of the GCC internal machinery (using setjmp
and longjmp in the end) for SJLJ exception handing.

Am I correct Eric?
That is correct.  __builtin_setjmp (and friends) are part of the SJLJ 
exception handling code.   They use a fixed sized buffer (5 words) to 
store the key items (as opposed to the OS defined jmp_buf structure 
which is usually considerably larger).
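
For concreteness, the builtin pair in use (a minimal illustration):

  #include <stdio.h>

  int
  main (void)
  {
    void *buf[5];   /* the fixed five-word buffer, not a jmp_buf */
    if (__builtin_setjmp (buf) == 0)
      {
        puts ("direct return");
        __builtin_longjmp (buf, 1);   /* second argument must be 1 */
      }
    else
      puts ("return via longjmp");
    return 0;
  }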


jeff


Re: [PATCH GCC]Improve no-overflow check in SCEV using value range info.

2016-07-20 Thread Bin.Cheng
On Wed, Jul 20, 2016 at 11:01 AM, Richard Biener
 wrote:
> On Tue, Jul 19, 2016 at 6:15 PM, Bin.Cheng  wrote:
>> On Tue, Jul 19, 2016 at 1:10 PM, Richard Biener
>>  wrote:
>>> On Mon, Jul 18, 2016 at 6:27 PM, Bin Cheng  wrote:
 Hi,
 Scalar evolution needs to prove no-overflow for source variable when 
 handling type conversion.  This is important because otherwise we would 
 fail to recognize result of the conversion as SCEV, resulting in missing 
 loop optimizations.  Take case added by this patch as an example, the loop 
 can't be distributed as memset call because address of memory reference is 
 not recognized.  At the moment, we rely on type overflow semantics and 
 loop niter info for no-overflow checking, unfortunately that's not enough. 
  This patch introduces new method checking no-overflow using value range 
 information.  As commented in the patch, value range can only be used when 
 source operand variable evaluates on every loop iteration, rather than 
 guarded by some conditions.

 This together with patch improving loop niter analysis 
 (https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00736.html) can help 
 various loop passes like vectorization.
 Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> @@ -3187,7 +3187,8 @@ idx_infer_loop_bounds (tree base, tree *idx, void 
>>> *dta)
>>>/* If access is not executed on every iteration, we must ensure that 
>>> overlow
>>>   may not make the access valid later.  */
>>>if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
>>> -  && scev_probably_wraps_p (initial_condition_in_loop_num (ev, 
>>> loop->num),
>>> +  && scev_probably_wraps_p (NULL,
>>>
>>> use NULL_TREE for the null pointer constant of tree.
>>>
>>> +  /* Check if VAR evaluates in every loop iteration.  */
>>> +  gimple *def;
>>> +  if ((def = SSA_NAME_DEF_STMT (var)) != NULL
>>>
>>> def is never NULL but it might be a GIMPLE_NOP which has a NULL gimple_bb.
>>> Better check for ! SSA_DEFAULT_DEF_P (var)
>>>
>>> +  if (TREE_CODE (step) != INTEGER_CST || !INTEGRAL_TYPE_P (TREE_TYPE 
>>> (var)))
>>> +return false;
>>>
>>> this looks like a cheaper test so please do that first.
>>>
>>> +  step_wi = step;
>>> +  type = TREE_TYPE (var);
>>> +  if (tree_int_cst_sign_bit (step))
>>> +{
>>> +  diff = lower_bound_in_type (type, type);
>>> +  diff = minv - diff;
>>> +  step_wi = - step_wi;
>>> +}
>>> +  else
>>> +{
>>> +  diff = upper_bound_in_type (type, type);
>>> +  diff = diff - maxv;
>>> +}
>>>
>>> this lacks a comment - it's not obvious to me what the gymnastics
>>> with lower/upper_bound_in_type are supposed to achieve.
>>
>> Thanks for reviewing, I will prepare another version of patch.
>>>
>>> As VRP uses niter analysis itself I wonder how this fires back-to-back 
>>> between
>> I am not sure if I mis-understood the question.  If the VRP
>> information comes from loop niter, I think it will not change loop
>> niter or VRP2 in back because that's the best information we got in
>> the first place in niter.  If the VRP information comes from other
>> places (guard conditions?)  SCEV and loop niter after vrp1 might be
>> improved and thus VRP2.  There should be no problems in either case,
>> as long as GCC breaks the recursive chain among niter/scev/vrp
>> correctly.
>
> Ok.
>
>>> VRP1 and VRP2?  If the def of var dominates the latch isn't it enough to do
>>> a + 1 to check whether VRP bumped the range up to INT_MAX/MIN?  That is,
>>> why do we need to add step if not for the TYPE_OVERFLOW_UNDEFINED case
>>> of VRP handling the ranges optimistically?
>> Again, please correct me if I mis-understood.  Considering a variable
>> whose type is unsigned int and scev is {0, 4}_loop, the value range
>> could be computed as [0, 0xfffc], thus MAX + 1 is smaller than
>> type_MAX, but the scev could still overflow.
>
> Yes.  I was wondering about the case where VRP bumps the range to +INF
> because it gave up during iteration or because overflow behavior is undefined.
> Do I understand correctly that the code is mostly to improve the not
> undefined-overflow case?
Hi Richard,

I think we resolved these on IRC; here are words for the record.
The motivating case is an unsigned type loop counter, while the patch
should work for signed types in theory.  Consider a loop with a signed
char counter i that is used in array_ref[i + 10]: since the front end
converts the signed char addition into an unsigned operation, we may need
the range information to prove that (unsigned char)i + 10 doesn't
overflow, so that the address of the array reference is a scev.  I am not
sure whether the signed case can be handled by the current code, or
whether there are other fallouts preventing this patch from working.

>
> Also I was wondering if the range DEF dominates the latch then why
> do we necessarily need to add step to verify overflow?  Can't we do better
> if we for example see that the DEF is the loop header

Re: [PATCH, vec-tails 07/10] Support loop epilogue combining

2016-07-20 Thread Jeff Law

On 07/20/2016 08:37 AM, Ilya Enkovich wrote:


Here is an updated version.

Thanks,
Ilya
--
gcc/

2016-07-20  Ilya Enkovich  

* dbgcnt.def (vect_tail_combine): New.
* params.def (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD): New.
* tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
* tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Support
epilogue combined with loop body.
(vect_do_peeling_for_loop_bound): Likewise.
(vect_do_peeling_for_alignment): ???
* tree-vect-loop.c Include alias.h and dbgcnt.h.
(vect_estimate_min_profitable_iters): Add 
ret_min_profitable_combine_niters
arg, compute number of iterations for which loop epilogue combining is
profitable.
(vect_generate_tmps_on_preheader): Support combined epilogue.
(vect_gen_ivs_for_masking): New.
(vect_get_mask_index_for_elems): New.
(vect_get_mask_index_for_type): New.
(vect_create_narrowed_masks): New.
(vect_create_widened_masks): New.
(vect_gen_loop_masks): New.
(vect_mask_reduction_stmt): New.
(vect_mask_mask_load_store_stmt): New.
(vect_mask_load_store_stmt): New.
(vect_combine_loop_epilogue): New.
(vect_transform_loop): Support combined epilogue.

I think this is OK.  We've just got patch #5 to work through now, correct?

Jeff



Re: Merge switch statements in tree-cfgcleanup

2016-07-20 Thread Bernd Schmidt

On 07/20/2016 06:09 PM, Jeff Law wrote:

So I'm going to let Richi run with the review on this one since the two
of you are already iterating.  But I did have one comment on the
placement of the pass.

I believe one of the key things to consider for whether or not something
like this belongs in the cfgcleanup code is whether or not the
optimization is likely exposed repeatedly through the optimization
pipeline.  If it's mostly a source level issue or only usually exposed
by a limited set of optimizers, then a separate pass might be better.


It can trigger before switchconv, and could expose optimization
opportunities there, but I've also seen it trigger much later. Since I
think it's cheap I don't see a reason not to put it in cfgcleanup; IMO
it's the best fit conceptually.
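
To make the trigger concrete, an assumed example of the shape that gets
merged: two switches on the same value, where the first one's default
edge falls through to the second.

  int
  classify (int c)
  {
    switch (c)
      {
      case 0: return 10;
      case 1: return 11;
      default: break;   /* falls through to the switch below */
      }
    switch (c)          /* same controlling value: mergeable into one */
      {
      case 2: return 12;
      default: return -1;
      }
  }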



Bernd



Re: [PATCH] Fix unsafe function attributes for special functions (PR 71876)

2016-07-20 Thread Bernd Edlinger
On 07/20/16 18:15, Jeff Law wrote:
> On 07/20/2016 05:53 AM, Richard Biener wrote:
>>> Is it OK after boot-strap and regression-testing?
>>
>> I think the __builtin_setjmp change is wrong - __builtin_setjmp is
>> _not_ 'setjmp' it is part of the GCC internal machinery (using setjmp
>> and longjmp in the end) for SJLJ exception handing.
>>
>> Am I correct Eric?
> That is correct.  __builtin_setjmp (and friends) are part of the SJLJ
> exception handling code.   They use a fixed sized buffer (5 words) to
> store the key items (as opposed to the OS defined jmp_buf structure
> which is usually considerably larger).
>
> jeff

Yes. __builtin_setjmp is declared in builtins.def:

DEF_GCC_BUILTIN(BUILT_IN_SETJMP, "setjmp", BT_FN_INT_PTR, 
ATTR_NOTHROW_LEAF_LIST)

It is visible in C as __builtin_setjmp, and special_function_p
adds the ECF_RETURNS_TWICE | ECF_LEAF flags.

So it becomes equivalent to this:

int __builtin_setjmp(void*) __attribute__((returns_twice, nothrow,
leaf))

after special_function_p does it's magic.

If I remove the recognition of "__builtin_" from special_function_p
I have to add the returns_twice attribute in the DEF_GCC_BUILTIN.
Otherwise, I would get wrong code on all platforms, because
__builtin_setjmp saves only IP, SP, and FP registers.
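
For reference, the primitive under discussion is used like this; a
minimal sketch of my own, noting the fixed 5-word buffer rather than a
jmp_buf:

#include <stdlib.h>

void *buf[5];   /* __builtin_setjmp needs a 5-word buffer, not a jmp_buf */

void
raise_it (void)
{
  __builtin_longjmp (buf, 1);        /* the second argument must be 1 */
}

int
main (void)
{
  if (__builtin_setjmp (buf) == 0)   /* first return: value 0 */
    {
      raise_it ();
      abort ();                      /* not reached */
    }
  return 0;                          /* second return, via longjmp */
}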

Everything in the normal test suite keeps passing with the patch,
but is there anything that I have to do to make sure that the
SJLJ eh is still working?  It is not the default on x86_64, right?



Bernd.


Re: [PATCH] Fix assembler arguments for -m16

2016-07-20 Thread Roger Pau Monne
On Wed, Jul 06, 2016 at 04:18:49PM +0200, Roger Pau Monne wrote:
> At the moment the -m16 option only passes the "--32" parameter to the
> assembler on glibc OSes, while on other OSes the assembler is called without
> any specific flag. This is wrong and causes the assembler to fail. Fix it
> by adding support for the -m16 option to x86-64.h.
> 
> 2016-07-06  Roger Pau Monné  
> 
>   * x86-64.h: Append --32 to the assembler options when -m16 is used
>   even on non-glibc OSes.
> 
> ---
> Cc: h...@gcc.gnu.org
> Cc: ger...@freebsd.org
> ---
> This should be backported to all stable branches up to 4.9 (when -m16 was
> introduced).
> 
> Please keep me on Cc since I'm not subscribed to the list, thanks.

Ping?

Roger.


Re: [Re: RFC: Patch 1/2 v3] New target hook: max_noce_ifcvt_seq_cost

2016-07-20 Thread James Greenhalgh
On Wed, Jul 20, 2016 at 01:41:39PM +0200, Bernd Schmidt wrote:
> On 07/20/2016 11:51 AM, James Greenhalgh wrote:
> 
> >
> >2016-07-20  James Greenhalgh  
> >
> > * target.def (max_noce_ifcvt_seq_cost): New.
> > * doc/tm.texi.in (TARGET_MAX_NOCE_IFCVT_SEQ_COST): Document it.
> > * doc/tm.texi: Regenerate.
> > * targhooks.h (default_max_noce_ifcvt_seq_cost): New.
> > * targhooks.c (default_max_noce_ifcvt_seq_cost): New.
> > * params.def (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST): New.
> > (PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST): Likewise.
> > * doc/invoke.texi: Document new params.
> 
> I think this is starting to look like a clear improvement, so I'll
> ack patches 1-3 with a few minor comments, and with the expectation
> that you'll address performance regressions on other targets if they
> occur.

I'll gladly take a look if I've caused anyone any trouble.

> Number 4 I still need to figure out.
> 
> Minor details:
> 
> >+  if (!speed_p)
> >+{
> >+  return cost <= if_info->original_cost;
> >+}
> 
> No braces around single statements in ifs. There's an instance of
> this in patch 4 as well.
> 
> >+  if (global_options_set.x_param_values[param])
> >+return PARAM_VALUE (param);
> 
> How about wrapping the param value into COSTS_N_INSNS, to make the
> value of the param less dependent on compiler internals?

I did consider this, but found it hard to word for the user documentation.
I found it easier to understand when it was in the same units as
rtx_cost, particularly as the AArch64 backend prints RTX costs to most
dump files (including ce1, ce2, ce3) so comparing directly was easy for me
to grok. I think going in either direction has the potential to confuse
users, the cost metrics of the RTL passes are very tightly coupled to
compiler internals.

I don't have a strong feeling either way, just a slight preference to keep
everything in the same units as rtx_cost where I can.

Let me know if you'd rather I follow this comment. There's some precedent
to wrapping it in COSTS_N_INSNS in GCSE_UNRESTRICTED_COST, but I find this
less clear than what I've done (well, I would say that :-) ).
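
For readers without the sources open, the two options differ only in the
units the user writes; a small sketch of my own against the stock
definition in rtl.h:

#define COSTS_N_INSNS(N) ((N) * 4)   /* the definition in rtl.h */

/* As posted: the param is interpreted directly in rtx_cost units.  */
static int
limit_as_posted (int param_value)
{
  return param_value;
}

/* With the suggested wrapping: the param counts "instructions".  */
static int
limit_wrapped (int param_value)
{
  return COSTS_N_INSNS (param_value);
}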

> In patch 4:
> 
> >+  /* Check that our new insn isn't going to clobber ORIG_OTHER_DEST.  */
> >+  bool modified_in_x = (set_tmp != NULL_RTX)
> >+&& modified_in_p (orig_other_dest, set_tmp);
> 
> Watch line wrapping. No parens around the first subexpression (there
> are other examples of unnecessary ones in invocations of
> noce_arith_helper), but around the full one.

I'll catch these and others on commit, thanks for pointing them out.

Thanks,
James



[PATCH GCC]Cleanup lt_to_ne handling in niter analyzer

2016-07-20 Thread Bin Cheng
Hi,
This patch cleans up function number_of_iterations_lt_to_ne, mainly by
removing the computation of may_be_zero.  The computation is unnecessary,
and may_be_zero in this case must be true.  Specifically, DELTA is an
integer constant, and iv0.base < iv1.base is bound to be true because the
false case is handled in function number_of_iterations_cond beforehand.
This patch also refines a comment a little.
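
To make the rewrite concrete, here is a toy model of the lt->ne
adjustment the function performs; values and code are mine, not from the
patch:

#include <assert.h>

int
main (void)
{
  unsigned base0 = 0, base1 = 10, step = 4;
  unsigned delta = base1 - base0;
  unsigned mod = delta % step;        /* FLOOR_MOD_EXPR for unsigned */
  if (mod != 0)
    mod = step - mod;                 /* as in the code below */
  unsigned bound = base1 + mod;       /* adjusted final value: 12 */

  unsigned a = 0, b = 0;
  for (unsigned i = base0; i < base1; i += step)
    a++;
  for (unsigned i = base0; i != bound; i += step)
    b++;
  assert (a == 3 && b == 3);          /* both loops run three times */
  return 0;
}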

Bootstrap and test on x86_64, is it OK?

Thanks,
bin

2016-07-19  Bin Cheng  

* tree-ssa-loop-niter.c (number_of_iterations_lt_to_ne): Clean up
by removing computation of may_be_zero.

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 0723752..3302f62 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1072,12 +1072,8 @@ number_of_iterations_lt_to_ne (tree type, affine_iv *iv0, affine_iv *iv1,
   tree niter_type = TREE_TYPE (step);
   tree mod = fold_build2 (FLOOR_MOD_EXPR, niter_type, *delta, step);
   tree tmod;
-  mpz_t mmod;
-  tree assumption = boolean_true_node, bound, noloop;
-  bool ret = false, fv_comp_no_overflow;
-  tree type1 = type;
-  if (POINTER_TYPE_P (type))
-type1 = sizetype;
+  tree assumption = boolean_true_node, bound;
+  tree type1 = (POINTER_TYPE_P (type)) ? sizetype : type;
 
   if (TREE_CODE (mod) != INTEGER_CST)
 return false;
@@ -1085,96 +1081,51 @@ number_of_iterations_lt_to_ne (tree type, affine_iv *iv0, affine_iv *iv1,
 mod = fold_build2 (MINUS_EXPR, niter_type, step, mod);
   tmod = fold_convert (type1, mod);
 
-  mpz_init (mmod);
-  wi::to_mpz (mod, mmod, UNSIGNED);
-  mpz_neg (mmod, mmod);
-
   /* If the induction variable does not overflow and the exit is taken,
- then the computation of the final value does not overflow.  This is
- also obviously the case if the new final value is equal to the
- current one.  Finally, we postulate this for pointer type variables,
- as the code cannot rely on the object to that the pointer points being
- placed at the end of the address space (and more pragmatically,
- TYPE_{MIN,MAX}_VALUE is not defined for pointers).  */
-  if (integer_zerop (mod) || POINTER_TYPE_P (type))
-fv_comp_no_overflow = true;
-  else if (!exit_must_be_taken)
-fv_comp_no_overflow = false;
-  else
-fv_comp_no_overflow =
-   (iv0->no_overflow && integer_nonzerop (iv0->step))
-   || (iv1->no_overflow && integer_nonzerop (iv1->step));
-
-  if (integer_nonzerop (iv0->step))
+ then the computation of the final value does not overflow.  There
+ are three cases:
+   1) The case if the new final value is equal to the current one.
+   2) Induction variable has pointer type, as the code cannot rely
+ on the object to that the pointer points being placed at the
+ end of the address space (and more pragmatically,
+ TYPE_{MIN,MAX}_VALUE is not defined for pointers).
+   3) EXIT_MUST_BE_TAKEN is true, note it implies that the induction
+ variable does not overflow.  */
+  if (!integer_zerop (mod) && !POINTER_TYPE_P (type) && !exit_must_be_taken)
 {
-  /* The final value of the iv is iv1->base + MOD, assuming that this
-computation does not overflow, and that
-iv0->base <= iv1->base + MOD.  */
-  if (!fv_comp_no_overflow)
+  if (integer_nonzerop (iv0->step))
{
+ /* The final value of the iv is iv1->base + MOD, assuming
+that this computation does not overflow, and that
+iv0->base <= iv1->base + MOD.  */
  bound = fold_build2 (MINUS_EXPR, type1,
   TYPE_MAX_VALUE (type1), tmod);
  assumption = fold_build2 (LE_EXPR, boolean_type_node,
iv1->base, bound);
- if (integer_zerop (assumption))
-   goto end;
}
-  if (mpz_cmp (mmod, bnds->below) < 0)
-   noloop = boolean_false_node;
-  else if (POINTER_TYPE_P (type))
-   noloop = fold_build2 (GT_EXPR, boolean_type_node,
- iv0->base,
- fold_build_pointer_plus (iv1->base, tmod));
   else
-   noloop = fold_build2 (GT_EXPR, boolean_type_node,
- iv0->base,
- fold_build2 (PLUS_EXPR, type1,
-  iv1->base, tmod));
-}
-  else
-{
-  /* The final value of the iv is iv0->base - MOD, assuming that this
-computation does not overflow, and that
-iv0->base - MOD <= iv1->base. */
-  if (!fv_comp_no_overflow)
{
+ /* The final value of the iv is iv0->base - MOD, assuming
+that this computation does not overflow, and that
+iv0->base - MOD <= iv1->base.  */
  bound = fold_build2 (PLUS_EXPR, type1,
   TYPE_MIN_VALUE (type1), tmod);
  assumption = fold_build2 (GE_EXPR, boolean_type_node,
iv0->base, bound

Re: [PATCH] Fix unsafe function attributes for special functions (PR 71876)

2016-07-20 Thread Bernd Edlinger
On 07/20/16 18:20, Jeff Law wrote:
> On 07/20/2016 09:41 AM, Bernd Edlinger wrote:
>> On 07/20/16 12:44, Richard Biener wrote:
>>> On Tue, 19 Jul 2016, Bernd Edlinger wrote:
>>>
 Hi!

 As discussed at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71876,
 we have a _very_ old hack in gcc, that recognizes certain functions by
 name, and inserts in some cases unsafe attributes, that don't work for
 a freestanding environment.

 It is unsafe to return ECF_MAY_BE_ALLOCA, ECF_LEAF and ECF_NORETURN
 from special_function_p, just by the name of the function, especially
 for less well known functions, like "getcontext" or "savectx", which
 could easily used for something completely different.
>>>
>>> Returning ECF_MAY_BE_ALLOCA is safe.  Just wanted to mention this,
>>> regardless of the followups you already received.
>>
>>
>> I dont think so.
>>
>>
>> Consider this example:
>>
>> cat test.cc
>> //extern "C"
>> void *alloca(unsigned long);
>> void bar(unsigned long n)
>> {
>>char *x = (char*) alloca(n);
>>if (x)
>>  *x = 0;
>> }
>>
>> g++ -O3 -S test.cc
>>
>> result:
>>
>> _Z3barm:
>> .LFB0:
>> .cfi_startproc
>> pushq%rbp
>> .cfi_def_cfa_offset 16
>> .cfi_offset 6, -16
>> movq%rsp, %rbp
>> .cfi_def_cfa_register 6
>> call_Z6allocam
>> movb$0, (%rax)
>> leave
>> .cfi_def_cfa 7, 8
>> ret
>>
>> So we call a C++ function with name alloca, but because
>> special_function_p adds ECF_MAY_BE_ALLOCA, the null-pointer
>> check is eliminated, but it is not the builtin alloca,
>> but for the C++ front end it is a pretty normal function.
> Clearly if something "may be alloca", then the properties on the
> arguments/return values & side effects that are specific to alloca can
> not be relied upon.  That to me seems like a bug elsewhere in the
> compiler independent of the changes you're trying to make.
>


Yes. That is another interesting observation.  I think, originally this
flag was introduced by Jan Hubicka, and should mean, "it may be alloca
or a weak alias to alloca or maybe even something different".
But some of the later optimizations use it in a way as if it meant
"it must be alloca".  However I have not been able to come up with
a test case that makes this assumption false, but I probably just
did not try hard enough.

But I think that alloca just should not be recognized by name any
more.

>
>
> Jeff
>


[PATCH test]XFAIL gcc.dg/vect/vect-mask-store-move-1.c

2016-07-20 Thread Bin Cheng
Hi,
After patch @238301, the issue reported in PR65206 is also exposed by case
gcc.dg/vect/vect-mask-store-move-1.c.  This patch XFAILs the case for the
moment.  Test results checked; is it OK?

Thanks,
bin
gcc/testsuite/ChangeLog
2016-07-14  Bin Cheng  

	* gcc.dg/vect/vect-mask-store-move-1.c: XFAIL.

diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
index f928dbf..1e06b58 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
@@ -15,4 +15,4 @@ void foo (int n)
   }
 }
 
-/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 4 "vect" { target { i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 4 "vect" { target { i?86-*-* x86_64-*-* } xfail { i?86-*-* x86_64-*-* } } } } */


Re: [AArch64][2/14] ARMv8.2-A FP16 one operand vector intrinsics

2016-07-20 Thread Jiong Wang

On 07/07/16 17:14, Jiong Wang wrote:

This patch adds ARMv8.2-A FP16 one operand vector intrinsics.

We introduced new mode iterators to cover HF modes; qualified patterns
which were using old mode iterators are switched to the new ones.

We can't simply extend an old iterator like VDQF to cover HF modes,
because not all patterns using VDQF come with new FP16 support, thus we
introduced new, temporary iterators, and only apply the new iterators to
those patterns which do have FP16 support.


I noticed the patchset at

  https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html

has some modifications to the standard names "div" and "sqrt", thus there
are minor conflicts, as this patch touches "sqrt" as well.

This patch resolves the conflict; the change is to let
aarch64_emit_approx_sqrt simply return false for V4HFmode and V8HFmode.

gcc/
2016-07-20  Jiong Wang

* config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New.
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (aarch64_rsqrte): Extend to HF 
modes.
(neg2): Likewise.
(abs2): Likewise.
(2): Likewise.
(l2): Likewise.
(2): Likewise.
(2): Likewise.
(ftrunc2): Likewise.
(2): Likewise.
(sqrt2): Likewise.
(*sqrt2): Likewise.
(aarch64_frecpe): Likewise.
(aarch64_cm): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return
false for V4HF and V8HF.
* config/aarch64/iterators.md (VHSDF, VHSDF_DF, VHSDF_SDF): New.
(VDQF_COND, fcvt_target, FCVT_TARGET, hcon): Extend mode attribute to 
HF modes.
(stype): New.
* config/aarch64/arm_neon.h (vdup_n_f16): New.
(vdupq_n_f16): Likewise.
(vld1_dup_f16): Use vdup_n_f16.
(vld1q_dup_f16): Use vdupq_n_f16.
(vabs_f16): New.
(vabsq_f16): Likewise.
(vceqz_f16): Likewise.
(vceqzq_f16): Likewise.
(vcgez_f16): Likewise.
(vcgezq_f16): Likewise.
(vcgtz_f16): Likewise.
(vcgtzq_f16): Likewise.
(vclez_f16): Likewise.
(vclezq_f16): Likewise.
(vcltz_f16): Likewise.
(vcltzq_f16): Likewise.
(vcvt_f16_s16): Likewise.
(vcvtq_f16_s16): Likewise.
(vcvt_f16_u16): Likewise.
(vcvtq_f16_u16): Likewise.
(vcvt_s16_f16): Likewise.
(vcvtq_s16_f16): Likewise.
(vcvt_u16_f16): Likewise.
(vcvtq_u16_f16): Likewise.
(vcvta_s16_f16): Likewise.
(vcvtaq_s16_f16): Likewise.
(vcvta_u16_f16): Likewise.
(vcvtaq_u16_f16): Likewise.
(vcvtm_s16_f16): Likewise.
(vcvtmq_s16_f16): Likewise.
(vcvtm_u16_f16): Likewise.
(vcvtmq_u16_f16): Likewise.
(vcvtn_s16_f16): Likewise.
(vcvtnq_s16_f16): Likewise.
(vcvtn_u16_f16): Likewise.
(vcvtnq_u16_f16): Likewise.
(vcvtp_s16_f16): Likewise.
(vcvtpq_s16_f16): Likewise.
(vcvtp_u16_f16): Likewise.
(vcvtpq_u16_f16): Likewise.
(vneg_f16): Likewise.
(vnegq_f16): Likewise.
(vrecpe_f16): Likewise.
(vrecpeq_f16): Likewise.
(vrnd_f16): Likewise.
(vrndq_f16): Likewise.
(vrnda_f16): Likewise.
(vrndaq_f16): Likewise.
(vrndi_f16): Likewise.
(vrndiq_f16): Likewise.
(vrndm_f16): Likewise.
(vrndmq_f16): Likewise.
(vrndn_f16): Likewise.
(vrndnq_f16): Likewise.
(vrndp_f16): Likewise.
(vrndpq_f16): Likewise.
(vrndx_f16): Likewise.
(vrndxq_f16): Likewise.
(vrsqrte_f16): Likewise.
(vrsqrteq_f16): Likewise.
(vsqrt_f16): Likewise.
(vsqrtq_f16): Likewise.
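
A small usage sketch, my own example rather than code from the patch,
assuming a command line along the lines of -march=armv8.2-a+fp16:

#include <arm_neon.h>

float16x4_t
neg_abs (float16x4_t x)
{
  /* Both intrinsics are added by this patch.  */
  return vneg_f16 (vabs_f16 (x));
}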

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 6b90b2af5e9d2b5e7f48569ec1ebcb0ef16314ee..af5fac5b29cf5373561d9bf9a69c401d2bec5cec 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -139,6 +139,10 @@ aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_unsigned };
 #define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers)
 static enum aarch64_type_qualifiers
+aarch64_types_binop_uss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_none };
+#define TYPES_BINOP_USS (aarch64_types_binop_uss_qualifiers)
+static enum aarch64_type_qualifiers
 aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_poly, qualifier_poly, qualifier_poly };
 #define TYPES_BINOPP (aarch64_types_binopp_qualifiers)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index f1ad325f464f89c981cbdee8a8f6afafa938639a..22c87be429ba1aac2bbe77f1119d16b6b8bd6e80 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -42,7 +42,7 @@
   

Re: [AArch64][3/14] ARMv8.2-A FP16 two operands vector intrinsics

2016-07-20 Thread Jiong Wang

On 07/07/16 17:15, Jiong Wang wrote:

This patch adds ARMv8.2-A FP16 two operands vector intrinsics.


The updated patch resolves the conflict with

   https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html

The change is to let aarch64_emit_approx_div return false for
V4HFmode and V8HFmode.

gcc/
2016-07-20  Jiong Wang

* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md
(aarch64_rsqrts): Extend to HF modes.
(fabd3): Likewise.
(3): Likewise.
(3): Likewise.
(aarch64_p): Likewise.
(3): Likewise.
(3): Likewise.
(3): Likewise.
(aarch64_faddp): Likewise.
(aarch64_fmulx): Likewise.
(aarch64_frecps): Likewise.
(*aarch64_fac): Rename to aarch64_fac.
(add3): Extend to HF modes.
(sub3): Likewise.
(mul3): Likewise.
(div3): Likewise.
(*div3): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_approx_div): Return
false for V4HF and V8HF.
* config/aarch64/iterators.md (VDQ_HSDI, VSDQ_HSDI): New mode
iterator.
* config/aarch64/arm_neon.h (vadd_f16): Likewise.
(vaddq_f16): Likewise.
(vabd_f16): Likewise.
(vabdq_f16): Likewise.
(vcage_f16): Likewise.
(vcageq_f16): Likewise.
(vcagt_f16): Likewise.
(vcagtq_f16): Likewise.
(vcale_f16): Likewise.
(vcaleq_f16): Likewise.
(vcalt_f16): Likewise.
(vcaltq_f16): Likewise.
(vceq_f16): Likewise.
(vceqq_f16): Likewise.
(vcge_f16): Likewise.
(vcgeq_f16): Likewise.
(vcgt_f16): Likewise.
(vcgtq_f16): Likewise.
(vcle_f16): Likewise.
(vcleq_f16): Likewise.
(vclt_f16): Likewise.
(vcltq_f16): Likewise.
(vcvt_n_f16_s16): Likewise.
(vcvtq_n_f16_s16): Likewise.
(vcvt_n_f16_u16): Likewise.
(vcvtq_n_f16_u16): Likewise.
(vcvt_n_s16_f16): Likewise.
(vcvtq_n_s16_f16): Likewise.
(vcvt_n_u16_f16): Likewise.
(vcvtq_n_u16_f16): Likewise.
(vdiv_f16): Likewise.
(vdivq_f16): Likewise.
(vdup_lane_f16): Likewise.
(vdup_laneq_f16): Likewise.
(vdupq_lane_f16): Likewise.
(vdupq_laneq_f16): Likewise.
(vdups_lane_f16): Likewise.
(vdups_laneq_f16): Likewise.
(vmax_f16): Likewise.
(vmaxq_f16): Likewise.
(vmaxnm_f16): Likewise.
(vmaxnmq_f16): Likewise.
(vmin_f16): Likewise.
(vminq_f16): Likewise.
(vminnm_f16): Likewise.
(vminnmq_f16): Likewise.
(vmul_f16): Likewise.
(vmulq_f16): Likewise.
(vmulx_f16): Likewise.
(vmulxq_f16): Likewise.
(vpadd_f16): Likewise.
(vpaddq_f16): Likewise.
(vpmax_f16): Likewise.
(vpmaxq_f16): Likewise.
(vpmaxnm_f16): Likewise.
(vpmaxnmq_f16): Likewise.
(vpmin_f16): Likewise.
(vpminq_f16): Likewise.
(vpminnm_f16): Likewise.
(vpminnmq_f16): Likewise.
(vrecps_f16): Likewise.
(vrecpsq_f16): Likewise.
(vrsqrts_f16): Likewise.
(vrsqrtsq_f16): Likewise.
(vsub_f16): Likewise.
(vsubq_f16): Likewise.
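
As before, a small usage sketch of my own for the new two-operand
intrinsics (same -march=armv8.2-a+fp16 assumption):

#include <arm_neon.h>

float16x8_t
max_of_sum_and_diff (float16x8_t a, float16x8_t b)
{
  /* vaddq_f16, vabdq_f16 and vmaxq_f16 are all new in this patch.  */
  return vmaxq_f16 (vaddq_f16 (a, b), vabdq_f16 (a, b));
}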

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 22c87be429ba1aac2bbe77f1119d16b6b8bd6e80..007dad60b6999158a1c9c1cf2a501a9f0712af54 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -41,7 +41,7 @@
 
   BUILTIN_VDC (COMBINE, combine, 0)
   BUILTIN_VB (BINOP, pmul, 0)
-  BUILTIN_VALLF (BINOP, fmulx, 0)
+  BUILTIN_VHSDF_SDF (BINOP, fmulx, 0)
   BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
@@ -248,22 +248,22 @@
   BUILTIN_VDQ_BHSI (BINOP, smin, 3)
   BUILTIN_VDQ_BHSI (BINOP, umax, 3)
   BUILTIN_VDQ_BHSI (BINOP, umin, 3)
-  BUILTIN_VDQF (BINOP, smax_nan, 3)
-  BUILTIN_VDQF (BINOP, smin_nan, 3)
+  BUILTIN_VHSDF (BINOP, smax_nan, 3)
+  BUILTIN_VHSDF (BINOP, smin_nan, 3)
 
   /* Implemented by 3.  */
-  BUILTIN_VDQF (BINOP, fmax, 3)
-  BUILTIN_VDQF (BINOP, fmin, 3)
+  BUILTIN_VHSDF (BINOP, fmax, 3)
+  BUILTIN_VHSDF (BINOP, fmin, 3)
 
   /* Implemented by aarch64_p.  */
   BUILTIN_VDQ_BHSI (BINOP, smaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, sminp, 0)
   BUILTIN_VDQ_BHSI (BINOP, umaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, uminp, 0)
-  BUILTIN_VDQF (BINOP, smaxp, 0)
-  BUILTIN_VDQF (BINOP, sminp, 0)
-  BUILTIN_VDQF (BINOP, smax_nanp, 0)
-  BUILTIN_VDQF (BINOP, smin_nanp, 0)
+  BUILTIN_VHSDF (BINOP, smaxp, 0)
+  BUILTIN_VHSDF (BINOP, sminp, 0)
+  BUILTIN_VHSDF (BINOP, smax_nanp, 0)
+  BUILTIN_VHSDF (BINOP, smin_nanp, 0)
 
   /* Implemented by 2.  */
   BUILTIN_VHSDF (UNOP, btrunc, 2)
@@ -383,7 +383,7 @@
   BUILTIN_VDQ_SI (UNOP, urecpe, 0)
 
   BUILTIN_VHSDF (UNOP, frecpe, 0)
-  BUILTIN_VDQF (BINOP, frecps, 0)

Re: [AArch64][7/14] ARMv8.2-A FP16 one operand scalar intrinsics

2016-07-20 Thread Jiong Wang

On 07/07/16 17:17, Jiong Wang wrote:

This patch adds ARMv8.2-A FP16 one operand scalar intrinsics.

Scalar intrinsics are kept in arm_fp16.h instead of arm_neon.h.


The updated patch resolves the conflict with

   https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html

The change is to let aarch64_emit_approx_sqrt return false for HFmode.

gcc/
2016-07-20  Jiong Wang

* config.gcc (aarch64*-*-*): Install arm_fp16.h.
* config/aarch64/aarch64-builtins.c (hi_UP): New.
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (aarch64_frsqrte): Extend to HF 
mode.
(aarch64_frecp): Likewise.
(aarch64_cm): Likewise.
* config/aarch64/aarch64.md (2): Likewise.
(l2): Likewise.
(fix_trunc2): Likewise.
(sqrt2): Likewise.
(*sqrt2): Likewise.
(abs2): Likewise.
(hf2): New pattern for HF mode.
(hihf2): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return
for HF mode.
* config/aarch64/arm_neon.h: Include arm_fp16.h.
* config/aarch64/iterators.md (GPF_F16): New.
(GPI_F16): Likewise.
(VHSDF_HSDF): Likewise.
(w1): Support HF mode.
(w2): Likewise.
(v): Likewise.
(s): Likewise.
(q): Likewise.
(Vmtype): Likewise.
(V_cmp_result): Likewise.
(fcvt_iesize): Likewise.
(FCVT_IESIZE): Likewise.
* config/aarch64/arm_fp16.h: New file.
(vabsh_f16): New.
(vceqzh_f16): Likewise.
(vcgezh_f16): Likewise.
(vcgtzh_f16): Likewise.
(vclezh_f16): Likewise.
(vcltzh_f16): Likewise.
(vcvth_f16_s16): Likewise.
(vcvth_f16_s32): Likewise.
(vcvth_f16_s64): Likewise.
(vcvth_f16_u16): Likewise.
(vcvth_f16_u32): Likewise.
(vcvth_f16_u64): Likewise.
(vcvth_s16_f16): Likewise.
(vcvth_s32_f16): Likewise.
(vcvth_s64_f16): Likewise.
(vcvth_u16_f16): Likewise.
(vcvth_u32_f16): Likewise.
(vcvth_u64_f16): Likewise.
(vcvtah_s16_f16): Likewise.
(vcvtah_s32_f16): Likewise.
(vcvtah_s64_f16): Likewise.
(vcvtah_u16_f16): Likewise.
(vcvtah_u32_f16): Likewise.
(vcvtah_u64_f16): Likewise.
(vcvtmh_s16_f16): Likewise.
(vcvtmh_s32_f16): Likewise.
(vcvtmh_s64_f16): Likewise.
(vcvtmh_u16_f16): Likewise.
(vcvtmh_u32_f16): Likewise.
(vcvtmh_u64_f16): Likewise.
(vcvtnh_s16_f16): Likewise.
(vcvtnh_s32_f16): Likewise.
(vcvtnh_s64_f16): Likewise.
(vcvtnh_u16_f16): Likewise.
(vcvtnh_u32_f16): Likewise.
(vcvtnh_u64_f16): Likewise.
(vcvtph_s16_f16): Likewise.
(vcvtph_s32_f16): Likewise.
(vcvtph_s64_f16): Likewise.
(vcvtph_u16_f16): Likewise.
(vcvtph_u32_f16): Likewise.
(vcvtph_u64_f16): Likewise.
(vnegh_f16): Likewise.
(vrecpeh_f16): Likewise.
(vrecpxh_f16): Likewise.
(vrndh_f16): Likewise.
(vrndah_f16): Likewise.
(vrndih_f16): Likewise.
(vrndmh_f16): Likewise.
(vrndnh_f16): Likewise.
(vrndph_f16): Likewise.
(vrndxh_f16): Likewise.
(vrsqrteh_f16): Likewise.
(vsqrth_f16): Likewise.
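
A usage sketch of my own for the new scalar header (same -march
assumption as before):

#include <arm_fp16.h>

float16_t
rsqrt_estimate_of_abs (float16_t x)
{
  /* Both intrinsics come from the new arm_fp16.h header.  */
  return vrsqrteh_f16 (vabsh_f16 (x));
}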

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1f75f17877334c2bb61cd16b69539ec7514db8ae..8827dc830d374c2512be5713d6dd143913f53c7d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -300,7 +300,7 @@ m32c*-*-*)
 ;;
 aarch64*-*-*)
 	cpu_type=aarch64
-	extra_headers="arm_neon.h arm_acle.h"
+	extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index af5fac5b29cf5373561d9bf9a69c401d2bec5cec..ca91d9108ead3eb83c21ee86d9e6ed44c8f4ad2d 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -62,6 +62,7 @@
 #define si_UP  SImode
 #define sf_UP  SFmode
 #define hi_UP  HImode
+#define hf_UP  HFmode
 #define qi_UP  QImode
 #define UP(X) X##_UP
 
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 363e131327d6be04dd94e664ef839e46f26940e4..6f50d8405d3ee8c4823037bb2022a4f2f08b72fe 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -274,6 +274,14 @@
   BUILTIN_VHSDF (UNOP, round, 2)
   BUILTIN_VHSDF_DF (UNOP, frintn, 2)
 
+  VAR1 (UNOP, btrunc, 2, hf)
+  VAR1 (UNOP, ceil, 2, hf)
+  VAR1 (UNOP, floor, 2, hf)
+  VAR1 (UNOP, frintn, 2, hf)
+  VAR1 (UNOP, nearbyint, 2, hf)
+  VAR1 (UNOP, rint, 2, hf)
+  VAR1 (UNOP, round, 2, hf)
+
   /* Implemented by l2.  */
   VAR1 (UNOP, lbtruncv4hf, 2, v4hi)
   VAR1 (UNO

Re: [AArch64][8/14] ARMv8.2-A FP16 two operands scalar intrinsics

2016-07-20 Thread Jiong Wang

On 07/07/16 17:17, Jiong Wang wrote:

This patch adds ARMv8.2-A FP16 two operands scalar intrinsics.


The updated patch resolves the conflict with

   https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html

The change is to let aarch64_emit_approx_div return false for HFmode.

gcc/
2016-07-20  Jiong Wang

* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64.md (hf3): 
New.
(hf3): Likewise.
(add3): Likewise.
(sub3): Likewise.
(mul3): Likewise.
(div3): Likewise.
(*div3): Likewise.
(3): Extend to HF.
* config/aarch64/aarch64.c (aarch64_emit_approx_div): Return
false for HFmode.
* config/aarch64/aarch64-simd.md (aarch64_rsqrts): Likewise.
(fabd3): Likewise.
(3): Likewise.
(3): Likewise.
(aarch64_fmulx): Likewise.
(aarch64_fac): Likewise.
(aarch64_frecps): Likewise.
(hfhi3): New.
(hihf3): Likewise.
* config/aarch64/iterators.md (VHSDF_SDF): Delete.
(VSDQ_HSDI): Support HI.
(fcvt_target, FCVT_TARGET): Likewise.
* config/aarch64/arm_fp16.h: (vaddh_f16): New.
(vsubh_f16): Likewise.
(vabdh_f16): Likewise.
(vcageh_f16): Likewise.
(vcagth_f16): Likewise.
(vcaleh_f16): Likewise.
(vcalth_f16): Likewise.(vcleh_f16): Likewise.
(vclth_f16): Likewise.
(vcvth_n_f16_s16): Likewise.
(vcvth_n_f16_s32): Likewise.
(vcvth_n_f16_s64): Likewise.
(vcvth_n_f16_u16): Likewise.
(vcvth_n_f16_u32): Likewise.
(vcvth_n_f16_u64): Likewise.
(vcvth_n_s16_f16): Likewise.
(vcvth_n_s32_f16): Likewise.
(vcvth_n_s64_f16): Likewise.
(vcvth_n_u16_f16): Likewise.
(vcvth_n_u32_f16): Likewise.
(vcvth_n_u64_f16): Likewise.
(vdivh_f16): Likewise.
(vmaxh_f16): Likewise.
(vmaxnmh_f16): Likewise.
(vminh_f16): Likewise.
(vminnmh_f16): Likewise.
(vmulh_f16): Likewise.
(vmulxh_f16): Likewise.
(vrecpsh_f16): Likewise.
(vrsqrtsh_f16): Likewise.
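
And a usage sketch of my own for the two-operand scalar forms:

#include <arm_fp16.h>

float16_t
mulx_of_sum (float16_t a, float16_t b)
{
  /* vaddh_f16 and vmulxh_f16 are both added by this patch.  */
  return vmulxh_f16 (vaddh_f16 (a, b), b);
}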

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 6f50d8405d3ee8c4823037bb2022a4f2f08b72fe..31abc077859254e3696adacb3f8f2b9b2da0647f 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -41,7 +41,7 @@
 
   BUILTIN_VDC (COMBINE, combine, 0)
   BUILTIN_VB (BINOP, pmul, 0)
-  BUILTIN_VHSDF_SDF (BINOP, fmulx, 0)
+  BUILTIN_VHSDF_HSDF (BINOP, fmulx, 0)
   BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
@@ -393,13 +393,12 @@
   /* Implemented by
  aarch64_frecp.  */
   BUILTIN_GPF_F16 (UNOP, frecpe, 0)
-  BUILTIN_GPF (BINOP, frecps, 0)
   BUILTIN_GPF_F16 (UNOP, frecpx, 0)
 
   BUILTIN_VDQ_SI (UNOP, urecpe, 0)
 
   BUILTIN_VHSDF (UNOP, frecpe, 0)
-  BUILTIN_VHSDF (BINOP, frecps, 0)
+  BUILTIN_VHSDF_HSDF (BINOP, frecps, 0)
 
   /* Implemented by a mixture of abs2 patterns.  Note the DImode builtin is
  only ever used for the int64x1_t intrinsic, there is no scalar version.  */
@@ -496,17 +495,23 @@
   /* Implemented by <*><*>3.  */
   BUILTIN_VSDQ_HSDI (SHIFTIMM, scvtf, 3)
   BUILTIN_VSDQ_HSDI (FCVTIMM_SUS, ucvtf, 3)
-  BUILTIN_VHSDF_SDF (SHIFTIMM, fcvtzs, 3)
-  BUILTIN_VHSDF_SDF (SHIFTIMM_USS, fcvtzu, 3)
+  BUILTIN_VHSDF_HSDF (SHIFTIMM, fcvtzs, 3)
+  BUILTIN_VHSDF_HSDF (SHIFTIMM_USS, fcvtzu, 3)
+  VAR1 (SHIFTIMM, scvtfsi, 3, hf)
+  VAR1 (SHIFTIMM, scvtfdi, 3, hf)
+  VAR1 (FCVTIMM_SUS, ucvtfsi, 3, hf)
+  VAR1 (FCVTIMM_SUS, ucvtfdi, 3, hf)
+  BUILTIN_GPI (SHIFTIMM, fcvtzshf, 3)
+  BUILTIN_GPI (SHIFTIMM_USS, fcvtzuhf, 3)
 
   /* Implemented by aarch64_rsqrte.  */
   BUILTIN_VHSDF_HSDF (UNOP, rsqrte, 0)
 
   /* Implemented by aarch64_rsqrts.  */
-  BUILTIN_VHSDF_SDF (BINOP, rsqrts, 0)
+  BUILTIN_VHSDF_HSDF (BINOP, rsqrts, 0)
 
   /* Implemented by fabd3.  */
-  BUILTIN_VHSDF_SDF (BINOP, fabd, 3)
+  BUILTIN_VHSDF_HSDF (BINOP, fabd, 3)
 
   /* Implemented by aarch64_faddp.  */
   BUILTIN_VHSDF (BINOP, faddp, 0)
@@ -522,10 +527,10 @@
   BUILTIN_VHSDF_HSDF (UNOP, neg, 2)
 
   /* Implemented by aarch64_fac.  */
-  BUILTIN_VHSDF_SDF (BINOP_USS, faclt, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, facle, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, facgt, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, facge, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, faclt, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, facle, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, facgt, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, facge, 0)
 
   /* Implemented by sqrt2.  */
   VAR1 (UNOP, sqrt, 2, hf)
@@ -543,3 +548,7 @@
   BUILTIN_GPI_I16 (UNOPUS, fixuns_trunchf, 2)
   BUILTIN_GPI (UNOPUS, fixuns_truncsf, 2)
   BUILTIN_GPI (UNOPUS, fixuns_truncdf, 2)
+
+  /* Implemented by 3.  */
+  VAR1 (BINOP, fmax, 3, hf)
+  VAR1 (BINOP, fmin, 3, hf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/conf

[PATCH #2], PowerPC support to enable -mlra and/or -mfloat128

2016-07-20 Thread Michael Meissner
This patch renames the configure switches to be explicit that they are for the
PowerPC, and that they are temporary.  I would hope by the time GCC 7 exits
stage1 that these switches will be removed, but having them now will allow us
to move to LRA and __float128 in an orderly fashion.

I built a bootstrap compiler using the --enable-powerpc-lra option, and it ran
fine.  There were two additional tests that generate different code with -mlra
and now fail.  These will be fixed in later patches.

I also built a C only compiler using the --enable-powerpc-float128 option
(disabling libquadmath and bootstrap), and the C tests looked fine.
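
As a minimal smoke test of what the option turns on (my own example;
basic arithmetic only, since the point of the switch is to develop the
library support):

/* Compiles once -mfloat128 is in effect on powerpc*-linux.  */
__float128
twice (__float128 x)
{
  return x + x;
}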

Can I install these patches in the trunk?

2016-07-20  Michael Meissner  

* doc/install.texi (Configuration): Document PowerPC specific
configuration options --enable-powerpc-lra and
--enable-powerpc-float128.
* configure.ac: Add support for the configuration option
--enable-powerpc-lra to enable the use of the LRA register
allocator by default.  Add support for the configuration option
--enable-powerpc-float128 to enable the use of the __float128 type
in PowerPC Linux systems.
* configure: Regenerate.
* config.gcc (powerpc*-*-linux*): Add --enable-powerpc-lra and
--enable-powerpc-float128 support.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
support for --enable-powerpc-lra and --enable-powerpc-float128.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 238445)
+++ gcc/config/rs6000/rs6000.c  (.../gcc/config/rs6000) (working copy)
@@ -4306,6 +4306,17 @@ rs6000_option_override_internal (bool gl
   rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM_SCALAR;
 }
 
+  /* Enable LRA if the compiler was configured with --enable-lra.  */
+#ifdef ENABLE_LRA
+  if ((rs6000_isa_flags_explicit & OPTION_MASK_LRA) == 0)
+{
+  if (ENABLE_LRA)
+   rs6000_isa_flags |= OPTION_MASK_LRA;
+  else
+   rs6000_isa_flags &= ~OPTION_MASK_LRA;
+}
+#endif
+
   /* There have been bugs with -mvsx-timode that don't show up with -mlra,
  but do show up with -mno-lra.  Given -mlra will become the default once
  PR 69847 is fixed, turn off the options with problems by default if
@@ -4372,6 +4383,17 @@ rs6000_option_override_internal (bool gl
}
 }
 
+  /* Enable FLOAT128 if the compiler was configured with --enable-float128.  */
+#ifdef ENABLE_FLOAT128
+  if (TARGET_VSX && (rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128) == 0)
+{
+  if (ENABLE_FLOAT128)
+   rs6000_isa_flags |= OPTION_MASK_FLOAT128;
+  else
+   rs6000_isa_flags &= ~(OPTION_MASK_FLOAT128 | OPTION_MASK_FLOAT128_HW);
+}
+#endif
+
   /* __float128 requires VSX support.  */
   if (TARGET_FLOAT128 && !TARGET_VSX)
 {
Index: gcc/doc/install.texi
===
--- gcc/doc/install.texi
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/doc)  (revision 
238445)
+++ gcc/doc/install.texi(.../gcc/doc)   (working copy)
@@ -1661,6 +1661,35 @@ Using the GNU Compiler Collection (GCC)}
 See ``RS/6000 and PowerPC Options'' in the main manual
 @end ifhtml
 
+@item --enable-powerpc-lra
+This option enables @option{-mlra} by default for powerpc-linux.  This
+switch is a temporary configuration switch that is intended to allow
+for the transition from the reload register allocator to the newer lra
+register allocator.  When the transition is complete, this switch
+may be deleted.
+@ifnothtml
+@xref{RS/6000 and PowerPC Options,, RS/6000 and PowerPC Options, gcc,
+Using the GNU Compiler Collection (GCC)},
+@end ifnothtml
+@ifhtml
+See ``RS/6000 and PowerPC Options'' in the main manual
+@end ifhtml
+
+@item --enable-powerpc-float128
+This option enables @option{-mfloat128} by default for powerpc-linux.
+This switch is a temporary configuration switch that is intended to
+allow the PowerPC GCC developers to work on implementing library
+support for PowerPC IEEE 128-bit floating point functions.  When the
+standard GCC libraries are enhanced to support @code{__float128} by
+default, this switch may be deleted.
+@ifnothtml
+@xref{RS/6000 and PowerPC Options,, RS/6000 and PowerPC Options, gcc,
+Using the GNU Compiler Collection (GCC)},
+@end ifnothtml
+@ifhtml
+See ``RS/6000 and PowerPC Options'' in the main manual
+@end ifhtml
+
 @item --enable-default-ssp
 Turn on @option{-fstack-protector-strong} by default.
 


Re: [PATCH] S/390: Xfail some tests in insv-[12].c.

2016-07-20 Thread Andreas Krebbel
On 07/19/2016 11:40 AM, Dominik Vogt wrote:
> The attached patch XFAILs some of the "insv" testcases as
> discussed internally.  Tested on s390x biarch and s390.

Applied.  Thanks!

-Andreas-



Re: [PATCH] S/390: Fix pr67443.c.

2016-07-20 Thread Andreas Krebbel
On 07/20/2016 01:55 PM, Dominik Vogt wrote:
> The attached patch rewrites the pr67443.c testcase in a different
> way so that the test still works with the changed allocation of
> globals pinned to registers.  The test is hopefully more robust
> now.  Tested on s390 and s390x biarch.

Applied.  Thanks!

-Andreas-


