Re: [PATCH] Fix PR78515

2016-12-15 Thread Jakub Jelinek
On Wed, Dec 14, 2016 at 05:10:23PM -0700, Martin Sebor wrote:
> The regression test is failing on powerpc64le due to the warnings
> below:
> 
> FAIL: gcc.dg/torture/pr78515.c   -O0  (test for excess errors)
> Excess errors:
> /src/gcc/trunk/gcc/testsuite/gcc.dg/torture/pr78515.c:11:1: warning: GCC
> vector returned by reference: non-standard ABI extension with no
> compatibility guarantee [-Wpsabi]
> /src/gcc/trunk/gcc/testsuite/gcc.dg/torture/pr78515.c:10:1: warning: GCC
> vector passed by reference: non-standard ABI extension with no compatibility
> guarantee [-Wpsabi]

David fixed this recently, but only for AIX.  Generally, if -Wno-psabi is
needed on one target, it is better to use it on all targets, so I've
committed the following:

2016-12-15  Jakub Jelinek  

* gcc.dg/tree-ssa/forwprop-35.c: Use -Wno-psabi everywhere.
* gcc.dg/torture/pr78515.c: Likewise.
* gcc.dg/pr69634.c: Likewise.

--- gcc/testsuite/gcc.dg/tree-ssa/forwprop-35.c.jj  2016-12-14 22:38:37.115389167 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-35.c 2016-12-15 09:18:12.158020252 +0100
@@ -1,7 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-cddce1" } */
+/* { dg-options "-O -fdump-tree-cddce1 -Wno-psabi" } */
 /* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
-/* { dg-additional-options "-Wno-psabi" { target { powerpc-ibm-aix* } } } */
 
 typedef int v4si __attribute__((vector_size(16)));
 typedef float v4sf __attribute__((vector_size(16)));
--- gcc/testsuite/gcc.dg/torture/pr78515.c.jj   2016-12-14 22:38:36.0 +0100
+++ gcc/testsuite/gcc.dg/torture/pr78515.c  2016-12-15 09:15:32.000142648 +0100
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
+/* { dg-additional-options "-Wno-psabi" } */
 /* { dg-additional-options "-mavx512bw" { target x86_64-*-* i?86-*-* } } */
-/* { dg-additional-options "-Wno-psabi" { target powerpc-ibm-aix* } } */
 
 typedef unsigned V __attribute__ ((vector_size (64)));
 
--- gcc/testsuite/gcc.dg/pr69634.c.jj   2016-12-14 22:38:37.251387392 +0100
+++ gcc/testsuite/gcc.dg/pr69634.c  2016-12-15 09:18:54.812455000 +0100
@@ -1,7 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-dce -fschedule-insns -fno-tree-vrp -fcompare-debug" } */
-/* { dg-additional-options "-Wno-psabi -mno-sse" { target i?86-*-* x86_64-*-* } } */
-/* { dg-additional-options "-Wno-psabi" { target powerpc-ibm-aix* } } */
+/* { dg-options "-O2 -fno-dce -fschedule-insns -fno-tree-vrp -fcompare-debug -Wno-psabi" } */
+/* { dg-additional-options "-mno-sse" { target i?86-*-* x86_64-*-* } } */
 /* { dg-require-effective-target scheduling } */
 
 typedef unsigned short u16;


Jakub
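The warnings quoted above come from passing and returning a GCC generic vector by value on a target whose ABI has no native register class for it. A minimal sketch of that shape, reusing the 64-byte `V` typedef from pr78515.c; the functions `add_v` and `v_lane` are hypothetical and not part of the testsuite. Compiled for powerpc64le (or x86_64 without AVX-512), it may produce -Wpsabi warnings of the kind shown above, which -Wno-psabi silences:

```c
#include <assert.h>

/* The 64-byte generic vector type used by pr78515.c.  */
typedef unsigned V __attribute__ ((vector_size (64)));

/* Number of unsigned lanes in V (16 when unsigned is 32 bits).  */
#define V_LANES ((int) (sizeof (V) / sizeof (unsigned)))

/* Hypothetical example: V is passed and returned by value, which is
   what triggers -Wpsabi where the ABI passes such vectors by
   reference.  Element-wise '+' is a GCC vector extension.  */
V
add_v (V a, V b)
{
  return a + b;
}

/* Read lane I of V with a bounds check; subscripting a vector is also
   a GCC extension.  */
unsigned
v_lane (V v, int i)
{
  assert (i >= 0 && i < V_LANES);
  return v[i];
}
```

Putting -Wno-psabi in the unconditional options, as the patch does, keeps such a test from tripping the "excess errors" check on any target that happens to warn.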


C++ Patch Ping

2016-12-15 Thread Jakub Jelinek
Hi!

I'd like to ping the

http://gcc.gnu.org/ml/gcc-patches/2016-12/msg00698.html
P0490R0 GB 20: decomposition declaration should commit to tuple interpretation early

patch.

Thanks

Jakub


Re: cprop fix for PR78626

2016-12-15 Thread Segher Boessenkool
On Wed, Dec 14, 2016 at 11:49:26AM -0600, Segher Boessenkool wrote:
> On Wed, Dec 14, 2016 at 04:46:09PM +0100, Bernd Schmidt wrote:
> > That would be this patch. Tested as before. The two new testcases seem 
> > to pass with a ppc cross (but I would appreciate if someone were to run 
> > full tests on ppc).
> 
> Thanks, will do.

Bootstrapped and regression checked on powerpc64-linux {-m32,-m64};
no new problems.


Segher


Re: [PATCH] combine: Omit redundant AND in change_zero_ext.

2016-12-15 Thread Dominik Vogt
On Wed, Dec 14, 2016 at 01:32:48PM -0600, Segher Boessenkool wrote:
> On Wed, Dec 14, 2016 at 11:01:47AM +0100, Dominik Vogt wrote:
> > This is another micro-optimisation in change_zero_ext.  If an
> > 
> >   (and (lshiftrt ... (N)) (M))
> > 
> > generated by change_zero_ext is equivalent to just
> > 
> >   (lshiftrt ... (N))
> > 
> > (because the AND constant selects exactly the rightmost bits the
> > shift can leave nonzero, i.e. the mode precision minus N), strip
> > off the AND.
> > 
> > _But_ I'm still not completely convinced that this is a good
> > idea.  It may become necessary to add md patterns to deal with
> > just the LSHIFTRT.  On the other hand it saves the need for
> > another special case in change_zero_ext, and a less obvious, very
> > specific risbg pattern on s390.
> 
> For PowerPC we should already have all such patterns with a "bare" shift
> (they can be created in other ways, too).
> 
> > Bootstrapped and regression tested on s390x and s390.  (Targets
> > with risbg-like instructions (Power, others?) may need some
> > tuning.)
> 
> But, it is also possible I missed some.  So please wait until I have
> tested it.
> 
> 
> > diff --git a/gcc/combine.c b/gcc/combine.c
> > index 19851a2..5ebf31c 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -11280,8 +11280,13 @@ change_zero_ext (rtx pat)
> >else
> > continue;
> >  
> > -  wide_int mask = wi::mask (size, false, GET_MODE_PRECISION (mode));
> > -  x = gen_rtx_AND (mode, x, immed_wide_int_const (mask, mode));
> > +  if (!(GET_CODE (x) == LSHIFTRT
> > +   && CONST_INT_P (XEXP (x, 1))
> > +   && size + INTVAL (XEXP (x, 1)) == GET_MODE_PRECISION (mode)))
> > +   {
> > + wide_int mask = wi::mask (size, false, GET_MODE_PRECISION (mode));
> > + x = gen_rtx_AND (mode, x, immed_wide_int_const (mask, mode));
> > +   }
> 
> One could argue that this should have been an lshiftrt in the first place
> then, not a zero_ext*.  Hrm.

This one

  void g2(ui64 *pl, i32 seed)
  {
    seed = 69607 * seed;
    pl[0] = (seed >> 8) & 0xff;
  }

generates

  (zero_extract:DI (reg:SI 75 [ seed ])
      (const_int 24 [0x18])
      (const_int 0 [0]))

on s390x.
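To make the redundancy argument concrete, here is a small C sketch using 32-bit unsigned arithmetic; the helper names are hypothetical, not combine.c functions. A logical right shift by N leaves only the low (precision - N) bits possibly nonzero, so an AND with a mask of exactly that many low bits is a no-op, which is the `size + INTVAL (XEXP (x, 1)) == GET_MODE_PRECISION (mode)` test in the patch. A narrower mask (8 bits after a shift by 8, say) still changes the value and must stay:

```c
#include <assert.h>
#include <stdint.h>

/* Low SIZE bits set, analogous to wi::mask (size, false, 32);
   SIZE must be in [1, 32].  */
static uint32_t
low_mask32 (unsigned size)
{
  assert (size >= 1 && size <= 32);
  return size == 32 ? UINT32_MAX : (UINT32_C (1) << size) - 1;
}

/* The condition the patch checks before dropping the AND.  */
static int
and_is_redundant (unsigned size, unsigned shift, unsigned precision)
{
  return size + shift == precision;
}

/* (and (lshiftrt x shift) mask) as change_zero_ext would build it.  */
static uint32_t
shift_then_mask (uint32_t x, unsigned shift, unsigned size)
{
  assert (shift < 32);
  return (x >> shift) & low_mask32 (size);
}
```

Checking every shift count against the bare shift shows the AND never changes the result when the sizes add up to the precision.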

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH] combine: Omit redundant AND in change_zero_ext.

2016-12-15 Thread Segher Boessenkool
On Thu, Dec 15, 2016 at 09:55:52AM +0100, Dominik Vogt wrote:
> > > Bootstrapped and regression tested on s390x and s390.  (Targets
> > > with risbg-like instructions (Power, others?) may need some
> > > tuning.)
> > 
> > But, it is also possible I missed some.  So please wait until I have
> > tested it.
> > 
> > > diff --git a/gcc/combine.c b/gcc/combine.c
> > > index 19851a2..5ebf31c 100644
> > > --- a/gcc/combine.c
> > > +++ b/gcc/combine.c
> > > @@ -11280,8 +11280,13 @@ change_zero_ext (rtx pat)
> > >else
> > >   continue;
> > >  
> > > -  wide_int mask = wi::mask (size, false, GET_MODE_PRECISION (mode));
> > > -  x = gen_rtx_AND (mode, x, immed_wide_int_const (mask, mode));
> > > +  if (!(GET_CODE (x) == LSHIFTRT
> > > + && CONST_INT_P (XEXP (x, 1))
> > > + && size + INTVAL (XEXP (x, 1)) == GET_MODE_PRECISION (mode)))
> > > + {
> > > +   wide_int mask = wi::mask (size, false, GET_MODE_PRECISION (mode));
> > > +   x = gen_rtx_AND (mode, x, immed_wide_int_const (mask, mode));
> > > + }
> > 
> > One could argue that this should have been an lshiftrt in the first place
> > then, not a zero_ext*.  Hrm.
> 
> This one
> 
>   void g2(ui64 *pl, i32 seed)
>   {
>     seed = 69607 * seed;
>     pl[0] = (seed >> 8) & 0xff;
>   }
> 
> generates
> 
>   (zero_extract:DI (reg:SI 75 [ seed ])
>       (const_int 24 [0x18])
>       (const_int 0 [0]))
> 
> on s390x.

Ah, right, it changes mode as well.  I see.

Tested on powerpc64-linux {-m32,-m64}, no new failures.  The patch is
okay for trunk.  Thanks!


Segher


Re: [PATCH][ARM] PR target/71436: Restrict *load_multiple pattern till after LRA

2016-12-15 Thread Kyrill Tkachov

Ping.

Thanks,
Kyrill

On 08/12/16 11:55, Kyrill Tkachov wrote:

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg03078.html

Thanks,
Kyrill

On 30/11/16 16:47, Kyrill Tkachov wrote:

Hi all,

In this awkward ICE we have a *load_multiple pattern that is being
transformed in reload from:
(insn 55 67 151 3 (parallel [
            (set (reg:SI 0 r0)
                (mem/u/c:SI (reg/f:SI 147) [2 c+0 S4 A32]))
            (set (reg:SI 158 [ c+4 ])
                (mem/u/c:SI (plus:SI (reg/f:SI 147)
                        (const_int 4 [0x4])) [2 c+4 S4 A32]))
        ]) arm-crash.c:25 393 {*load_multiple}
     (expr_list:REG_UNUSED (reg:SI 0 r0)
        (nil)))


into the invalid:
(insn 55 67 70 3 (parallel [
            (set (reg:SI 0 r0)
                (mem/u/c:SI (reg/f:SI 5 r5 [147]) [2 c+0 S4 A32]))
            (set (mem/c:SI (plus:SI (reg/f:SI 102 sfp)
                        (const_int -4 [0xfffc])) [4 %sfp+-12 S4 A32])
                (mem/u/c:SI (plus:SI (reg/f:SI 5 r5 [147])
                        (const_int 4 [0x4])) [2 c+4 S4 A32]))
        ]) arm-crash.c:25 393 {*load_multiple}
     (nil))

The operands of *load_multiple are not validated through constraints, as LRA
expects, but through a match_parallel predicate that ends up calling
ldm_stm_operation_p to validate the multiple sets.  This means that LRA cannot
reason about the constraints properly.
This two-register load should not have used *load_multiple anyway; it should
have used *ldm2_ from ldmstm.md, and indeed it did until the loop2_invariant
pass, which copied the ldm2_ pattern:
(insn 27 23 28 4 (parallel [
            (set (reg:SI 0 r0)
                (mem/u/c:SI (reg/f:SI 147) [2 c+0 S4 A32]))
            (set (reg:SI 1 r1)
                (mem/u/c:SI (plus:SI (reg/f:SI 147)
                        (const_int 4 [0x4])) [2 c+4 S4 A32]))
        ]) "ldm.c":25 385 {*ldm2_}
     (nil))

into:
(insn 55 19 67 3 (parallel [
            (set (reg:SI 0 r0)
                (mem/u/c:SI (reg/f:SI 147) [2 c+0 S4 A32]))
            (set (reg:SI 158)
                (mem/u/c:SI (plus:SI (reg/f:SI 147)
                        (const_int 4 [0x4])) [2 c+4 S4 A32]))
        ]) "ldm.c":25 404 {*load_multiple}
     (expr_list:REG_UNUSED (reg:SI 0 r0)
        (nil)))

Note that it is now recognised as *load_multiple because the second register
is not a hard register but the pseudo 158.
In any case, the solution suggested in the PR (and I agree with it) is to
restrict *load_multiple to after reload.
The similar pattern *load_multiple_with_writeback already has such a
condition, and the comment above *load_multiple says that it is used to
generate epilogues, which happens after reload anyway.  For pre-reload
load-multiples the patterns in ldmstm.md should do just fine.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2016-11-30  Kyrylo Tkachov  

PR target/71436
* config/arm/arm.md (*load_multiple): Add reload_completed to
matching condition.

2016-11-30  Kyrylo Tkachov  

PR target/71436
* gcc.c-torture/compile/pr71436.c: New test.






Re: [PATCH][ARM] PR target/71436: Restrict *load_multiple pattern till after LRA

2016-12-15 Thread Richard Earnshaw (lists)
On 30/11/16 16:47, Kyrill Tkachov wrote:
> Hi all,
> 
> In this awkward ICE we have a *load_multiple pattern that is being
> transformed in reload from:
> (insn 55 67 151 3 (parallel [
> (set (reg:SI 0 r0)
> (mem/u/c:SI (reg/f:SI 147) [2 c+0 S4 A32]))
> (set (reg:SI 158 [ c+4 ])
> (mem/u/c:SI (plus:SI (reg/f:SI 147)
> (const_int 4 [0x4])) [2 c+4 S4 A32]))
> ]) arm-crash.c:25 393 {*load_multiple}
>  (expr_list:REG_UNUSED (reg:SI 0 r0)
> (nil)))
> 
> 
> into the invalid:
> (insn 55 67 70 3 (parallel [
> (set (reg:SI 0 r0)
> (mem/u/c:SI (reg/f:SI 5 r5 [147]) [2 c+0 S4 A32]))
> (set (mem/c:SI (plus:SI (reg/f:SI 102 sfp)
> (const_int -4 [0xfffc])) [4 %sfp+-12
> S4 A32])
> (mem/u/c:SI (plus:SI (reg/f:SI 5 r5 [147])
> (const_int 4 [0x4])) [2 c+4 S4 A32]))
> ]) arm-crash.c:25 393 {*load_multiple}
>  (nil))
> 
> The operands of *load_multiple are not validated through constraints
> like LRA is used to, but rather through
> a match_parallel predicate which ends up calling ldm_stm_operation_p to
> validate the multiple sets.
> But this means that LRA cannot reason about the constraints properly.
> This two-register load should not have used *load_multiple anyway, it
> should have used *ldm2_ from ldmstm.md
> and indeed it did until the loop2_invariant pass which copied the ldm2_
> pattern:
> (insn 27 23 28 4 (parallel [
> (set (reg:SI 0 r0)
> (mem/u/c:SI (reg/f:SI 147) [2 c+0 S4 A32]))
> (set (reg:SI 1 r1)
> (mem/u/c:SI (plus:SI (reg/f:SI 147)
> (const_int 4 [0x4])) [2 c+4 S4 A32]))
> ]) "ldm.c":25 385 {*ldm2_}
>  (nil))
> 
> into:
> (insn 55 19 67 3 (parallel [
> (set (reg:SI 0 r0)
> (mem/u/c:SI (reg/f:SI 147) [2 c+0 S4 A32]))
> (set (reg:SI 158)
> (mem/u/c:SI (plus:SI (reg/f:SI 147)
> (const_int 4 [0x4])) [2 c+4 S4 A32]))
> ]) "ldm.c":25 404 {*load_multiple}
>  (expr_list:REG_UNUSED (reg:SI 0 r0)
> (nil)))
> 
> Note that it now got recognised as load_multiple because the second
> register is not a hard register but the pseudo 158.
> In any case, the solution suggested in the PR (and I agree with it) is
> to restrict *load_multiple to after reload.
> The similar pattern *load_multiple_with_writeback also has a similar
> condition and the comment above *load_multiple says that
> it's used to generate epilogues, which is done after reload anyway. For
> pre-reload load-multiples the patterns in ldmstm.md
> should do just fine.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> 
> Ok for trunk?
> 

I don't think this is right.  Firstly, these patterns look to me like
the ones used for memcpy expansion, so not recognizing them could lead
to compiler aborts.

Secondly, the bug is when we generate

(insn 55 67 70 3 (parallel [
            (set (reg:SI 0 r0)
                (mem/u/c:SI (reg/f:SI 5 r5 [147]) [2 c+0 S4 A32]))
            (set (mem/c:SI (plus:SI (reg/f:SI 102 sfp)
                        (const_int -4 [0xfffc])) [4 %sfp+-12 S4 A32])
                (mem/u/c:SI (plus:SI (reg/f:SI 5 r5 [147])
                        (const_int 4 [0x4])) [2 c+4 S4 A32]))
        ]) arm-crash.c:25 393 {*load_multiple}
     (nil))

These patterns are supposed to enforce that the load (store) target
register is a hard register that is higher numbered than the hard
register in the memory slot that precedes it (thus satisfying the
constraints for a normal ldm/stm instruction).

The real question is why did ldm_stm_operation_p permit this modified
pattern through, or was the pattern validation bypassed incorrectly by
the loop2 invariant code when it copied the insn and made changes?
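The property Richard describes can be stated as a simple check on the list of destination register numbers. The sketch below is hypothetical and much simpler than ldm_stm_operation_p (it ignores the memory addresses and modes); `first_pseudo` stands in for FIRST_PSEUDO_REGISTER and is passed in rather than taken from the ARM backend. Under this check, the loop2_invariant copy with pseudo 158 in the load list would be rejected:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: can REGNOS[0..N-1] be the destination list of a
   single LDM?  Every entry must be a hard register (below FIRST_PSEUDO,
   a stand-in for FIRST_PSEUDO_REGISTER) and numbered higher than the
   entry before it.  */
static bool
ldm_reglist_ok (const unsigned *regnos, size_t n, unsigned first_pseudo)
{
  assert (regnos != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      if (regnos[i] >= first_pseudo)
        return false;           /* a pseudo, e.g. (reg:SI 158) */
      if (i > 0 && regnos[i] <= regnos[i - 1])
        return false;           /* not strictly ascending */
    }
  return true;
}
```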

R.

> Thanks,
> Kyrill
> 
> 2016-11-30  Kyrylo Tkachov  
> 
> PR target/71436
> * config/arm/arm.md (*load_multiple): Add reload_completed to
> matching condition.
> 
> 2016-11-30  Kyrylo Tkachov  
> 
> PR target/71436
> * gcc.c-torture/compile/pr71436.c: New test.
> 
> arm-ldm.patch
> 
> 
> commit 996d28e2353badd1b29ef000f94d40c7dab9010f
> Author: Kyrylo Tkachov 
> Date:   Tue Nov 29 15:07:30 2016 +
> 
> [ARM] Restrict *load_multiple pattern till after LRA
> 
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 74c44f3..22d2a84 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -11807,12 +11807,15 @@ (define_insn ""
>  
>  ;; Patterns in ldmstm.md don't cover more than 4 registers. This pattern covers
>  ;; large lists without explicit writeback generated for APCS_FRAME epilogue.
> +;; The operands are validated through the load_multiple_operation
> +;; match_parallel predicate rather than through constraints so enable it only
> +;; after reload.

Re: [PATCH][ARM] PR target/71436: Restrict *load_multiple pattern till after LRA

2016-12-15 Thread Richard Earnshaw (lists)
On 15/12/16 09:55, Richard Earnshaw (lists) wrote:
> On 30/11/16 16:47, Kyrill Tkachov wrote:
>> Hi all,
>>
>> In this awkward ICE we have a *load_multiple pattern that is being
>> transformed in reload from:
>> (insn 55 67 151 3 (parallel [
>> (set (reg:SI 0 r0)
>> (mem/u/c:SI (reg/f:SI 147) [2 c+0 S4 A32]))
>> (set (reg:SI 158 [ c+4 ])
>> (mem/u/c:SI (plus:SI (reg/f:SI 147)
>> (const_int 4 [0x4])) [2 c+4 S4 A32]))
>> ]) arm-crash.c:25 393 {*load_multiple}
>>  (expr_list:REG_UNUSED (reg:SI 0 r0)
>> (nil)))
>>
>>
>> into the invalid:
>> (insn 55 67 70 3 (parallel [
>> (set (reg:SI 0 r0)
>> (mem/u/c:SI (reg/f:SI 5 r5 [147]) [2 c+0 S4 A32]))
>> (set (mem/c:SI (plus:SI (reg/f:SI 102 sfp)
>> (const_int -4 [0xfffc])) [4 %sfp+-12
>> S4 A32])
>> (mem/u/c:SI (plus:SI (reg/f:SI 5 r5 [147])
>> (const_int 4 [0x4])) [2 c+4 S4 A32]))
>> ]) arm-crash.c:25 393 {*load_multiple}
>>  (nil))
>>
>> The operands of *load_multiple are not validated through constraints
>> like LRA is used to, but rather through
>> a match_parallel predicate which ends up calling ldm_stm_operation_p to
>> validate the multiple sets.
>> But this means that LRA cannot reason about the constraints properly.
>> This two-register load should not have used *load_multiple anyway, it
>> should have used *ldm2_ from ldmstm.md
>> and indeed it did until the loop2_invariant pass which copied the ldm2_
>> pattern:
>> (insn 27 23 28 4 (parallel [
>> (set (reg:SI 0 r0)
>> (mem/u/c:SI (reg/f:SI 147) [2 c+0 S4 A32]))
>> (set (reg:SI 1 r1)
>> (mem/u/c:SI (plus:SI (reg/f:SI 147)
>> (const_int 4 [0x4])) [2 c+4 S4 A32]))
>> ]) "ldm.c":25 385 {*ldm2_}
>>  (nil))
>>
>> into:
>> (insn 55 19 67 3 (parallel [
>> (set (reg:SI 0 r0)
>> (mem/u/c:SI (reg/f:SI 147) [2 c+0 S4 A32]))
>> (set (reg:SI 158)
>> (mem/u/c:SI (plus:SI (reg/f:SI 147)
>> (const_int 4 [0x4])) [2 c+4 S4 A32]))
>> ]) "ldm.c":25 404 {*load_multiple}
>>  (expr_list:REG_UNUSED (reg:SI 0 r0)
>> (nil)))
>>
>> Note that it now got recognised as load_multiple because the second
>> register is not a hard register but the pseudo 158.
>> In any case, the solution suggested in the PR (and I agree with it) is
>> to restrict *load_multiple to after reload.
>> The similar pattern *load_multiple_with_writeback also has a similar
>> condition and the comment above *load_multiple says that
>> it's used to generate epilogues, which is done after reload anyway. For
>> pre-reload load-multiples the patterns in ldmstm.md
>> should do just fine.
>>
>> Bootstrapped and tested on arm-none-linux-gnueabihf.
>>
>> Ok for trunk?
>>
> 
> I don't think this is right.  Firstly, these patterns look to me like
> the ones used for memcpy expansion, so not recognizing them could lead
> to compiler aborts.
> 
> Secondly, the bug is when we generate
> 
>  (insn 55 67 70 3 (parallel [
>  (set (reg:SI 0 r0)
>  (mem/u/c:SI (reg/f:SI 5 r5 [147]) [2 c+0 S4 A32]))
>  (set (mem/c:SI (plus:SI (reg/f:SI 102 sfp)
>  (const_int -4 [0xfffc])) [4 %sfp+-12
>  S4 A32])
>  (mem/u/c:SI (plus:SI (reg/f:SI 5 r5 [147])
>  (const_int 4 [0x4])) [2 c+4 S4 A32]))
>  ]) arm-crash.c:25 393 {*load_multiple}
>   (nil))

Sorry, I pasted the wrong bit of code.

That should read when we generate:

(insn 55 19 67 3 (parallel [
            (set (reg:SI 0 r0)
                (mem/u/c:SI (reg/f:SI 147) [2 c+0 S4 A32]))
            (set (reg:SI 158)
                (mem/u/c:SI (plus:SI (reg/f:SI 147)
                        (const_int 4 [0x4])) [2 c+4 S4 A32]))
        ]) "ldm.c":25 404 {*load_multiple}
     (expr_list:REG_UNUSED (reg:SI 0 r0)
        (nil)))

i.e. when we put a pseudo into the register load list.

> 
> These patterns are supposed to enforce that the load (store) target
> register is a hard register that is higher numbered than the hard
> register in the memory slot that precedes it (thus satisfying the
> constraints for a normal ldm/stm instruction).
> 
> The real question is why did ldm_stm_operation_p permit this modified
> pattern through, or was the pattern validation bypassed incorrectly by
> the loop2 invariant code when it copied the insn and made changes?
> 
> R.
> 
>> Thanks,
>> Kyrill
>>
>> 2016-11-30  Kyrylo Tkachov  
>>
>> PR target/71436
>> * config/arm/arm.md (*load_multiple): Add reload_completed to
>> matching condition.
>>
>> 2016-11-30  Kyrylo Tkachov  
>>
>> PR target/71436
>> * gcc.c-torture/compile/pr71436.c: New test.
>>
>> arm-ldm.patch
>>
>>
>> commit 996d28e2353badd1b29ef000f94d40c7

Re: [Patch Doc] Update documentation for __fp16 type

2016-12-15 Thread Jakub Jelinek
On Mon, Dec 12, 2016 at 02:19:47PM -0700, Sandra Loosemore wrote:
> One small grammar nit at the end:
> 
> > > +It is recommended that code which is intended to be portable use the
> > > +@code{_Float16} type defined by ISO/IEC TS 18661-3:2015
> > > +(@xref{Floating Types}).
> 
> Either s/which/that/ or rewrite the beginning of the sentence as "It is
> recommended that portable code use..."
> 
> The patch is OK to commit with that fixed.

I've noticed a
../../gcc/doc/extend.texi:1060: warning: `.' or `,' must follow @xref, not )
warning with this.  The following patch adjusts it to match how the other
(@xref{...}.) uses look.  @xref actually emits
See Section 6.11 [Floating Types], page 415
in pdf or
*Note Floating Types::
in info.
Tested with building gcc.info and gcc.pdf and checking what is in there.

Ok for trunk?

2016-12-15  Jakub Jelinek  

* doc/extend.texi: Add a dot after @xref{...}.

--- gcc/doc/extend.texi.jj  2016-12-14 20:28:12.0 +0100
+++ gcc/doc/extend.texi 2016-12-15 10:56:14.470702563 +0100
@@ -1057,7 +1057,7 @@ implements conversions between @code{__f
 calls.
 
 It is recommended that portable code use the @code{_Float16} type defined
-by ISO/IEC TS 18661-3:2015 (@xref{Floating Types}).
+by ISO/IEC TS 18661-3:2015.  (@xref{Floating Types}.)
 
 @node Decimal Float
 @section Decimal Floating Types


Jakub


Re: [Patch Doc] Update documentation for __fp16 type

2016-12-15 Thread Andreas Schwab
On Dez 15 2016, Jakub Jelinek  wrote:

> --- gcc/doc/extend.texi.jj2016-12-14 20:28:12.0 +0100
> +++ gcc/doc/extend.texi   2016-12-15 10:56:14.470702563 +0100
> @@ -1057,7 +1057,7 @@ implements conversions between @code{__f
>  calls.
>  
>  It is recommended that portable code use the @code{_Float16} type defined
> -by ISO/IEC TS 18661-3:2015 (@xref{Floating Types}).
> +by ISO/IEC TS 18661-3:2015.  (@xref{Floating Types}.)

I think the parens should be removed.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

2016-12-15 Thread Tamar Christina

> On a high level, presumably there's no real value in keeping the old
> code to "fold" fpclassify.  By exposing those operations as integer
> logicals for the fast path, if the FP value becomes a constant during
> the optimization pipeline we'll see the reinterpreted values flowing
> into the new integer logical tests and they'll simplify just like
> anything else.  Right?

Yes, if it becomes a constant it will be folded away, both in the integer and 
the fp case.

> The old IBM format is still supported, though they are expected to be
> moving towards a standard IEEE 128-bit format.  So my only concern is
> that we preserve correct behavior for those cases -- I don't really care
> about optimizing them.  So I think you need to keep them.

Yes, I re-added them.  It's mostly a copy-paste of what they were in the
other functions, but I have no way of testing it.

> For documenting builtins, using existing builtins as a template.

Yeah, I based them off the fpclassify documentation.

> > +{
> > +  tree type = TREE_TYPE (arg);
> > +
> > +  machine_mode mode = TYPE_MODE (type);
> > +
> > +  const real_format *format = REAL_MODE_FORMAT (mode);
> > +  const HOST_WIDE_INT type_width = TYPE_PRECISION (type);
> > +  return (format->is_binary_ieee_compatible
> > +   && FLOAT_WORDS_BIG_ENDIAN == WORDS_BIG_ENDIAN
> > +   /* We explicitly disable quad float support on 32 bit systems.  */
> > +   && !(UNITS_PER_WORD == 4 && type_width == 128)
> > +   && targetm.scalar_mode_supported_p (mode));
> > +}
> Presumably this is why you needed the target.h inclusion.
>
> Note that on some systems we even disable 64bit floating point support.
> I suspect this check needs a little re-thinking as I don't think that
> checking for a specific UNITS_PER_WORD is correct, nor is checking the
> width of the type.  I'm not offhand sure what the test should be, just
> that I think we need something better here.

I think what I really wanted to test here is whether there is an integer mode
available with exactly the same width as the floating-point one.  So I have
replaced this with just a call to int_mode_for_mode, which is probably more
correct.
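As an illustration of the integer fast path under discussion, this sketch classifies an IEEE binary64 `double` purely from its bit pattern. It assumes `double` is binary64 and that a same-width integer type exists, which is roughly what the int_mode_for_mode check is meant to establish; `classify_binary64` is a hypothetical name, not the GIMPLE the patch emits:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Assumption: double is IEEE binary64, so a 64-bit integer can hold its
   bit pattern (the role int_mode_for_mode plays in the patch).  */
_Static_assert (sizeof (double) == sizeof (uint64_t),
                "need a same-width integer view of double");

/* Hypothetical integer-only classifier for binary64 values; returns the
   same FP_* codes as C99 fpclassify.  */
static int
classify_binary64 (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);      /* reinterpret without aliasing issues */

  uint64_t exponent = (bits >> 52) & 0x7ff;              /* biased exponent */
  uint64_t mantissa = bits & ((UINT64_C (1) << 52) - 1); /* 52-bit fraction */

  if (exponent == 0x7ff)
    return mantissa != 0 ? FP_NAN : FP_INFINITE;
  if (exponent == 0)
    return mantissa != 0 ? FP_SUBNORMAL : FP_ZERO;
  return FP_NORMAL;
}
```

Once `x` is a constant, each of these integer tests folds like any other expression, which is the point Jeff raises above.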

> > +
> > +/* Determines if the given number is a NaN value.
> > +   This function is the last in the chain and only has to
> > +   check if its preconditions are true.  */
> > +static tree
> > +is_nan (gimple_seq *seq, tree arg, location_t loc)
> So in the old code we checked UNGT_EXPR, in the new code's slow path you
> check UNORDERED.  Was that change intentional?

The old FP code used UNORDERED and the new one was using ORDERED and negating 
the result.
I've replaced it with UNORDERED, but both are correct.
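The equivalence can be checked in C: UNORDERED_EXPR (x, x) corresponds to the C99 `isunordered (x, x)` macro, and ORDERED (x, x) holds exactly when `x == x`, so its negation is also a NaN test. A sketch with hypothetical function names (note that `-ffast-math` may fold `x == x` to true):

```c
#include <math.h>
#include <stdbool.h>

/* UNORDERED (x, x): true only when x is a NaN.  */
static bool
is_nan_unordered (double x)
{
  return isunordered (x, x);
}

/* !ORDERED (x, x): for identical operands ORDERED holds exactly when
   x == x, which fails only for a NaN.  */
static bool
is_nan_not_ordered (double x)
{
  return !(x == x);
}
```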

Thanks for the review,
I'll get the new patch out ASAP.

Tamar


Re: [Patch Doc] Update documentation for __fp16 type

2016-12-15 Thread Jakub Jelinek
On Thu, Dec 15, 2016 at 11:11:37AM +0100, Andreas Schwab wrote:
> On Dez 15 2016, Jakub Jelinek  wrote:
> 
> > --- gcc/doc/extend.texi.jj  2016-12-14 20:28:12.0 +0100
> > +++ gcc/doc/extend.texi 2016-12-15 10:56:14.470702563 +0100
> > @@ -1057,7 +1057,7 @@ implements conversions between @code{__f
> >  calls.
> >  
> >  It is recommended that portable code use the @code{_Float16} type defined
> > -by ISO/IEC TS 18661-3:2015 (@xref{Floating Types}).
> > +by ISO/IEC TS 18661-3:2015.  (@xref{Floating Types}.)
> 
> I think the parens should be removed.

We have it in other spots:

extend.texi-stores it into the union as the integer @code{i}, since it is
extend.texi:an integer.  (@xref{Cast to Union}.)
extend.texi-

extend.texi-yields an lvalue, not an rvalue like true casts do.
extend.texi:(@xref{Compound Literals}.)
extend.texi-

invoke.texi-which applies only to functions that are declared using the @code{dllexport}
invoke.texi:attribute or declspec (@xref{Function Attributes,,Declaring Attributes of
invoke.texi-Functions}.)

invoke.texi-needs for some languages.
invoke.texi:(@xref{Interface,,Interfacing to GCC Output,gccint,GNU Compiler
invoke.texi-Collection (GCC) Internals},
invoke.texi-for more discussion of @file{libgcc.a}.)

The last one looks correct to me; the third one looks weird (the dot is
inside the parentheses, but there is no dot before the opening one).

So, shall we change also the first 3?

Jakub


Re: [Patch Doc] Update documentation for __fp16 type

2016-12-15 Thread Andreas Schwab
On Dez 15 2016, Jakub Jelinek  wrote:

> So, shall we change also the first 3?

Yes, I'd think so.

Andreas.



RE: [PATCH, testsuite] MIPS: Upgrade to R2 for -mmicromips.

2016-12-15 Thread Toma Tabacu
Committed as r243687.

Regards,
Toma

> Toma Tabacu writes:
> > microMIPS is not supported on pre-R2 architectures, but the testsuite allows
> > it to be used on pre-R2 architectures, which results in test failures.
> >
> > This patch makes the testsuite upgrade to R2 if -mmicromips is used in a 
> > test.
> >
> > Tested with mips-mti-elf.
> >
> > Regards,
> > Toma
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/mips/mips.exp (mips-dg-options): Upgrade to R2 for
> > -mmicromips.
> 
> OK, thanks.
> 
> Matthew


Re: [Patch Doc] Update documentation for __fp16 type

2016-12-15 Thread Jakub Jelinek
On Thu, Dec 15, 2016 at 11:23:14AM +0100, Andreas Schwab wrote:
> On Dez 15 2016, Jakub Jelinek  wrote:
> 
> > So, shall we change also the first 3?
> 
> Yes, I'd think so.

So here it is in patch form.  Is this ok for trunk?

2016-12-15  Jakub Jelinek  

* doc/extend.texi: Clean up @xref{...} uses.
* doc/invoke.texi: Likewise.

--- gcc/doc/extend.texi.jj  2016-12-14 20:28:12.0 +0100
+++ gcc/doc/extend.texi 2016-12-15 11:26:07.867736292 +0100
@@ -1057,7 +1057,7 @@ implements conversions between @code{__f
 calls.
 
 It is recommended that portable code use the @code{_Float16} type defined
-by ISO/IEC TS 18661-3:2015 (@xref{Floating Types}).
+by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.
 
 @node Decimal Float
 @section Decimal Floating Types
@@ -2089,7 +2089,7 @@ union foo f = @{ .d = 4 @};
 converts 4 to a @code{double} to store it in the union using
 the second element.  By contrast, casting 4 to type @code{union foo}
 stores it into the union as the integer @code{i}, since it is
-an integer.  (@xref{Cast to Union}.)
+an integer.  @xref{Cast to Union}.
 
 You can combine this technique of naming elements with ordinary C
 initialization of successive elements.  Each initializer element that
@@ -2181,7 +2181,7 @@ specified is a union type.  You can spec
 @code{union} keyword or with a @code{typedef} name that refers to
 a union.  A cast to a union actually creates a compound literal and
 yields an lvalue, not an rvalue like true casts do.
-(@xref{Compound Literals}.)
+@xref{Compound Literals}.
 
 The types that may be cast to the union type are those of the members
 of the union.  Thus, given the following union and variables:
--- gcc/doc/invoke.texi.jj  2016-12-15 10:26:15.0 +0100
+++ gcc/doc/invoke.texi 2016-12-15 11:25:19.226386092 +0100
@@ -7262,8 +7262,8 @@ release to an another.
 @opindex fno-keep-inline-dllexport
 This is a more fine-grained version of @option{-fkeep-inline-functions},
 which applies only to functions that are declared using the @code{dllexport}
-attribute or declspec (@xref{Function Attributes,,Declaring Attributes of
-Functions}.)
+attribute or declspec.  @xref{Function Attributes,,Declaring Attributes of
+Functions}.
 
 @item -fkeep-inline-functions
 @opindex fkeep-inline-functions


Jakub


Re: [PATCH] combine: Replace sign_extend with zero_extend more often.

2016-12-15 Thread Segher Boessenkool
On Wed, Dec 14, 2016 at 01:39:13PM +0100, Dominik Vogt wrote:
> There may be a slight imprecision in expand_compound_operation.
> When it encounters a SIGN_EXTEND where it's already known that the
> sign bit is zero, it may replace that with a ZERO_EXTEND (and
> tries to simplify that further).  However, the pattern is only
> replaced if the new set_src_cost() is _lower_ than the old cost.
> 
> The patch changes that to "not higher than", assuming that the
> ZERO_EXTEND form is generally preferable unless there is a reason
> to believe it's not (i.e. its cost is higher).  The comment atop
> this code block seems to support this:
> 
>   /* Convert sign extension to zero extension, if we know that the high
>  bit is not set, as this is easier to optimize.  It will be converted
>  back to cheaper alternative in make_extraction.  */
> 
> On s390[x] this gets rid of some SIGN_EXTENDs completely.
> 
> (The patched code uses the cheaper of both replacement patterns.)

That looks fine.  But see below.

> The patch hasn't got a lot of testing yet as I'd like to hear your
> opinion on the patch first.

I am testing it on powerpc.  Please also test on x86?

> gcc/ChangeLog-signextend-1
> 
>   * combine.c (expand_compound_operation): Substitute ZERO_EXTEND for
>   SIGN_EXTEND if the costs are equal or lower.
>   Choose the cheapest replacement.

>/* Make sure this is a profitable operation.  */
>if (set_src_cost (x, mode, optimize_this_for_speed_p)
> -  > set_src_cost (temp2, mode, optimize_this_for_speed_p))
> -   return temp2;
> -  else if (set_src_cost (x, mode, optimize_this_for_speed_p)
> -   > set_src_cost (temp, mode, optimize_this_for_speed_p))
> -   return temp;
> -  else
> -   return x;
> +   >= set_src_cost (temp2, mode, optimize_this_for_speed_p))
> + x = temp2;
> +  if (set_src_cost (x, mode, optimize_this_for_speed_p)
> +   >= set_src_cost (temp, mode, optimize_this_for_speed_p))
> + x = temp;
> +  return x;
>  }

So this prefers the zero_extend version over the expand_compound_operation
version, I wonder if that is a good idea.
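A small model of the two selection rules may help compare them. The costs are plain integers standing in for set_src_cost, and the enum and function names are hypothetical. It assumes, as Segher's reply implies, that `temp` is the bare ZERO_EXTEND and `temp2` is expand_compound_operation applied to it. With the patch, ties go to the candidate tested last, so the ZERO_EXTEND wins whenever it is no more expensive than the expanded form:

```c
/* Which expression the SIGN_EXTEND -> ZERO_EXTEND code returns.  */
enum ext_choice
{
  KEEP_X,        /* the original SIGN_EXTEND */
  USE_TEMP2,     /* expand_compound_operation (temp) */
  USE_TEMP       /* the bare ZERO_EXTEND */
};

/* Before the patch: switch only on a strictly lower cost, trying temp2
   first and returning immediately.  */
static enum ext_choice
choose_before (int cost_x, int cost_temp, int cost_temp2)
{
  if (cost_x > cost_temp2)
    return USE_TEMP2;
  else if (cost_x > cost_temp)
    return USE_TEMP;
  return KEEP_X;
}

/* After the patch: switch when the candidate is not more expensive.
   Because temp is compared last, against whatever x has become, it
   wins a tie with temp2.  */
static enum ext_choice
choose_after (int cost_x, int cost_temp, int cost_temp2)
{
  enum ext_choice choice = KEEP_X;
  int current = cost_x;

  if (current >= cost_temp2)
    {
      choice = USE_TEMP2;
      current = cost_temp2;
    }
  if (current >= cost_temp)
    choice = USE_TEMP;
  return choice;
}
```

With costs (x, temp, temp2) = (8, 2, 4) the old code returns temp2 even though temp is cheaper, while the patched rule returns temp, matching Dominik's remark that the patched code picks the cheaper replacement.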


Segher


[PATCH] Fix optimized out volatile MEM_REF (PR tree-optimization/78810)

2016-12-15 Thread Martin Liška
The patch restores the TREE_THIS_VOLATILE check that was removed in r239778.
It bootstraps on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
From dc8ec6815fa51b352fe5f1a02d3510022053e0ad Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 14 Dec 2016 16:07:56 +0100
Subject: [PATCH] Fix optimized out volatile MEM_REF (PR
 tree-optimization/78810)

gcc/testsuite/ChangeLog:

2016-12-14  Martin Liska  

	PR tree-optimization/78810
	* g++.dg/tree-ssa/pr78810.C: New test.

gcc/ChangeLog:

2016-12-14  Martin Liska  

	PR tree-optimization/78810
	* tree-ssa.c (non_rewritable_mem_ref_base): Add TREE_THIS_VOLATILE
	check removed in r239778.
---
 gcc/testsuite/g++.dg/tree-ssa/pr78810.C | 26 ++
 gcc/tree-ssa.c  |  3 ++-
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr78810.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr78810.C b/gcc/testsuite/g++.dg/tree-ssa/pr78810.C
new file mode 100644
index 000..1cda30c5bd8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr78810.C
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+class NODE;
+typedef unsigned long VALUE;
+
+struct parser_params
+{
+};
+
+struct parser_params *get_parser();
+NODE *yycompile(parser_params *parser, VALUE a, VALUE b);
+
+NODE*
+rb_parser_compile_file_path(VALUE vparser, VALUE fname, VALUE file, int start)
+{
+struct parser_params *parser;
+parser = get_parser(); 
+
+NODE *node = yycompile(parser, fname, start);
+(*({volatile VALUE *rb_gc_guarded_ptr = (&(vparser)); rb_gc_guarded_ptr;}));
+
+return node;
+}
+
+/* { dg-final { scan-tree-dump "MEM\\\[\\\(volatile\\\ VALUE\\\ \\\*\\\)" "optimized" } } */
diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c
index 62eea8bb8a4..b92513fcf23 100644
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -1385,7 +1385,8 @@ non_rewritable_mem_ref_base (tree ref)
   if (! DECL_P (decl))
 	return NULL_TREE;
   if (! is_gimple_reg_type (TREE_TYPE (base))
-	  || VOID_TYPE_P (TREE_TYPE (base)))
+	  || VOID_TYPE_P (TREE_TYPE (base))
+	  || TREE_THIS_VOLATILE (decl) != TREE_THIS_VOLATILE (base))
 	return decl;
   if ((TREE_CODE (TREE_TYPE (decl)) == VECTOR_TYPE
 	   || TREE_CODE (TREE_TYPE (decl)) == COMPLEX_TYPE)
-- 
2.11.0



[PATCH] [Match & Simplify] Optimize some minmax patterns

2016-12-15 Thread Hurugalawadi, Naveen
Hi,

Please find attached the patch that optimizes some MIN/MAX patterns
where one operand is the other operand plus a constant.
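The identities behind the two new patterns can be checked with a small C sketch. This is not the match.pd implementation; `max_folded`, `min_folded` and `check_minmax_identities` are hypothetical helpers, and the sketch assumes signed `int` arithmetic in which `a + c` does not overflow (which is what TYPE_OVERFLOW_UNDEFINED lets the compiler assume). For a wrapping unsigned type the identity would fail, e.g. `max (UINT_MAX, UINT_MAX + 1)` is `UINT_MAX`, not the wrapped `a + 1`, which is presumably why the patterns are guarded.

```c
/* Reference MIN/MAX, matching the helpers in the new testcases.  */
static int imax (int a, int b) { return a < b ? b : a; }
static int imin (int a, int b) { return a < b ? a : b; }

/* Folded forms the patterns produce, assuming a + c cannot overflow:
     max (a, a + c) -> c > 0 ? a + c : a
     min (a, a + c) -> c > 0 ? a : a + c  */
static int max_folded (int a, int c) { return c > 0 ? a + c : a; }
static int min_folded (int a, int c) { return c > 0 ? a : a + c; }

/* Returns 1 if both identities hold for every a in [-lim, lim] and
   c in [-clim, clim]; the caller keeps a + c within int range.  */
static int
check_minmax_identities (int lim, int clim)
{
  for (int a = -lim; a <= lim; a++)
    for (int c = -clim; c <= clim; c++)
      if (imax (a, a + c) != max_folded (a, c)
          || imin (a, a + c) != min_folded (a, c))
        return 0;
  return 1;
}
```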

Bootstrapped and Regression tested on x86_64 & aarch64-thunder-linux.

Please review the patch and let us know if it's okay.

2016-12-15  Andrew Pinski  
 Naveen H.S 
gcc
* match.pd (max:c @0 (plus@2 @0 INTEGER_CST@1)): New Pattern.
(min:c @0 (plus@2 @0 INTEGER_CST@1)) : New Pattern.
gcc/testsuite
* gcc.dg/max.c: New Testcase.
* gcc.dg/min.c: New Testcase.
   

diff --git a/gcc/match.pd b/gcc/match.pd
index f4cc2d8..ff5e97b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1324,6 +1324,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* Simplifications of MIN_EXPR, MAX_EXPR, fmin() and fmax().  */
 
+/* max (a, a + CST) -> a + CST where CST is positive.  */
+/* max (a, a + CST) -> a where CST is negative.  */
+(simplify
+ (max:c @0 (plus@2 @0 INTEGER_CST@1))
+  (if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
+   (if (tree_int_cst_sgn (@1) > 0)
+    @2
+    @0)))
+
+/* min (a, a + CST) -> a where CST is positive.  */
+/* min (a, a + CST) -> a + CST where CST is negative. */
+(simplify
+ (min:c @0 (plus@2 @0 INTEGER_CST@1))
+  (if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
+   (if (tree_int_cst_sgn (@1) > 0)
+    @0
+    @2)))
+
 (for minmax (min max FMIN FMAX)
  (simplify
   (minmax @0 @0)
diff --git a/gcc/testsuite/gcc.dg/max.c b/gcc/testsuite/gcc.dg/max.c
new file mode 100644
index 000..e979810
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/max.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+static inline int
+max (int a, int b)
+{
+  return a < b ? b : a;
+}
+
+int
+test_00 (int a)
+{
+  return max (a, a + 8);
+}
+
+int
+test_01 (int a)
+{
+  return max (a, a - 8);
+}
+
+/* { dg-final { scan-tree-dump-not "MAX_EXPR" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/min.c b/gcc/testsuite/gcc.dg/min.c
new file mode 100644
index 000..d847270
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/min.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+static inline int
+min (int a, int b)
+{
+  return a < b ? a : b;
+}
+
+int
+test_00 (int a)
+{
+  return min (a, a + 8);
+}
+
+int
+test_01 (int a)
+{
+  return min (a, a - 8);
+}
+
+/* { dg-final { scan-tree-dump-not "MIN_EXPR" "optimized" } } */


Re: [PATCH] Fill bitregion_{start,end} in store_constructor (PR tree-optimization/78428).

2016-12-15 Thread Martin Liška
On 12/13/2016 03:35 PM, Richard Biener wrote:
> OK from my POV.
> 
> Thanks,
> Richard.

Hi.

I have prepared (and tested) backports for the GCC 5 and 6 branches.
May I install the patch once 6.3 is released?

Thanks,
Martin


Re: Pretty printers for versioned namespace

2016-12-15 Thread Jonathan Wakely

On 14/12/16 22:49 +0100, François Dumont wrote:

On 09/12/2016 16:18, Jonathan Wakely wrote:


But I don't know how to fix this so for the moment I just adapt it 
to correctly handle std::__7::string.


But that's not correct. Please try to understand the point I'm making:
The name "std::__7::string" does not appear in a symbol name.
Ok, the only point I don't get yet is why std::string is a symbol but 
std::__7::string is not. It seems inconsistent.


The demangler has special handling for std::basic_string<char, std::char_traits<char>, std::allocator<char>> to treat it as
std::string, because that's much more user-friendly. But when it sees
a different symbol, like foo::basic_string, or std::__7::... it
doesn't match and doesn't get special handling.

Being consistent doesn't matter, the point is to be user-friendly.

Users want to see std::string, not std::basic_string.

Looking at all the ugly changes needed to the tests, I'm wondering if
we actually want to strip the __7:: namespace out of the typenames
displayed by printers. That would mean we don't need to change the
tests, but more importantly, it would mean the users see "std::vector"
not "std::__7::vector". The versioned namespace is an implementation
detail, not something they write explicitly in their code.

Your change makes the printer tests pass for the versioned namespace,
but does it make the printers useful for users? We want the printers
to be useful, not just pass our testsuite.




This works for me:

@@ -946,9 +950,10 @@ class StdExpAnyPrinter(SingleObjContainerPrinter):
             m = re.match(rx, func.function.name)
             if not m:
                 raise ValueError("Unknown manager function in %s" % self.typename)

-
-            # FIXME need to expand 'std::string' so that gdb.lookup_type works
-            mgrname = re.sub("std::string(?!\w)", str(gdb.lookup_type('std::string').strip_typedefs()), m.group(1))

+            mgrname = m.group(1)
+            if not typename.startswith('std::' + vers_nsp):
+                # FIXME need to expand 'std::string' so that gdb.lookup_type works
+                mgrname = re.sub("std::string(?!\w)", str(gdb.lookup_type('std::string').strip_typedefs()), mgrname)


I think it doesn't work in all situations as this code is also used 
for std::experimental::any so typename doesn't start with std::__7:: 
but there is still no std::string symbol.


Ah yes, I forgot about the experimental one.


So I propose:

            mgrname = m.group(1)
            if 'std::string' in mgrname:
                # FIXME need to expand 'std::string' so that gdb.lookup_type works
                mgrname = re.sub("std::string(?!\w)", str(gdb.lookup_type('std::string').strip_typedefs()), m.group(1))


as you will see in attach patch.


Looks good.

In the end I used '__7' explicitly in some pretty printer tests
because '__\d+' was not working; I don't know why.


Yes, I also found that wasn't working. Presumably Tcl doesn't support
that syntax.


OK to commit once tests have completed?




Re: Fix compilation errors with libstdc++v3 for AVR target and allow --enable-libstdcxx

2016-12-15 Thread Jonathan Wakely

On 14/12/16 18:38 -0300, Felipe Magno de Almeida wrote:

Hello Jonathan,

Sorry for the delay, I was in mid-vacation.

Comments are inline.

On Tue, Dec 6, 2016 at 3:45 PM, Jonathan Wakely  wrote:

On 16/09/16 02:53 -0300, Felipe Magno de Almeida wrote:




[snip]


I've tried both approaches. Templates caused problems with undefined
instantiations because they were also being used as ints in other
_M_extract functions through a temporary integer. Typedefs caused the
same problem of needing a temporary of the right type; for example,
_M_extract_wday_or_month could not have the same type (on AVR they do),
so I'd have to use a temporary anyway.

This was the least intrusive way.



Did you consider something like this?


I have, but I did not like how it could run code that is not very explicit
in case of exceptions, possibly assigning from a non-initialized variable,


OK, but that assignment from a possibly-uninitialized variable was
also present in your original patch.


causing, theoretically, UB. Besides, __mem was already used for
tm_mon, so it seemed best to just mimic that for the rest.


Yes, but now we have that overhead of extra reads and writes for every
function, not just that one.


I have another patch attached which fixes the problem you mentioned
and the same problem with the tm_wday in a different switch-case.
I've removed all initializations of __mem, the same way that tm_mon
already did not initialize it, because the value is not actually used at
all in extract_num and variants.


Doesn't this now mean that if an error happens in the _M_extract_xxx
function we set tm.tm_xxx to garbage?

The existing tm.tm_mon case handles that correctly, by checking for an
error first.
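As a rough illustration of the pattern described here, the following C sketch extracts into a temporary and only stores into the `struct tm` once the extraction has succeeded, so an error leaves the field untouched. `extract_num` and `parse_month` are hypothetical stand-ins for `_M_extract_num` and the tm_mon case; they are not the libstdc++ implementation.

```c
#include <ctype.h>
#include <time.h>

/* Hypothetical stand-in for _M_extract_num: parse up to LEN decimal
   digits from *P into *VAL, requiring MIN <= value <= MAX.  Returns 0
   on success, nonzero on error; *VAL is written only on success.  */
static int
extract_num (const char **p, int *val, int min, int max, int len)
{
  int v = 0, n = 0;
  while (n < len && isdigit ((unsigned char) **p))
    {
      v = v * 10 + (**p - '0');
      ++*p;
      ++n;
    }
  if (n == 0 || v < min || v > max)
    return 1;
  *val = v;
  return 0;
}

/* Follows the existing tm_mon case: extract into a temporary, check
   for an error first, and only then assign, so TM never gets garbage.  */
static int
parse_month (const char **p, struct tm *tm)
{
  int mem;  /* left uninitialized: only read after a successful extract */
  if (extract_num (p, &mem, 1, 12, 2))
    return 1;
  tm->tm_mon = mem - 1;
  return 0;
}
```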


I have run the tests and nothing in locale failed. I ran them again with the
fixes, and there are still no locale failures. Tested on x86_64.


Good, thanks for testing it.



[Patch] Undermine the jump threading cost model to fix PR77445.

2016-12-15 Thread James Greenhalgh

Hi,

As mentioned in PR77445, the improvements to the jump threading cost model
this year have caused substantial regressions in the amount of jump threading
we do and the performance of workloads which rely on that threading.

This patch represents the low-bar in fixing the performance issues reported
in PR77445 - by weakening the cost model enough that we thread in a way much
closer to GCC 6. I don't think this patch is likely to be acceptable for
trunk, but I'm posting it for consideration regardless. 

Under the new cost model, if the edge doesn't pass optimize_edge_for_speed_p,
then we don't thread. The problem in late threading is that bad edge profile
data makes the edge look cold, and thus it fails optimize_edge_for_speed_p
and is no longer considered a candidate for threading. As an aside, I think
this is the wrong cost model for jump threading, where you get the most
impact if you can resolve unpredictable switch statements - which by their
nature may have multiple cold edges in need of threading.

Early threading should avoid these issues, as there is no edge profile
info yet. optimize_edge_for_speed_p is therefore more likely to hold, but
the condition for threading is:

  if (speed_p && optimize_edge_for_speed_p (taken_edge))
    {
      if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
        {
          [...reject threading...]
        }
    }
  else if (n_insns > 1)
    {
      [...reject threading...]
    }

With speed_p hardwired to false for the early threader
(pass_early_thread_jumps::execute):

find_jump_threads_backwards (bb, false);

So we always fall to the n_insns > 1 case and thus only rarely get to
thread.
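The effect of hardwiring speed_p can be seen in a simplified C model of the quoted condition. This is a sketch only: it ignores the other checks in profitable_jump_thread_path, `max_path_insns` stands in for `PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS)`, `edge_for_speed` stands in for the result of `optimize_edge_for_speed_p (taken_edge)`, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Returns true if this size check alone would let the path be
   threaded.  */
static bool
passes_size_check (bool speed_p, bool edge_for_speed, int n_insns,
                   int max_path_insns)
{
  if (speed_p && edge_for_speed)
    /* Optimizing for speed: reject only paths at or over the limit.  */
    return n_insns < max_path_insns;
  /* Otherwise any path with more than one instruction is rejected.  */
  return n_insns <= 1;
}
```

With speed_p false, the first branch is never taken, so a five-instruction path is rejected even when the taken edge is hot; only when both speed_p and the edge check hold does the parameter limit apply.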

In this patch I change that call in pass_early_thread_jumps::execute to
instead look at optimize_bb_for_speed_p (bb). That allows the speed_p
check to pass in the main threading cost model, and then the
optimize_edge_for_speed_p can also pass. That gets the first stage of
jump-threading back working in a proprietary benchmark which is sensitive
to this optimisation.

To get the rest of the required jump threading, I also have to weaken the
cost model - and this is obviously a hack! The easy hack is to special case
when the taken edge has frequency zero, and permit jump threading there.

I know this patch is likely not the preferred way to fix this. For me that
would be a change to the cost model, which as I mentioned above I think
misses the point about which edges we want to thread. By far the best
fix would be to the junk edge profiling data we create during threading.

However, this patch does fix the performance issues identified in PR77445,
and does highlight a fundamental issue with the early threader (which
doesn't seem to me like it will be effective while it sets speed_p to
false), so I'd like it to be considered for trunk if no better fix appears
before stage 4.

Bootstrapped on x86_64 with no issues. The testsuite changes just reshuffle
which passes spot the threading opportunities.

OK?

Thanks,
James

---
gcc/

2016-12-15  James Greenhalgh  

PR tree-optimization/77445
* tree-ssa-threadbackward.c (profitable_jump_thread_path): Work
around sometimes corrupt edge frequency data.
(pass_early_thread_jumps::execute): Pass
optimize_bb_for_speed_p as the speed_p parameter to
find_jump_threads_backwards to enable threading in more cases.

gcc/testsuite/

2016-12-15  James Greenhalgh  

PR tree-optimization/77445
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust options and dump passes.
* gcc.dg/tree-ssa/pr66752-3.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
index 896c8bf..39ec3d6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */
+/* { dg-options "-O2 -fdump-tree-ethread-details -fdump-tree-dce2" } */
 
 extern int status, pt;
 extern int count;
@@ -34,7 +34,7 @@ foo (int N, int c, int b, int *a)
 
 /* There are 4 FSM jump threading opportunities, all of which will be
realized, which will eliminate testing of FLAG, completely.  */
-/* { dg-final { scan-tree-dump-times "Registering FSM" 4 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "Registering FSM" 4 "ethread"} } */
 
 /* There should be no assignments or references to FLAG, verify they're
eliminated as early as possible.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 9a9d1cb..5b087fb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,8 +1,9 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-gu

Re: [PATCH][AArch64] Split X-reg UBFX into W-reg LSR when possible

2016-12-15 Thread James Greenhalgh
On Thu, Dec 08, 2016 at 09:35:08AM +, Kyrill Tkachov wrote:
> Hi all,
> 
> In this patch we split X-register UBFX instructions that extract up to the
> edge of a W-register into a W-register LSR instruction. So for the example in
> the testcase instead of:

> UBFX X0, X0, 24, 8
> 
> we'd generate:
> LSR w0, w0, 24
> 
> An LSR is a simpler instruction and there's a higher chance that it can be
> combined with other instructions.
> 
> To do this the patch separates the sign_extract case from the zero_extract
> case in the * ANY_EXTRACT pattern and further splits the
> SImode/DImode patterns from the resulting zero_extract pattern.
> The DImode zero_extract pattern then becomes a define_insn_and_split that
> splits into a zero_extend of an lshiftrt when the bitposition and width of
> the zero_extract add up to 32.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Since we're in stage 3 perhaps this is not for GCC 6, but it is fairly low
> risk.  I'm happy for it to wait for the next release if necessary.
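The equivalence this split relies on can be checked in plain C. The sketch below assumes the usual UBFX semantics (extract `width` bits starting at bit `pos`, zero-extended) and models the AArch64 rule that writing a W register zeroes the upper 32 bits with a cast back to `uint64_t`. Whenever `pos + width == 32`, both helpers agree. `ubfx64` and `lsr32_zext` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* X-register UBFX: extract WIDTH bits of X starting at bit POS.
   WIDTH is assumed to be in [1, 63].  */
static uint64_t
ubfx64 (uint64_t x, unsigned pos, unsigned width)
{
  return (x >> pos) & ((UINT64_C (1) << width) - 1);
}

/* W-register LSR of the low half of X, zero-extended to 64 bits as a
   write to a W register does.  POS is assumed to be in [0, 31].  */
static uint64_t
lsr32_zext (uint64_t x, unsigned pos)
{
  return (uint64_t) ((uint32_t) x >> pos);
}
```

For the testcase's `(x >> 24) & 255`, pos is 24 and width is 8, so their sum is 32 and the extract can be done with `lsr w0, w0, 24`.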

I'm OK with the idea, and I'm OK taking this in for Stage 3, but I'm not
convinced by the implementation.

> 2016-12-08  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.md (*): Split into...
> (*extv): ...This...
> (*extzvsi): ...This...
> (*extzvdi:): ... And this.  Add splitting to lshiftrt when possible.

Why do we want to do it this way, rather than simply defining a single
"split" which works in the case you're trying to catch?

i.e. (untested)

(define_split
  [(set (match_operand:DI 0 "register_operand")
        (zero_extract:DI (match_operand:DI 1 "register_operand")
                         (match_operand 2
                           "aarch64_simd_shift_imm_offset_di")
                         (match_operand 3
                           "aarch64_simd_shift_imm_di")))]
  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
             1, GET_MODE_BITSIZE (DImode) - 1)
   && (INTVAL (operands[2]) + INTVAL (operands[3]))
       == GET_MODE_BITSIZE (SImode)"
  [(set (match_dup 0)
        (zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
  {
    operands[4] = gen_lowpart (SImode, operands[1]);
  }
)

Thanks,
James

> 
> 2016-12-08  Kyrylo Tkachov  
> 
> * gcc.target/aarch64/ubfx_lsr_1.c: New test.

> diff --git a/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c b/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c
> new file mode 100644
> index 
> ..bc083862976a88190dbef97a247be8a10b277a12
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* Check that an X-reg UBFX can be simplified into a W-reg LSR.  */
> +
> +int
> +f (unsigned long long x)
> +{
> +   x = (x >> 24) & 255;
> +   return x + 1;
> +}
> +
> +/* { dg-final { scan-assembler "lsr\tw" } } */
> +/* { dg-final { scan-assembler-not "ubfx\tx" } } */



Re: [PATCH][AArch64] Split X-reg UBFIZ into W-reg LSL when possible

2016-12-15 Thread James Greenhalgh
On Thu, Dec 08, 2016 at 09:35:09AM +, Kyrill Tkachov wrote:
> Hi all,
> 
> Similar to the previous patch this transforms X-reg UBFIZ instructions into
> W-reg LSL instructions when the UBFIZ operands add up to 32, so we can take
> advantage of the implicit zero-extension to DImode
> when writing to a W-register.
> 
> This is done by splitting the existing *andim_ashift_bfi pattern into
> its two SImode and DImode specialisations and changing the DImode pattern
> into a define_insn_and_split that splits into a
> zero-extended SImode ashift when the operands match up.
> 
> So for the code in the testcase we generate:
> LSL W0, W0, 5
> 
> instead of:
> UBFIZ   X0, X0, 5, 27
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Since we're in stage 3 perhaps this is not for GCC 6, but it is fairly low
> risk.  I'm happy for it to wait for the next release if necessary.
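The same kind of check applies to this patch. Assuming the usual UBFIZ semantics (take the low `width` bits and insert them at bit `lsb`, zeroing the rest), an X-register UBFIZ with `lsb + width == 32` produces the same 64-bit value as a W-register LSL followed by the implicit zero-extension. `ubfiz64` and `lsl32_zext` are hypothetical helper names for this sketch.

```c
#include <stdint.h>

/* X-register UBFIZ: low WIDTH bits of X shifted left by LSB.
   WIDTH is assumed to be in [1, 63].  */
static uint64_t
ubfiz64 (uint64_t x, unsigned lsb, unsigned width)
{
  return (x & ((UINT64_C (1) << width) - 1)) << lsb;
}

/* W-register LSL of the low half of X; the shift is done in 32 bits
   and the result zero-extended, as a write to a W register does.
   LSB is assumed to be in [0, 31].  */
static uint64_t
lsl32_zext (uint64_t x, unsigned lsb)
{
  return (uint64_t) (uint32_t) ((uint32_t) x << lsb);
}
```

For the testcase's `UBFIZ X0, X0, 5, 27`, lsb is 5 and width is 27, which add up to 32, so `LSL W0, W0, 5` computes the same value.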

My comments on the previous patch also apply here. This patch should only
need to add one new split pattern.

Thanks,
James

> 
> Thanks,
> Kyrill
> 
> 2016-12-08  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.md (*andim_ashift_bfiz): Split into...
> (*andim_ashiftsi_bfiz): ...This...
> (*andim_ashiftdi_bfiz): ...And this.  Add split to ashift when
> possible.
> 
> 2016-12-08  Kyrylo Tkachov  
> 
> * gcc.target/aarch64/ubfiz_lsl_1.c: New test.




Re: C++ Patch Ping

2016-12-15 Thread Nathan Sidwell

On 12/15/2016 03:34 AM, Jakub Jelinek wrote:

Hi!

I'd like to ping the

http://gcc.gnu.org/ml/gcc-patches/2016-12/msg00698.html
P0490R0 GB 20: decomposition declaration should commit to tuple interpretation early


+  if (inst == error_mark_node)
+return NULL_TREE;

This check is unneeded, because complete_type DTRT with error_mark_node

+  inst = complete_type (inst);
+  if (!COMPLETE_TYPE_P (inst))
+return NULL_TREE;


--
Nathan Sidwell


Re: C++ Patch Ping

2016-12-15 Thread Jakub Jelinek
On Thu, Dec 15, 2016 at 07:14:15AM -0500, Nathan Sidwell wrote:
> On 12/15/2016 03:34 AM, Jakub Jelinek wrote:
> > Hi!
> > 
> > I'd like to ping the
> > 
> > http://gcc.gnu.org/ml/gcc-patches/2016-12/msg00698.html
> > P0490R0 GB 20: decomposition declaration should commit to tuple interpretation early
> 
> +  if (inst == error_mark_node)
> +return NULL_TREE;
> 
> This check is unneeded, because complete_type DTRT with error_mark_node
> 
> +  inst = complete_type (inst);
> +  if (!COMPLETE_TYPE_P (inst))
> +return NULL_TREE;

I don't think so.  complete_type (error_mark_node) returns error_mark_node,
and COMPLETE_TYPE_P (error_mark_node) is invalid (should fail TYPE_CHECK in
checking compiler).

I can write it as
  inst = complete_type (inst);
  if (inst == error_mark_node || !COMPLETE_TYPE_P (inst))
return NULL_TREE;

Jakub


[PATCH] c++/pr77585 bogus this error with generic lambda

2016-12-15 Thread Nathan Sidwell
77585 concerns the instantiation of a generic lambda that contains a 
call to a non-dependent non-static member function.


  auto lam = [&](auto) { return Share (); };
  r += Eat (lam);  // instantiation of lambda::operator() here

During instantiation of the call to Share, maybe_resolve_dummy gets 
called and uses current_nonlambda_class_type, which peeks up the 
current_current_class stack.


That peeking presupposes we're actually pushing and popping class scopes 
as we enter them all the way from the global scope.  But that doesn't 
always happen in instantiation.  push_nested_class pushes the 
immediately enclosing scopes, but stops at function scope.  So we don't 
get the class scope of that function pushed.  Thus stack peeking fails.


This hasn't previously been an instantiation problem, because templates 
couldn't be defined at local scope.  But generic lambdas now have that 
property (wrt this capture at least).


This patch amends instantiate_decl to first push the containing 
non-lambda class scope before start_preparsed_function does its stack 
pushing.
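The scope walk that the patch adds can be sketched as a small C model. Everything here is hypothetical and illustrative: `struct scope` stands in for a class or closure type, and `enclosing_class` plays the role of `decl_type_context (TYPE_NAME (t))`, i.e. the nearest enclosing class with function scopes skipped. The loop mirrors the patch's do/while: step out of the closure type at least once, then keep going while the result is still a lambda.

```c
#include <stddef.h>

struct scope {
  const char *name;
  int is_lambda;                  /* nonzero for a closure type */
  struct scope *enclosing_class;  /* NULL if there is no enclosing class */
};

/* Returns the innermost enclosing non-lambda class, or NULL when the
   lambda is not inside any class (in which case nothing is pushed).  */
static struct scope *
nonlambda_class_context (struct scope *closure)
{
  struct scope *ctx = closure;
  do
    ctx = ctx->enclosing_class;
  while (ctx && ctx->is_lambda);
  return ctx;
}
```

Modeling the new testcase this way, both `lam0` and the nested `lam1` inside `Foo::Frob` resolve to `Foo`, while the lambda in the free function `Frob` resolves to no class at all.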


ok?

nathan
--
Nathan Sidwell
2016-12-14  Nathan Sidwell  

	PR c++/77585
	* pt.c (instantiate_decl): Push to class scope lambda resides
	within when instantiating a generic lambda function.

	PR c++/77585
	* g++.dg/cpp1y/pr77585.C: New.

Index: cp/pt.c
===
--- cp/pt.c	(revision 243661)
+++ cp/pt.c	(working copy)
@@ -22483,6 +22483,7 @@ instantiate_decl (tree d, int defer_ok,
   tree tmpl_parm;
   tree spec_parm;
   tree block = NULL_TREE;
+  tree lambda_ctx = NULL_TREE;
 
   /* Save away the current list, in case we are instantiating one
 	 template from within the body of another.  */
@@ -22496,7 +22497,23 @@ instantiate_decl (tree d, int defer_ok,
 	  && TREE_CODE (DECL_CONTEXT (code_pattern)) == FUNCTION_DECL)
 	block = push_stmt_list ();
   else
-	start_preparsed_function (d, NULL_TREE, SF_PRE_PARSED);
+	{
+	  if (LAMBDA_FUNCTION_P (d))
+	{
+	  /* When instantiating a lambda's templated function
+		 operator, we need to push the non-lambda class scope
+		 of the lambda itself so that the nested function
+		 stack is sufficiently correct to deal with this
+		 capture.  */
+	  lambda_ctx = DECL_CONTEXT (d);
+	  do 
+		lambda_ctx = decl_type_context (TYPE_NAME (lambda_ctx));
+	  while (lambda_ctx && LAMBDA_TYPE_P (lambda_ctx));
+	  if (lambda_ctx)
+		push_nested_class (lambda_ctx);
+	}
+	  start_preparsed_function (d, NULL_TREE, SF_PRE_PARSED);
+	}
 
   /* Some typedefs referenced from within the template code need to be
 	 access checked at template instantiation time, i.e now. These
@@ -22564,6 +22581,8 @@ instantiate_decl (tree d, int defer_ok,
 	  d = finish_function (0);
 	  expand_or_defer_fn (d);
 	}
+  if (lambda_ctx)
+	pop_nested_class ();
 
   if (DECL_OMP_DECLARE_REDUCTION_P (code_pattern))
 	cp_check_omp_declare_reduction (d);
Index: testsuite/g++.dg/cpp1y/pr77585.C
===
--- testsuite/g++.dg/cpp1y/pr77585.C	(revision 0)
+++ testsuite/g++.dg/cpp1y/pr77585.C	(working copy)
@@ -0,0 +1,41 @@
+// PR c++/77585
+// { dg-do run { target c++14 } }
+
+// Confusion about this capture when instantiating generic lambda's
+// function operator
+
+template <typename F> int Eat (F &&f) { return f (1); }
+
+struct Foo {
+  int x = 1;
+  int Share () { return x++; }
+  int Frob (int);
+};
+
+int Foo::Frob (int r)
+{
+  auto lam = [&](auto) { return Share (); };
+  r += Eat (lam);
+
+  auto lam0 = [&](auto) {
+auto lam1 = [&](auto) { return Share (); };
+return Eat (lam1); };
+  r += Eat (lam0);
+
+  return r;
+}
+
+int Frob (int r) 
+{
+  auto lam = [&](auto) { return 1; };
+  r += Eat (lam);
+  return r;
+}
+
+
+int main ()
+{
+  Foo f;
+  
+  return Frob (f.Frob (0)) == 4 ? 0 : 1;
+}


Re: C++ Patch Ping

2016-12-15 Thread Nathan Sidwell

On 12/15/2016 07:26 AM, Jakub Jelinek wrote:


I don't think so.  complete_type (error_mark_node) returns error_mark_node,
and COMPLETE_TYPE_P (error_mark_node) is invalid (should fail TYPE_CHECK in
checking compiler).

I can write it as
  inst = complete_type (inst);
  if (inst == error_mark_node || !COMPLETE_TYPE_P (inst))
return NULL_TREE;


that's probably better, because complete_type can return error_mark_node 
if 'something goes horribly wrong'



--
Nathan Sidwell


[PATCH] Tweak formatting and docs for pretty printers

2016-12-15 Thread Jonathan Wakely

Trivial tweaks.

* python/libstdcxx/v6/printers.py (UniquePointerPrinter.to_string):
Remove redundant parentheses.
(RbtreeIterator, StdRbtreeIteratorPrinter): Add docstrings.
(StdForwardListPrinter.to_string): Remove redundant parentheses.
(StdExpOptionalPrinter.to_string): Use string formatting instead of
concatenation.
(StdVariantPrinter.to_string, StdNodeHandlePrinter.to_string)
(TemplateTypePrinter): Adjust whitespace.

Tested x86_64-linux, committed to trunk.

commit c0404cb36fc51cd8bc954978988598819d334b35
Author: Jonathan Wakely 
Date:   Thu Dec 15 11:56:58 2016 +

Tweak formatting and docs for pretty printers

* python/libstdcxx/v6/printers.py (UniquePointerPrinter.to_string):
Remove redundant parentheses.
(RbtreeIterator, StdRbtreeIteratorPrinter): Add docstrings.
(StdForwardListPrinter.to_string): Remove redundant parentheses.
(StdExpOptionalPrinter.to_string): Use string formatting instead of
concatenation.
(StdVariantPrinter.to_string, StdNodeHandlePrinter.to_string)
(TemplateTypePrinter): Adjust whitespace.

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 3a111d7..9d84b4f 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -133,8 +133,8 @@ class UniquePointerPrinter:
 v = self.val['_M_t']['_M_head_impl']
 else:
 raise ValueError("Unsupported implementation for unique_ptr: %s" % self.val.type.fields()[0].type.tag)
-return ('std::unique_ptr<%s> containing %s' % (str(v.type.target()),
-   str(v)))
+return 'std::unique_ptr<%s> containing %s' % (str(v.type.target()),
+  str(v))
 
 def get_value_from_aligned_membuf(buf, valtype):
 """Returns the value held in a __gnu_cxx::__aligned_membuf."""
@@ -428,6 +428,11 @@ class StdStackOrQueuePrinter:
 return None
 
 class RbtreeIterator(Iterator):
+"""
+Turn an RB-tree-based container (std::map, std::set etc.) into
+a Python iterable object.
+"""
+
 def __init__(self, rbtree):
 self.size = rbtree['_M_t']['_M_impl']['_M_node_count']
 self.node = rbtree['_M_t']['_M_impl']['_M_header']['_M_left']
@@ -480,7 +485,7 @@ def get_value_from_Rb_tree_node(node):
 # std::map::iterator), and has nothing to do with the RbtreeIterator
 # class above.
 class StdRbtreeIteratorPrinter:
-"Print std::map::iterator"
+"Print std::map::iterator, std::set::iterator, etc."
 
 def __init__ (self, typename, val):
 self.val = val
@@ -891,8 +896,8 @@ class StdForwardListPrinter:
 
 def to_string(self):
 if self.val['_M_impl']['_M_head']['_M_next'] == 0:
-return 'empty %s' % (self.typename)
-return '%s' % (self.typename)
+return 'empty %s' % self.typename
+return '%s' % self.typename
 
 class SingleObjContainerPrinter(object):
 "Base class for printers of containers of single objects"
@@ -994,9 +999,10 @@ class StdExpOptionalPrinter(SingleObjContainerPrinter):
 
 def to_string (self):
 if self.contained_value is None:
-return self.typename + " [no contained value]"
+return "%s [no contained value]" % self.typename
 if hasattr (self.visualizer, 'children'):
-return self.typename + " containing " + self.visualizer.to_string ()
+return "%s containing %s" % (self.typename,
+ self.visualizer.to_string())
 return self.typename
 
 class StdVariantPrinter(SingleObjContainerPrinter):
@@ -1032,7 +1038,8 @@ class StdVariantPrinter(SingleObjContainerPrinter):
 if self.contained_value is None:
 return "%s [no contained value]" % self.typename
 if hasattr(self.visualizer, 'children'):
-return "%s [index %d] containing %s" % (self.typename, self.index, self.visualizer.to_string())
+return "%s [index %d] containing %s" % (self.typename, self.index,
+                                        self.visualizer.to_string())
 return "%s [index %d]" % (self.typename, self.index)
 
 class StdNodeHandlePrinter(SingleObjContainerPrinter):
@@ -1060,7 +1067,6 @@ class StdNodeHandlePrinter(SingleObjContainerPrinter):
'array')
 
 def to_string(self):
-
 desc = 'node handle for '
 if not self.is_rb_tree_node:
 desc += 'unordered '
@@ -1230,7 +1236,8 @@ class Printer(object):
 libstdcxx_printer = None
 
 class TemplateTypePrinter(object):
-r"""A type printer for class templates.
+r"""
+A type printer for class templates.
 
 Recognizes type names that match a regular expression.
 Replaces them with a

[PATCH] Add GDB XMethods for shared_ptr and unique_ptr

2016-12-15 Thread Jonathan Wakely

This makes the Xmethods work for unique_ptr, including
conditionally enabling operator* and operator-> only for non-arrays,
and enabling operator[] only for arrays. And then adds similar
Xmethods for shared_ptr.

* python/libstdcxx/v6/xmethods.py (UniquePtrGetWorker.__init__): Use
correct element type for unique_ptr.
(UniquePtrGetWorker._supports, UniquePtrDerefWorker._supports): New
functions to disable unsupported operators for unique_ptr.
(UniquePtrSubscriptWorker): New worker for operator[].
(UniquePtrMethodsMatcher.__init__): Register UniquePtrSubscriptWorker.
(UniquePtrMethodsMatcher.match): Call _supports on the chosen worker.
(SharedPtrGetWorker, SharedPtrDerefWorker, SharedPtrSubscriptWorker)
(SharedPtrUseCountWorker, SharedPtrUniqueWorker): New workers.
(SharedPtrMethodsMatcher): New matcher for shared_ptr.
(register_libstdcxx_xmethods): Register SharedPtrMethodsMatcher.
* testsuite/libstdc++-xmethods/unique_ptr.cc: Test arrays.
* testsuite/libstdc++-xmethods/shared_ptr.cc: New test.

Tested x86_64-linux, committed to trunk.


commit 15de524115a74e9415d6a13378c1cc608d018459
Author: Jonathan Wakely 
Date:   Thu Dec 15 11:46:15 2016 +

Add GDB XMethods for shared_ptr and unique_ptr

* python/libstdcxx/v6/xmethods.py (UniquePtrGetWorker.__init__): Use
correct element type for unique_ptr.
(UniquePtrGetWorker._supports, UniquePtrDerefWorker._supports): New
functions to disable unsupported operators for unique_ptr.
(UniquePtrSubscriptWorker): New worker for operator[].
(UniquePtrMethodsMatcher.__init__): Register UniquePtrSubscriptWorker.
(UniquePtrMethodsMatcher.match): Call _supports on the chosen worker.
(SharedPtrGetWorker, SharedPtrDerefWorker, SharedPtrSubscriptWorker)
(SharedPtrUseCountWorker, SharedPtrUniqueWorker): New workers.
(SharedPtrMethodsMatcher): New matcher for shared_ptr.
(register_libstdcxx_xmethods): Register SharedPtrMethodsMatcher.
* testsuite/libstdc++-xmethods/unique_ptr.cc: Test arrays.
* testsuite/libstdc++-xmethods/shared_ptr.cc: New test.

diff --git a/libstdc++-v3/python/libstdcxx/v6/xmethods.py b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
index 045b661..1c9bf3a 100644
--- a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
+++ b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
@@ -565,8 +565,14 @@ class AssociativeContainerMethodsMatcher(gdb.xmethod.XMethodMatcher):
 # Xmethods for std::unique_ptr
 
 class UniquePtrGetWorker(gdb.xmethod.XMethodWorker):
+"Implements std::unique_ptr::get() and std::unique_ptr::operator->()"
+
 def __init__(self, elem_type):
-self._elem_type = elem_type
+self._is_array = elem_type.code == gdb.TYPE_CODE_ARRAY
+if self._is_array:
+self._elem_type = elem_type.target()
+else:
+self._elem_type = elem_type
 
 def get_arg_types(self):
 return None
@@ -574,6 +580,10 @@ class UniquePtrGetWorker(gdb.xmethod.XMethodWorker):
 def get_result_type(self, obj):
 return self._elem_type.pointer()
 
+def _supports(self, method_name):
+"operator-> is not supported for unique_ptr<T[]>"
+return method_name == 'get' or not self._is_array
+
 def __call__(self, obj):
 impl_type = obj.dereference().type.fields()[0].type.tag
 if impl_type.startswith('std::__uniq_ptr_impl<'): # New implementation
@@ -583,15 +593,40 @@ class UniquePtrGetWorker(gdb.xmethod.XMethodWorker):
 return None
 
 class UniquePtrDerefWorker(UniquePtrGetWorker):
+"Implements std::unique_ptr::operator*()"
+
 def __init__(self, elem_type):
 UniquePtrGetWorker.__init__(self, elem_type)
 
 def get_result_type(self, obj):
 return self._elem_type
 
+def _supports(self, method_name):
+"operator* is not supported for unique_ptr"
+return not self._is_array
+
 def __call__(self, obj):
 return UniquePtrGetWorker.__call__(self, obj).dereference()
 
+class UniquePtrSubscriptWorker(UniquePtrGetWorker):
+"Implements std::unique_ptr::operator[](size_t)"
+
+def __init__(self, elem_type):
+UniquePtrGetWorker.__init__(self, elem_type)
+
+def get_arg_types(self):
+return get_std_size_type()
+
+def get_result_type(self, obj, index):
+return self._elem_type
+
+def _supports(self, method_name):
+"operator[] is only supported for unique_ptr"
+return self._is_array
+
+def __call__(self, obj, index):
+return UniquePtrGetWorker.__call__(self, obj)[index]
+
 class UniquePtrMethodsMatcher(gdb.xmethod.XMethodMatcher):
 def __init__(self):
 gdb.xmethod.XMethodMatcher.__init__(self,
@@ -600,6 +635,7 @@ class UniquePtrMethodsMatcher(gdb.xmethod.XMethodMatcher):
 'get': LibStdCxxXMethod('get', UniquePtrGetWorker),
 'operator->

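The ChangeLog says `UniquePtrMethodsMatcher.match` now calls `_supports` on the chosen worker, but the matcher body is truncated above. The following is only a self-contained model (no gdb dependency) of how such a check can gate method lookup; `FakeType`, `WORKERS` and `match` are hypothetical stand-ins for `gdb.Type`, the worker table and `XMethodMatcher.match`.

```python
"""Hypothetical model of per-worker _supports() filtering (no gdb needed)."""


class FakeType:
    """Stand-in for gdb.Type: only records whether the managed type is T[]."""

    def __init__(self, name, is_array=False):
        self.name = name
        self.is_array = is_array


class Worker:
    """Base worker: remembers whether the managed type is an array."""

    def __init__(self, elem_type):
        self._is_array = elem_type.is_array

    def _supports(self, method_name):
        return True


class GetWorker(Worker):
    def _supports(self, method_name):
        # get() works for both forms; operator-> only for unique_ptr<T>.
        return method_name == 'get' or not self._is_array


class DerefWorker(Worker):
    def _supports(self, method_name):
        # operator* is not available for unique_ptr<T[]>.
        return not self._is_array


class SubscriptWorker(Worker):
    def _supports(self, method_name):
        # operator[] is only available for unique_ptr<T[]>.
        return self._is_array


WORKERS = {
    'get': GetWorker,
    'operator->': GetWorker,
    'operator*': DerefWorker,
    'operator[]': SubscriptWorker,
}


def match(elem_type, method_name):
    """Return a worker for method_name, or None if unknown or unsupported."""
    worker_class = WORKERS.get(method_name)
    if worker_class is None:
        return None
    worker = worker_class(elem_type)
    return worker if worker._supports(method_name) else None
```

In this model, asking for `operator->` on an array type yields `None`, so the debugger would report the method as unavailable instead of returning a misleading result.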
Re: [PATCH] Fix PR78515

2016-12-15 Thread David Edelsohn
On Thu, Dec 15, 2016 at 3:23 AM, Jakub Jelinek  wrote:
> On Wed, Dec 14, 2016 at 05:10:23PM -0700, Martin Sebor wrote:
>> The regression test is failing on powerpc64le due to the warnings
>> below:
>>
>> FAIL: gcc.dg/torture/pr78515.c   -O0  (test for excess errors)
>> Excess errors:
>> /src/gcc/trunk/gcc/testsuite/gcc.dg/torture/pr78515.c:11:1: warning: GCC
>> vector returned by reference: non-standard ABI extension with no
>> compatibility guarantee [-Wpsabi]
>> /src/gcc/trunk/gcc/testsuite/gcc.dg/torture/pr78515.c:10:1: warning: GCC
>> vector passed by reference: non-standard ABI extension with no compatibility
>> guarantee [-Wpsabi]
>
> David has fixed this recently, but just for AIX.  Generally, -Wno-psabi
> is beneficial for all targets if it is needed on just one, I've committed
> following:
>
> 2016-12-15  Jakub Jelinek  
>
> * gcc.dg/tree-ssa/forwprop-35.c: Use -Wno-psabi everywhere.
> * gcc.dg/torture/pr78515.c: Likewise.
> * gcc.dg/pr69634.c: Likewise.

Thanks for the fixes and clarification.

There are a few more test cases with similar problems but I was unsure
if we wanted -Wno-psabi in the general options. I will add -Wno-psabi
to the additional test cases general options later today.

Thanks, David


Re: [PATCH] combine: Replace sign_extend with zero_extend more often.

2016-12-15 Thread Dominik Vogt
On Thu, Dec 15, 2016 at 04:32:34AM -0600, Segher Boessenkool wrote:
> On Wed, Dec 14, 2016 at 01:39:13PM +0100, Dominik Vogt wrote:
> > There may be a slight imprecision in expand_compound_operation.
> > When it encounters a SIGN_EXTEND where it's already known that the
> > sign bit is zero, it may replace that with a ZERO_EXTEND (and
> > tries to simplify that further).  However, the pattern is only
> > replaced if the new set_src_cost() is _lower_ than the old cost.
> > 
> > The patch changes that to "not higher than", assuming that the
> > ZERO_EXTEND form is generally preferable unless there is a reason
> > to believe it's not (i.e. its cost is higher).  The comment atop
> > this code block seems to support this:
> > 
> >   /* Convert sign extension to zero extension, if we know that the high
> >  bit is not set, as this is easier to optimize.  It will be converted
> >  back to cheaper alternative in make_extraction.  */
> > 
> > On s390[x] this gets rid of some SIGN_EXTENDs completely.
> > 
> > (The patched code uses the cheaper of both replacement patterns.)
> 
> That looks fine.  But see below.
> 
> > The patch hasn't got a lot of testing yet as I'd like to hear your
> > opinion on the patch first.
> 
> I am testing it on powerpc.  Please also test on x86?
> 
> > gcc/ChangeLog-signextend-1
> > 
> > * combine.c (expand_compound_operation): Substitute ZERO_EXTEND for
> > SIGN_EXTEND if the costs are equal or lower.
> > Choose the cheapest replacement.
> 
> >/* Make sure this is a profitable operation.  */
> >if (set_src_cost (x, mode, optimize_this_for_speed_p)
> > -  > set_src_cost (temp2, mode, optimize_this_for_speed_p))
> > -   return temp2;
> > -  else if (set_src_cost (x, mode, optimize_this_for_speed_p)
> > -   > set_src_cost (temp, mode, optimize_this_for_speed_p))
> > -   return temp;
> > -  else
> > -   return x;
> > + >= set_src_cost (temp2, mode, optimize_this_for_speed_p))
> > +   x = temp2;
> > +  if (set_src_cost (x, mode, optimize_this_for_speed_p)
> > + >= set_src_cost (temp, mode, optimize_this_for_speed_p))
> > +   x = temp;
> > +  return x;
> >  }
> 
> So this prefers the zero_extend version over the expand_compound_operation
> version, I wonder if that is a good idea.

Maybe this is a little less disruptive:

  int ctemp = set_src_cost (temp, mode, optimize_this_for_speed_p);
  int ctemp2 = set_src_cost (temp2, mode, optimize_this_for_speed_p);

  /* Make sure this is a profitable operation.  */
  if (MIN (ctemp, ctemp2)
      <= set_src_cost (x, mode, optimize_this_for_speed_p))
    x = (ctemp < ctemp2) ? temp : temp2;
  return x;
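To compare how the variants break ties, here is a plain Python model of the selection logic only (not GCC code). It assumes, following Segher's remark, that `temp` is the ZERO_EXTEND and `temp2` the expand_compound_operation result; `cx`, `c_zext` and `c_expanded` stand for their `set_src_cost` values, and the function names are hypothetical.

```python
"""Model of the cost-selection variants discussed for
expand_compound_operation.  Costs are plain integers; lower is cheaper."""


def select_original(cx, c_zext, c_expanded):
    # Pre-patch code: strictly cheaper only, expanded form checked first.
    if cx > c_expanded:
        return 'expanded'
    if cx > c_zext:
        return 'zero_extend'
    return 'x'


def select_patch(cx, c_zext, c_expanded):
    # Proposed patch: ">=" comparisons, each against the current best.
    best, cost = 'x', cx
    if cost >= c_expanded:
        best, cost = 'expanded', c_expanded
    if cost >= c_zext:
        best, cost = 'zero_extend', c_zext
    return best


def select_min(cx, c_zext, c_expanded):
    # MIN variant: take the cheaper replacement if it is not more
    # expensive than x; a tie between the two goes to the expanded form.
    if min(c_zext, c_expanded) <= cx:
        return 'zero_extend' if c_zext < c_expanded else 'expanded'
    return 'x'
```

With all three costs equal, the pre-patch code keeps `x`, the patch picks the ZERO_EXTEND, and the `MIN` variant picks the expanded form, which is the preference difference raised in review.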

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



[Patch, Fortran, OOP] PR 78800: ICE in compare_parameter, at fortran/interface.c:2246

2016-12-15 Thread Janus Weil
Hi all,

the attached patch deals with error recovery only and fixes an
ICE-on-invalid problem.

Regtests cleanly on x86_64-linux-gnu. Ok for trunk?

Cheers,
Janus


2016-12-15  Janus Weil  

PR fortran/78800
* interface.c (compare_allocatable): Avoid additional errors on bad
class declarations.
(compare_parameter): Put the result of gfc_expr_attr into a variable,
in order to avoid calling it multiple times. Exit early on bad class
declarations to avoid ICE.

2016-12-15  Janus Weil  

PR fortran/78800
* gfortran.dg/unlimited_polymorphic_27.f90: New test case.
Index: gcc/fortran/interface.c
===
--- gcc/fortran/interface.c (revision 243621)
+++ gcc/fortran/interface.c (working copy)
@@ -2075,13 +2075,13 @@ done:
 static int
 compare_allocatable (gfc_symbol *formal, gfc_expr *actual)
 {
-  symbol_attribute attr;
-
   if (formal->attr.allocatable
   || (formal->ts.type == BT_CLASS && CLASS_DATA (formal)->attr.allocatable))
 {
-  attr = gfc_expr_attr (actual);
-  if (!attr.allocatable)
+  symbol_attribute attr = gfc_expr_attr (actual);
+  if (actual->ts.type == BT_CLASS && !attr.class_ok)
+   return 1;
+  else if (!attr.allocatable)
return 0;
 }
 
@@ -2237,6 +2237,10 @@ compare_parameter (gfc_symbol *formal, gfc_expr *a
   return 0;
 }
 
+  symbol_attribute actual_attr = gfc_expr_attr (actual);
+  if (actual->ts.type == BT_CLASS && !actual_attr.class_ok)
+    return 1;
+
   if ((actual->expr_type != EXPR_NULL || actual->ts.type != BT_UNKNOWN)
   && actual->ts.type != BT_HOLLERITH
   && formal->ts.type != BT_ASSUMED
@@ -2278,9 +2282,6 @@ compare_parameter (gfc_symbol *formal, gfc_expr *a
  return 0;
}
 
-  if (!gfc_expr_attr (actual).class_ok)
-   return 0;
-
   if ((!UNLIMITED_POLY (formal) || !UNLIMITED_POLY(actual))
  && !gfc_compare_derived_types (CLASS_DATA (actual)->ts.u.derived,
 CLASS_DATA (formal)->ts.u.derived))
@@ -2345,7 +2346,7 @@ compare_parameter (gfc_symbol *formal, gfc_expr *a
   /* F2015, 12.5.2.8.  */
   if (formal->attr.dimension
  && (formal->attr.contiguous || formal->as->type != AS_ASSUMED_SHAPE)
- && gfc_expr_attr (actual).dimension
+ && actual_attr.dimension
  && !gfc_is_simply_contiguous (actual, true, true))
{
  if (where)
@@ -2406,7 +2407,7 @@ compare_parameter (gfc_symbol *formal, gfc_expr *a
 }
 
   if (formal->attr.allocatable && !formal->attr.codimension
-  && gfc_expr_attr (actual).codimension)
+  && actual_attr.codimension)
 {
   if (formal->attr.intent == INTENT_OUT)
{
! { dg-do compile }
!
! PR 78800: [OOP] ICE in compare_parameter, at fortran/interface.c:2246
!
! Contributed by Gerhard Steinmetz 

program p
   type t
   end type
   class(*) :: z  ! { dg-error "must be dummy, allocatable or pointer" }
   call s(z)
contains
   subroutine s(x)
  type(t) :: x
   end
end


[PATCH] PR59161 make pretty printers always return strings

2016-12-15 Thread Jonathan Wakely

As discussed in the PR, when we return a gdb.Value from a printer's
to_string() method, GDB converts it to a string using a simplified
format that omits the address part of pointer/reference fields. I
think that's simply wrong, but in order to work with existing versions
of GDB we need to always convert to a string explicitly, to avoid that
simplified format. This ensures we don't print silliness like {ref = }.
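A gdb-free model of the behaviour described above. `FakeValue`, `simplified_format` and `render` are hypothetical stand-ins for `gdb.Value` and for GDB's handling of a `to_string()` result; the sketch only illustrates why returning `str(value)` sidesteps the simplified format.

```python
class FakeValue:
    """Stand-in for a gdb.Value holding a reference field."""

    def __init__(self, address, target):
        self.address = address
        self.target = target

    def __str__(self):
        # Full formatting: includes the address of the referenced object.
        return '{ref = @%#x: %d}' % (self.address, self.target)


def simplified_format(value):
    # Models the simplified formatting that drops the address part.
    return '{ref = }'


def render(printer_result):
    """Model of the debugger: strings are printed verbatim, values are not."""
    if isinstance(printer_result, str):
        return printer_result
    return simplified_format(printer_result)


class OldIteratorPrinter:
    def __init__(self, val):
        self.val = val

    def to_string(self):
        return self.val          # hands the value object back to the debugger


class NewIteratorPrinter(OldIteratorPrinter):
    def to_string(self):
        return str(self.val)     # convert explicitly, as in PR59161
```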

PR libstdc++/59161
* python/libstdcxx/v6/printers.py (StdListIteratorPrinter.to_string)
(StdSlistIteratorPrinter.to_string, StdVectorIteratorPrinter.to_string)
(StdRbtreeIteratorPrinter.to_string, StdDequeIteratorPrinter.to_string)
(StdDebugIteratorPrinter.to_string): Return string instead of
gdb.Value.
* testsuite/libstdc++-prettyprinters/59161.cc: New test.

Tested x86_64-linux, committed to trunk.

commit 73cddeb79a106fc7ad00a57698f797bd30738e39
Author: Jonathan Wakely 
Date:   Thu Dec 15 13:24:07 2016 +

PR59161 make pretty printers always return strings

PR libstdc++/59161
* python/libstdcxx/v6/printers.py (StdListIteratorPrinter.to_string)
(StdSlistIteratorPrinter.to_string, StdVectorIteratorPrinter.to_string)
(StdRbtreeIteratorPrinter.to_string, StdDequeIteratorPrinter.to_string)
(StdDebugIteratorPrinter.to_string): Return string instead of
gdb.Value.
* testsuite/libstdc++-prettyprinters/59161.cc: New test.

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 9d84b4f..ab3592a 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -203,7 +203,7 @@ class StdListIteratorPrinter:
         nodetype = find_type(self.val.type, '_Node')
         nodetype = nodetype.strip_typedefs().pointer()
         node = self.val['_M_node'].cast(nodetype).dereference()
-        return get_value_from_list_node(node)
+        return str(get_value_from_list_node(node))
 
 class StdSlistPrinter:
     "Print a __gnu_cxx::slist"
@@ -248,7 +248,7 @@ class StdSlistIteratorPrinter:
     def to_string(self):
         nodetype = find_type(self.val.type, '_Node')
         nodetype = nodetype.strip_typedefs().pointer()
-        return self.val['_M_node'].cast(nodetype).dereference()['_M_data']
+        return str(self.val['_M_node'].cast(nodetype).dereference()['_M_data'])
 
 class StdVectorPrinter:
     "Print a std::vector"
@@ -333,7 +333,7 @@ class StdVectorIteratorPrinter:
         self.val = val
 
     def to_string(self):
-        return self.val['_M_current'].dereference()
+        return str(self.val['_M_current'].dereference())
 
 class StdTuplePrinter:
     "Print a std::tuple"
@@ -495,7 +495,7 @@ class StdRbtreeIteratorPrinter:
 
     def to_string (self):
         node = self.val['_M_node'].cast(self.link_type).dereference()
-        return get_value_from_Rb_tree_node(node)
+        return str(get_value_from_Rb_tree_node(node))
 
 class StdDebugIteratorPrinter:
     "Print a debug enabled version of an iterator"
@@ -511,7 +511,7 @@ class StdDebugIteratorPrinter:
         if not safe_seq or self.val['_M_version'] != safe_seq['_M_version']:
             return "invalid iterator"
         itype = self.val.type.template_argument(0)
-        return self.val.cast(itype)
+        return str(self.val.cast(itype))
 
 def num_elements(num):
     """Return either "1 element" or "N elements" depending on the argument."""
@@ -708,7 +708,7 @@ class StdDequeIteratorPrinter:
         self.val = val
 
     def to_string(self):
-        return self.val['_M_cur'].dereference()
+        return str(self.val['_M_cur'].dereference())
 
 class StdStringPrinter:
     "Print a std::basic_string of some kind"
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/59161.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/59161.cc
new file mode 100644
index 000..d8fef27
--- /dev/null
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/59161.cc
@@ -0,0 +1,70 @@
+// { dg-do run }
+// { dg-options "-g -O0" }
+
+// Copyright (C) 2011-2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct C {
+  C(int& i) : ref(i) { }
+  in

Re: [Patch, Fortran, OOP] PR 78800: ICE in compare_parameter, at fortran/interface.c:2246

2016-12-15 Thread Andre Vehreschild
Looks good to me.

- Andre

On Thu, 15 Dec 2016 14:18:28 +0100
Janus Weil  wrote:

> Hi all,
> 
> the attached patch deals with error recovery only and fixes an
> ICE-on-invalid problem.
> 
> Regtests cleanly on x86_64-linux-gnu. Ok for trunk?
> 
> Cheers,
> Janus
> 
> 
> 2016-12-15  Janus Weil  
> 
> PR fortran/78800
> * interface.c (compare_allocatable): Avoid additional errors on bad
> class declarations.
> (compare_parameter): Put the result of gfc_expr_attr into a variable,
> in order to avoid calling it multiple times. Exit early on bad class
> declarations to avoid ICE.
> 
> 2016-12-15  Janus Weil  
> 
> PR fortran/78800
> * gfortran.dg/unlimited_polymorphic_27.f90: New test case.


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [Patch, Fortran, OOP] PR 78800: ICE in compare_parameter, at fortran/interface.c:2246

2016-12-15 Thread Janus Weil
2016-12-15 14:30 GMT+01:00 Andre Vehreschild :
> Looks good to me.

Thanks, Andre. Committed to trunk as r243691.

Cheers,
Janus





[PATCH] PR59170 make pretty printers check for singular iterators

2016-12-15 Thread Jonathan Wakely

This is another partial fix for PR 59170, this time adding checks for
normal mode iterators that are default-constructed, so we don't try to
dereference null pointers.
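The check added to each `to_string()` is a truthiness test on the iterator's pointer member before dereferencing it. A gdb-free model, with a hypothetical `FakePointer` standing in for a `gdb.Value` pointer that is falsy when null:

```python
class FakePointer:
    """Stand-in for a gdb.Value pointer: falsy when null."""

    def __init__(self, target=None):
        self.target = target

    def __bool__(self):
        return self.target is not None

    def dereference(self):
        if self.target is None:
            raise RuntimeError('Cannot access memory at address 0x0')
        return self.target


class VectorIteratorPrinter:
    def __init__(self, val):
        self.val = val    # dict modelling the iterator's data members

    def to_string(self):
        # A value-initialized iterator holds a null pointer: don't dereference.
        if not self.val['_M_current']:
            return 'non-dereferenceable iterator for std::vector'
        return str(self.val['_M_current'].dereference())
```

Such a check cannot catch past-the-end or invalidated iterators, whose pointers are non-null.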

We still auto-dereference past-the-end iterators, and iterators that
have been invalidated by container mutation. The former can be
detected in debug mode (with some work to teach each iterator type how
to compare itself to the container's end()) and the latter is already
handled for debug mode. Neither case can be detected in normal mode.

I feel quite strongly that we should disable the printers for
iterators (the ones that make "print iter" automatically dereference
the iterator and print what it points to ... or garbage ... or crash).
I'm going to add Xmethods for all our iterator types so that it will
always be possible to do "print *iter", so if GDB supports Xmethods
then we don't need to register the iterator printers.

PR libstdc++/59170
* python/libstdcxx/v6/printers.py (StdListIteratorPrinter.to_string)
(StdSlistIteratorPrinter.to_string, StdVectorIteratorPrinter.to_string)
(StdRbtreeIteratorPrinter.to_string)
(StdDequeIteratorPrinter.to_string): Add check for value-initialized
iterators.
* testsuite/libstdc++-prettyprinters/simple.cc: Test them.
* testsuite/libstdc++-prettyprinters/simple11.cc: Likewise.

Tested x86_64-linux, committed to trunk.
commit 7b73bc563a5b4828b80e18d34cc06c0cbdae12ef
Author: Jonathan Wakely 
Date:   Thu Dec 15 14:05:24 2016 +

PR59170 make pretty printers check for singular iterators

PR libstdc++/59170
* python/libstdcxx/v6/printers.py (StdListIteratorPrinter.to_string)
(StdSlistIteratorPrinter.to_string, StdVectorIteratorPrinter.to_string)
(StdRbtreeIteratorPrinter.to_string)
(StdDequeIteratorPrinter.to_string): Add check for value-initialized
iterators.
* testsuite/libstdc++-prettyprinters/simple.cc: Test them.
* testsuite/libstdc++-prettyprinters/simple11.cc: Likewise.

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index ab3592a..86de1ca 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -200,6 +200,8 @@ class StdListIteratorPrinter:
         self.typename = typename
 
     def to_string(self):
+        if not self.val['_M_node']:
+            return 'non-dereferenceable iterator for std::list'
         nodetype = find_type(self.val.type, '_Node')
         nodetype = nodetype.strip_typedefs().pointer()
         node = self.val['_M_node'].cast(nodetype).dereference()
@@ -246,6 +248,8 @@ class StdSlistIteratorPrinter:
         self.val = val
 
     def to_string(self):
+        if not self.val['_M_node']:
+            return 'non-dereferenceable iterator for __gnu_cxx::slist'
         nodetype = find_type(self.val.type, '_Node')
         nodetype = nodetype.strip_typedefs().pointer()
         return str(self.val['_M_node'].cast(nodetype).dereference()['_M_data'])
@@ -333,6 +337,8 @@ class StdVectorIteratorPrinter:
         self.val = val
 
     def to_string(self):
+        if not self.val['_M_current']:
+            return 'non-dereferenceable iterator for std::vector'
         return str(self.val['_M_current'].dereference())
 
 class StdTuplePrinter:
@@ -494,6 +500,8 @@ class StdRbtreeIteratorPrinter:
         self.link_type = nodetype.strip_typedefs().pointer()
 
     def to_string (self):
+        if not self.val['_M_node']:
+            return 'non-dereferenceable iterator for associative container'
         node = self.val['_M_node'].cast(self.link_type).dereference()
         return str(get_value_from_Rb_tree_node(node))
 
@@ -708,6 +716,8 @@ class StdDequeIteratorPrinter:
         self.val = val
 
     def to_string(self):
+        if not self.val['_M_cur']:
+            return 'non-dereferenceable iterator for std::deque'
         return str(self.val['_M_cur'].dereference())
 
 class StdStringPrinter:
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
index fb8e0d7..35fbb90 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 int
@@ -50,6 +51,9 @@ main()
   deq.push_back("two");
 // { dg-final { note-test deq {std::deque with 2 elements = {"one", "two"}} } }
 
+  std::deque::iterator deqiter0;
+// { dg-final { note-test deqiter0 {non-dereferenceable iterator for std::deque} } }
+
   std::deque::iterator deqiter = deq.begin();
 // { dg-final { note-test deqiter {"one"} } }
 
@@ -58,6 +62,9 @@ main()
   lst.push_back("two");
 // { dg-final { note-test lst {std::list = {[0] = "one", [1] = "two"}} } }
 
+  std::list::iterator lstiter0;
+// { dg-final { note-test lstiter0 {non-dereferenceable iterator for 

Re: Pretty printers for versioned namespace

2016-12-15 Thread Jonathan Wakely

On 14/12/16 22:49 +0100, François Dumont wrote:

@@ -1321,7 +1328,7 @@ def register_type_printers(obj):
     if not _use_type_printing:
         return

-    for pfx in ('', 'w'):
+    for pfx in ('', 'w', vers_nsp, vers_nsp + 'w'):
         add_one_type_printer(obj, 'basic_string', pfx + 'string')
         add_one_type_printer(obj, 'basic_string_view', pfx + 'string_view')
         add_one_type_printer(obj, 'basic_ios', pfx + 'ios')


Looking at this part again, can't we handle the std::__7:: cases
inside add_one_type_printer instead of here?

The "pfx" prefixes here are intended for names that are similar, like
std::string and std::wstring. If we want to handle both with an
alternative namespace then the place to do that is where we prepend
the namespace, surely?

 def add_one_type_printer(obj, match, name):
     printer = FilteringTypePrinter(match, 'std::' + name)
     gdb.types.register_type_printer(obj, printer)
+    printer = FilteringTypePrinter(match, 'std::__7::' + name)
+    gdb.types.register_type_printer(obj, printer)

That will make the patch *much* smaller, and the logic is easier to
follow.
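A gdb-free sketch of the suggested shape, where a hypothetical `registered` list stands in for `gdb.types.register_type_printer` and a tuple stands in for `FilteringTypePrinter`:

```python
registered = []   # hypothetical stand-in for gdb's type-printer registry


def add_one_type_printer(obj, match, name):
    # Register the plain and the versioned-namespace name in one place,
    # so callers such as register_type_printers only vary the 'w' prefix.
    for nsp in ('std::', 'std::__7::'):
        registered.append((obj, match, nsp + name))


for pfx in ('', 'w'):
    add_one_type_printer(None, 'basic_string', pfx + 'string')
```

The loop over prefixes stays exactly as before; only the helper knows about the extra namespace.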

For the template type printers I think we just want to add (__7::)? to
the regular expressions. If we get a type like

 std::__7::vector >

Then I think we want to print that as std::vector, without __7::.
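For the template printers, an optional `(__7::)?` group lets one regular expression match both spellings. A sketch with Python's `re`; `VECTOR_RX` and the `strip_versioned` helper are hypothetical names illustrating the "print it as std::vector" idea:

```python
import re

# Matches std::vector<...> with or without the versioned inline namespace.
VECTOR_RX = re.compile(r'^std::(__7::)?vector<(.*)>$')


def strip_versioned(type_name):
    # Drop the inline namespace so users see the familiar std:: spelling.
    return re.sub(r'\bstd::__7::', 'std::', type_name)
```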



[PATCH, testsuite] MIPS: Relax instruction order check in msa-builtins.c.

2016-12-15 Thread Toma Tabacu
Hi,

The 32-bit insert.d case in msa-builtins.c is failing with O2 and Os because
the order of the emitted instructions is slightly different compared to the
other optimization levels.

This patch tweaks the regular expression for 32-bit insert.d to accept the 
alternate instruction order.
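The alternation can be sanity-checked with Python's `re`. DejaGnu's `scan-assembler` uses Tcl regular expressions, in which `.` also matches newlines, so this sketch passes `re.DOTALL` to approximate that; the assembly snippets are made up for illustration.

```python
import re

PATTERN = re.compile(
    r'msa_insert_d:.*(sra.*insert.w.*insert.w|insert.w.*sra.*insert.w).*msa_insert_d',
    re.DOTALL)

# Made-up instruction sequences showing the two accepted orders.
ORDER_A = """msa_insert_d:
\tsra\t$3,$5,31
\tinsert.w\t$w0[2],$5
\tinsert.w\t$w0[3],$3
\t.end\tmsa_insert_d
"""

ORDER_B = """msa_insert_d:
\tinsert.w\t$w0[2],$5
\tsra\t$3,$5,31
\tinsert.w\t$w0[3],$3
\t.end\tmsa_insert_d
"""
```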

Tested with mips-mti-elf.

Regards,
Toma

gcc/testsuite/ChangeLog:

* gcc.target/mips/msa-builtins.c (dg-final): Tweak regex for the 32-bit
insert.d case.

diff --git a/gcc/testsuite/gcc.target/mips/msa-builtins.c b/gcc/testsuite/gcc.target/mips/msa-builtins.c
index 6db3d66..a679f06 100644
--- a/gcc/testsuite/gcc.target/mips/msa-builtins.c
+++ b/gcc/testsuite/gcc.target/mips/msa-builtins.c
@@ -481,7 +481,7 @@
 /* { dg-final { scan-assembler-times "msa_insert_h:.*insert\\.h.*msa_insert_h" 1 } } */
 /* { dg-final { scan-assembler-times "msa_insert_w:.*insert\\.w.*msa_insert_w" 1 } } */
 /* { dg-final { scan-assembler-times "msa_insert_d:.*insert\\.d.*msa_insert_d" 1 { target mips64 } } } */
-/* { dg-final { scan-assembler-times "msa_insert_d:.*sra.*insert.w.*insert.w.*msa_insert_d" 1 { target {! mips64 } } } } */
+/* { dg-final { scan-assembler "msa_insert_d:.*(sra.*insert.w.*insert.w|insert.w.*sra.*insert.w).*msa_insert_d" { target {! mips64 } } } } */
 /* { dg-final { scan-assembler-times "msa_insve_b:.*insve\\.b.*msa_insve_b" 1 } } */
 /* { dg-final { scan-assembler-times "msa_insve_h:.*insve\\.h.*msa_insve_h" 1 } } */
 /* { dg-final { scan-assembler-times "msa_insve_w:.*insve\\.w.*msa_insve_w" 1 } } */



Re: Pretty printers for versioned namespace

2016-12-15 Thread Jonathan Wakely

On 14/12/16 22:49 +0100, François Dumont wrote:

On 09/12/2016 16:18, Jonathan Wakely wrote:


But I don't know how to fix this so for the moment I just adapt it 
to correctly handle std::__7::string.


But that's not correct. Please try to understand the point I'm making:
The name "std::__7::string" does not appear in a symbol name.
Ok, the only point I don't get yet is why std::string is a symbol but 
std::__7::string is not. It seems inconsistent.




This works for me:

@@ -946,9 +950,10 @@ class StdExpAnyPrinter(SingleObjContainerPrinter):
             m = re.match(rx, func.function.name)
             if not m:
                 raise ValueError("Unknown manager function in %s" % self.typename)

-
-            # FIXME need to expand 'std::string' so that gdb.lookup_type works
-            mgrname = re.sub("std::string(?!\w)", str(gdb.lookup_type('std::string').strip_typedefs()), m.group(1))

+            mgrname = m.group(1)
+            if not typename.startswith('std::' + vers_nsp):
+                # FIXME need to expand 'std::string' so that gdb.lookup_type works
+                mgrname = re.sub("std::string(?!\w)", str(gdb.lookup_type('std::string').strip_typedefs()), mgrname)


I think it doesn't work in all situations as this code is also used 
for std::experimental::any so typename doesn't start with std::__7:: 
but there is still no std::string symbol.


So I propose:

   mgrname = m.group(1)
   if 'std::string' in mgrname:
       # FIXME need to expand 'std::string' so that gdb.lookup_type works
       mgrname = re.sub("std::string(?!\w)", str(gdb.lookup_type('std::string').strip_typedefs()), m.group(1))


as you will see in attach patch.

I ended up using '__7' explicitly in some pretty printer tests
because '__\d+' was not working; I don't know why.


Ok to commit once tests have completed ?

François




diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 3a111d7..9aba69a 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -36,6 +36,8 @@ import sys
 # We probably can't do much about this until this GDB PR is addressed:
 # 

+vers_nsp = '__7::'
+
 if sys.version_info[0] > 2:
     ### Python 3 stuff
     Iterator = object
@@ -127,9 +129,9 @@ class UniquePointerPrinter:

     def to_string (self):
         impl_type = self.val.type.fields()[0].type.tag
-        if impl_type.startswith('std::__uniq_ptr_impl<'): # New implementation
+        if re.match('^std::(' + vers_nsp + ')?__uniq_ptr_impl<.*>$', impl_type): # New implementation
             v = self.val['_M_t']['_M_t']['_M_head_impl']
-        elif impl_type.startswith('std::tuple<'):
+        elif re.match('^std::(' + vers_nsp + ')?tuple<.*>$', impl_type):
             v = self.val['_M_t']['_M_head_impl']
         else:
             raise ValueError("Unsupported implementation for unique_ptr: %s" % self.val.type.fields()[0].type.tag)


And we could avoid three re.match expressions with complicated regular
expressions by creating a helper function to do the "startswith"
checks:

def is_specialization_of(type, template_name):
    return re.match('^std::(%s)?%s<.*>$' % (vers_nsp, template_name), type) is not None

Then replace impl_type.startswith('std::__uniq_ptr_impl<') with
is_specialization_of(impl_type, '__uniq_ptr_impl')

And replace impl_type.startswith('std::tuple<') with
is_specialization_of(impl_type, 'tuple')

And replace nodetype.name.startswith('std::_Rb_tree_node') with
is_specialization_of(nodetype.name, '_Rb_tree_node')

That makes the code much easier to read.
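A runnable version of the suggested helper, with `vers_nsp` defined as in the patch; the first parameter is renamed `type_name` here only to avoid shadowing Python's builtin `type`:

```python
import re

vers_nsp = '__7::'


def is_specialization_of(type_name, template_name):
    """Return True if type_name is std::template_name<...>, optionally
    inside the versioned namespace."""
    return re.match('^std::(%s)?%s<.*>$' % (vers_nsp, template_name),
                    type_name) is not None
```

Because the template name must be followed directly by `<`, a name such as `std::tuple_element<...>` is not mistaken for a `std::tuple` specialization.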



Re: [PATCH] combine: Replace sign_extend with zero_extend more often.

2016-12-15 Thread Segher Boessenkool
On Thu, Dec 15, 2016 at 01:57:06PM +0100, Dominik Vogt wrote:
> > > The patch hasn't got a lot of testing yet as I'd like to hear your
> > > opinion on the patch first.
> > 
> > I am testing it on powerpc.  Please also test on x86?
> > 
> > > gcc/ChangeLog-signextend-1
> > > 
> > >   * combine.c (expand_compound_operation): Substitute ZERO_EXTEND for
> > >   SIGN_EXTEND if the costs are equal or lower.
> > >   Choose the cheapest replacement.
> > 
> > >/* Make sure this is a profitable operation.  */
> > >if (set_src_cost (x, mode, optimize_this_for_speed_p)
> > > -  > set_src_cost (temp2, mode, optimize_this_for_speed_p))
> > > -   return temp2;
> > > -  else if (set_src_cost (x, mode, optimize_this_for_speed_p)
> > > -   > set_src_cost (temp, mode, optimize_this_for_speed_p))
> > > -   return temp;
> > > -  else
> > > -   return x;
> > > +   >= set_src_cost (temp2, mode, optimize_this_for_speed_p))
> > > + x = temp2;
> > > +  if (set_src_cost (x, mode, optimize_this_for_speed_p)
> > > +   >= set_src_cost (temp, mode, optimize_this_for_speed_p))
> > > + x = temp;
> > > +  return x;
> > >  }
> > 
> > So this prefers the zero_extend version over the expand_compound_operation
> > version, I wonder if that is a good idea.
> 
> Maybe this is a little less disruptive:
> 
>   int ctemp = set_src_cost (temp, mode, optimize_this_for_speed_p);
>   int ctemp2 = set_src_cost (temp2, mode, optimize_this_for_speed_p);
> 
>   /* Make sure this is a profitable operation.  */
>   if (MIN (ctemp, ctemp2)
> <= set_src_cost (x, mode, optimize_this_for_speed_p))
>   x = (ctemp < ctemp2) ? temp : temp2;
>   return x;

Or just swap the temp and temp2 cases in your original patch, which, btw,
tested fine on powerpc64-linux {-m32,-m64}.


Segher


[PATCH 00/21] [ARM] Automatic selection of FPU

2016-12-15 Thread Richard Earnshaw (lists)
As discussed at this year's Cauldron, it has concerned me for a while
now that when a user of the ARM compiler specifies a CPU, they also
need to specify which floating-point unit it has (even though there is
almost invariably only one choice).

This patch implements the ability to make the selection automatic when
a CPU is specified (as opposed to an architecture), by specifying
-mfpu=auto.
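Conceptually, `-mfpu=auto` means "take the FPU from the selected CPU's description". The following is only a hypothetical Python model of that resolution order; the CPU and FPU names in the table are invented placeholders, not data from `arm-cores.def`.

```python
# Hypothetical table: CPU name -> FPU it ships with (placeholder names).
CPU_DEFAULT_FPU = {
    'example-cpu-a': 'example-fpu-1',
    'example-cpu-b': 'example-fpu-2',
}


def resolve_fpu(fpu_option, cpu=None):
    """Return the FPU to use for a given -mfpu value and optional -mcpu."""
    if fpu_option != 'auto':
        return fpu_option            # an explicit -mfpu value wins
    if cpu is None:
        # Per the cover letter, 'auto' with only an architecture is future
        # work; this model simply rejects it.
        raise ValueError('-mfpu=auto needs a CPU in this model')
    return CPU_DEFAULT_FPU[cpu]
```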

Long term, I'd like to make this also work in conjunction with
architecture strings and then move towards deprecating -mfpu entirely,
but that's considerably more work and more suited to GCC 8 than GCC 7.

I've discussed the patch set with a couple of the other ARM
maintainers and they are happy for these patches to go in now.

  [arm] Separate tuning flags from architectural flags in CPU tables.
  [arm] Add new isa bits method
  [arm] Introduce arm_active_target.
  [arm] Use arm_active_target for architecture and tune operations.
  [arm] Reduce usage of arm_selected_cpu.
  [arm] Add new isa quirk bit for Cortex-M3 ldrd issue.
  [arm] Use arm_active_target when configuring builtins
  [arm] Remove insn_flags.
  [arm] Rework arm-common to use new feature bits.
  [arm] Remove remaining references to arm feature sets.
  [arm] Delete unused arm_fp_model.
  [arm] Eliminate vfp_reg_type
  [arm] Remove FPU rev field
  [arm] Add isa features to FPU descriptions
  [arm] Initialize fpu capability bits in arm_active_target.
  [arm] Eliminate TARGET_FPU_NAME.
  [arm] Use arm_active_target for most FP feature tests.
  [arm] Use cl_target_options for configuring the active target.
  [arm] Use ISA feature sets for determining inlinability.
  [arm] Remove FEATURES field from FPU descriptions.
  [arm] Permit 'auto' in -mfpu.

 gcc/common/config/arm/arm-common.c |  27 +-
 gcc/config/arm/arm-arches.def  |  90 +++---
 gcc/config/arm/arm-builtins.c  |  35 +--
 gcc/config/arm/arm-c.c |   3 +
 gcc/config/arm/arm-cores.def   | 239 +++
 gcc/config/arm/arm-flags.h | 195 +---
 gcc/config/arm/arm-fpus.def|  48 +--
 gcc/config/arm/arm-isa.h   | 156 ++
 gcc/config/arm/arm-opts.h  |  13 +-
 gcc/config/arm/arm-protos.h|  39 ++-
 gcc/config/arm/arm-tables.opt  |  75 +++--
 gcc/config/arm/arm-tune.md |   8 +-
 gcc/config/arm/arm.c   | 588 +
 gcc/config/arm/arm.h   |  72 +
 gcc/config/arm/arm.opt |   8 +-
 gcc/config/arm/genopt.sh   |  15 +-
 16 files changed, 909 insertions(+), 702 deletions(-)
 create mode 100644 gcc/config/arm/arm-isa.h




[PATCH 01/21] [arm] Separate tuning flags from architectural flags in CPU tables.

2016-12-15 Thread Richard Earnshaw (lists)
We start out by separating the 'tuning flags' in a CPU or architecture
specification into a new field in the data structures.  Because there
aren't very many of these (and we'd like to get rid of them entirely,
eventually, moving to entries in the tuning tables), we just use a
simple unsigned word.  This frees up a number of bits in the main
flags data structure, but we don't consolidate them as we'll be
getting rid of them entirely shortly.

There's one small user-visible change: the slow multiply flag is moved
from being treated as an architectural flag to a tuning flag.  This
has two consequences: it's now ignored for architectural matching to a
CPU, and specifying a -mtune option will now correctly apply the
multiply performance to the decision as to which sequences to
synthesise.
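A hypothetical model of why keeping tuning properties in a separate field changes architectural matching: only the architectural `flags` are compared, so a tuning bit can no longer make otherwise identical entries differ. The flag names echo macros in the ChangeLog, but the bit values, the matching rule and the entries are invented for illustration.

```python
from enum import IntFlag


class FL(IntFlag):
    """Architectural feature bits (hypothetical values)."""
    MODE26 = 1
    THUMB = 2
    ARCH4 = 4


class TF(IntFlag):
    """Tuning-only bits (hypothetical values)."""
    SMALLMUL = 1
    LDSCHED = 2


def arch_matches(cpu, arch):
    # Only architectural flags take part in matching; tune_flags are ignored.
    return cpu['flags'] == arch['flags']


cpu = {'name': 'example-cpu', 'flags': FL.ARCH4 | FL.THUMB,
       'tune_flags': TF.SMALLMUL}
arch = {'name': 'example-arch', 'flags': FL.ARCH4 | FL.THUMB,
        'tune_flags': TF(0)}
```

The tuning bit remains available to the cost decisions (`cpu['tune_flags'] & TF.SMALLMUL`) without influencing which architecture the CPU matches.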

* arm-arches.def (ARM_ARCH): Add extra field TUNE_FLAGS, move
tuning properties from architectural FLAGS field.
* arm-cores.def (ARM_CORE): Likewise.
* arm-protos.h (TF_LDSCHED, TF_WBUF, TF_CO_PROC): New macros.
(TF_SMALLMUL, TF_STRONG, TF_SCALE, TF_NOMODE32): New macros.
(FL_LDSCHED, FL_STRONG, FL_WBUF, FL_SMALLMUL): Delete.
(FL_TUNE): Remove deleted elements.
(tune_flags): Convert type to unsigned int.
* arm.c (struct processors): Add new field tune_flags.
(all_cores, all_arches): Initialize it.
(arm_option_override): Adapt uses of tune_flags.  Use tune_flags
for deciding when we should have slow multiply operations.
---
 gcc/common/config/arm/arm-common.c |   4 +-
 gcc/config/arm/arm-arches.def  |  85 ++
 gcc/config/arm/arm-cores.def   | 224 ++---
 gcc/config/arm/arm-flags.h |  24 ++--
 gcc/config/arm/arm-opts.h  |   2 +-
 gcc/config/arm/arm-protos.h|   2 +-
 gcc/config/arm/arm.c   |  29 ++---
 7 files changed, 184 insertions(+), 186 deletions(-)



diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index c0de5d2..93a13c8 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -107,12 +107,12 @@ struct arm_arch_core_flag
 static const struct arm_arch_core_flag arm_arch_core_flags[] =
 {
 #undef ARM_CORE
-#define ARM_CORE(NAME, X, IDENT, ARCH, FLAGS, COSTS) \
+#define ARM_CORE(NAME, X, IDENT, TUNE_FLAGS, ARCH, FLAGS, COSTS) \
   {NAME, FLAGS},
 #include "config/arm/arm-cores.def"
 #undef ARM_CORE
 #undef ARM_ARCH
-#define ARM_ARCH(NAME, CORE, ARCH, FLAGS) \
+#define ARM_ARCH(NAME, CORE, TUNE_FLAGS, ARCH, FLAGS)	\
   {NAME, FLAGS},
 #include "config/arm/arm-arches.def"
 #undef ARM_ARCH
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 71cabcc..d81a471 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -19,7 +19,7 @@
 
 /* Before using #include to read this file, define a macro:
 
-  ARM_ARCH(NAME, CORE, ARCH, FLAGS)
+  ARM_ARCH(NAME, CORE, TUNE_FLAGS, ARCH, FLAGS)
 
The NAME is the name of the architecture, represented as a string
constant.  The CORE is the identifier for a core representative of
@@ -28,52 +28,41 @@
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_ARCH("armv2",   arm2,   2,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2))
-ARM_ARCH("armv2a",  arm2,   2,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2))
-ARM_ARCH("armv3",   arm6,   3,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3))
-ARM_ARCH("armv3m",  arm7m,  3M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3M))
-ARM_ARCH("armv4",   arm7tdmi,   4,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH4))
+ARM_ARCH("armv2",   arm2,   (TF_CO_PROC | TF_NO_MODE32), 2,	ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH("armv2a",  arm2,   (TF_CO_PROC | TF_NO_MODE32), 2,	ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH("armv3",   arm6,   TF_CO_PROC, 3,	ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH3))
+ARM_ARCH("armv3m",  arm7m,  TF_CO_PROC, 3M,	ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH3M))
+ARM_ARCH("armv4",   arm7tdmi,   TF_CO_PROC, 4,	ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH4))
 /* Strictly, FL_MODE26 is a permitted option for v4t, but there are no
implementations that support it, so we will leave it out for now.  */
-ARM_ARCH("armv4t",  arm7tdmi,   4T,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH4T))
-ARM_ARCH("armv5",   arm10tdmi,  5,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5))
-ARM_ARCH("armv5t",  arm10tdmi,  5T,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5T))
-ARM_ARCH("armv5e",  arm1026ejs, 5E,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5E))
-ARM_ARCH("armv5te", arm1026ejs, 5TE,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5TE))
-ARM_ARCH("armv6",   arm1136js,  6,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH6))
-ARM_ARCH("arm

[PATCH 03/21] [arm] Introduce arm_active_target.

2016-12-15 Thread Richard Earnshaw (lists)

This patch creates a new data structure for carrying around the data
relating to the current compilation target.  The idea behind this is
that this data structure can be updated to reflect the overall
compilation target as new information is gathered (from command-line
options or architectural extensions).  We will no longer have to grub
around in multiple places for this information.

There are some small behaviour changes around how we handle selecting
a default CPU if thumb or interworking are specified on the command
line and the default CPU does not support thumb, but I believe the
existing code was broken in that respect.  This code will go away once
we obsolete pre-armv4t devices.
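A rough standalone model of how such a structure might be populated once and then consulted. `proc_desc`, `build_target`, `configure_build_target` and `isa_popcount` are hypothetical names, a 64-bit integer stands in for GCC's internal `sbitmap`, and default-CPU selection is reduced to a single fallback entry. A CPU choice takes precedence over an architecture, mirroring the "Prefer the CPU setting" comment in the patch; `isa_popcount` plays the role of the new `bitmap_popcount`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical option-table entry describing a CPU or an architecture.  */
struct proc_desc
{
  const char *name;
  uint64_t isa;               /* One bit per ISA feature (illustrative).  */
  unsigned int tune_flags;
};

/* Simplified model of struct arm_build_target.  */
struct build_target
{
  const char *core_name;      /* NULL when only an architecture was given.  */
  const char *arch_name;      /* NULL when a CPU was selected.  */
  uint64_t isa;
  unsigned int tune_flags;
};

/* Fill TARGET once from the user's choices.  CPU and ARCH may be NULL;
   DEFAULT_CPU is used when neither was specified.  */
static void
configure_build_target (struct build_target *target,
                        const struct proc_desc *cpu,
                        const struct proc_desc *arch,
                        const struct proc_desc *default_cpu)
{
  assert (target != NULL && default_cpu != NULL);
  target->core_name = NULL;
  target->arch_name = NULL;

  /* Architecture only: record it and leave core_name unset.  */
  if (cpu == NULL && arch != NULL)
    {
      target->arch_name = arch->name;
      target->isa = arch->isa;
      target->tune_flags = arch->tune_flags;
      return;
    }

  /* Otherwise a CPU (explicit or default) determines everything.  */
  if (cpu == NULL)
    cpu = default_cpu;
  target->core_name = cpu->name;
  target->isa = cpu->isa;
  target->tune_flags = cpu->tune_flags;
}

/* Count the features present, the job bitmap_popcount does for an
   sbitmap.  Clearing the lowest set bit each pass visits set bits only.  */
static unsigned int
isa_popcount (uint64_t isa)
{
  unsigned int count = 0;
  for (; isa != 0; isa &= isa - 1)
    count++;
  return count;
}
```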

* arm-protos.h (arm_build_target): New structure.
(arm_active_target): Declare it.
* arm.c (arm_active_target): New variable.
(bitmap_popcount): New function.
(feature_count): Delete.
(arm_initialize_isa): New function.
(isa_fpubits): New variable.
(arm_configure_build_target): New function.
(arm_option_override): Initialize isa_fpubits and arm_active_target.isa.
Use arm_configure_build_target.
---
 gcc/config/arm/arm-protos.h |  25 ++
 gcc/config/arm/arm.c        | 203 +++-
 2 files changed, 168 insertions(+), 60 deletions(-)


diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 58d2ae3..7673e3a 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -449,6 +449,31 @@ extern int arm_arch_no_volatile_ce;
than core registers.  */
 extern int prefer_neon_for_64bits;
 
+/* Structure defining the current overall architectural target and tuning.  */
+struct arm_build_target
+{
+  /* Name of the target CPU, if known, or NULL if the target CPU was not
+     specified by the user (and inferred from the -march option).  */
+  const char *core_name;
+  /* Name of the target ARCH.  NULL if there is a selected CPU.  */
+  const char *arch_name;
+  /* Preprocessor substring (never NULL).  */
+  const char *arch_pp_name;
+  /* CPU identifier for the core we're compiling for (architecturally).  */
+  enum processor_type arch_core;
+  /* The base architecture value.  */
+  enum base_architecture base_arch;
+  /* Bitmap encapsulating the isa_bits for the target environment.  */
+  sbitmap isa;
+  /* Flags used for tuning.  Long term, these move into tune_params.  */
+  unsigned int tune_flags;
+  /* Tables with more detailed tuning information.  */
+  const struct tune_params *tune;
+  /* CPU identifier for the tuning target.  */
+  enum processor_type tune_core;
+};
+
+extern struct arm_build_target arm_active_target;
 
 
 #endif /* ! GCC_ARM_PROTOS_H */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index bf04a06..deab528 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -88,7 +88,7 @@ static void arm_add_gc_roots (void);
 static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
 			 unsigned HOST_WIDE_INT, rtx, rtx, int, int);
 static unsigned bit_count (unsigned long);
-static unsigned feature_count (const arm_feature_set*);
+static unsigned bitmap_popcount (const sbitmap);
 static int arm_address_register_rtx_p (rtx, int);
 static int arm_legitimate_index_p (machine_mode, rtx, RTX_CODE, int);
 static bool is_called_in_ARM_mode (tree);
@@ -791,6 +791,10 @@ unsigned int tune_flags = 0;
target.  */
 enum base_architecture arm_base_arch = BASE_ARCH_0;
 
+/* Active target architecture and tuning.  */
+
+struct arm_build_target arm_active_target;
+
 /* The following are used in the arm.md file as equivalents to bits
in the above two flag variables.  */
 
@@ -2376,12 +2380,17 @@ bit_count (unsigned long value)
   return count;
 }
 
-/* Return the number of features in feature-set SET.  */
+/* Return the number of bits set in BMAP.  */
 static unsigned
-feature_count (const arm_feature_set * set)
+bitmap_popcount (const sbitmap bmap)
 {
-  return (bit_count (ARM_FSET_CPU1 (*set))
-	  + bit_count (ARM_FSET_CPU2 (*set)));
+  unsigned int count = 0;
+  unsigned int n = 0;
+  sbitmap_iterator sbi;
+
+  EXECUTE_IF_SET_IN_BITMAP (bmap, 0, n, sbi)
+    count++;
+  return count;
 }
 
 typedef struct
@@ -3038,100 +3047,149 @@ arm_option_override_internal (struct gcc_options *opts,
 #endif
 }
 
-/* Fix up any incompatible options that the user has specified.  */
+/* Convert a static initializer array of feature bits to sbitmap
+   representation.  */
 static void
-arm_option_override (void)
+arm_initialize_isa (sbitmap isa, const enum isa_feature *isa_bits)
+{
+  bitmap_clear (isa);
+  while (*isa_bits != isa_nobit)
+    bitmap_set_bit (isa, *(isa_bits++));
+}
+
+static sbitmap isa_fpubits;
+
+/* Configure a build target TARGET from the user-specified options OPTS and
+   OPTS_SET.  If WARN_COMPATIBLE, emit a diagnostic if both the CPU and
+   architecture have been specified, but the two are not identical.  */
+static void
+arm_c

[PATCH 04/21] [arm] Use arm_active_target for architecture and tune operations.

2016-12-15 Thread Richard Earnshaw (lists)

We now start to make more use of the new data structure.  This allows
us to eliminate two of the existing static variables,
arm_selected_arch and arm_selected_tune.

* arm.c (arm_selected_tune): Delete static variable.
(arm_selected_arch): Likewise.
(arm_configure_build_target): Declare local versions of arm_selected_tune
and arm_selected_arch.  Initialize more fields in target
data structure.
(arm_option_override): Use arm_active_target instead of
arm_selected_tune and arm_selected_arch.
(arm_file_start): Use arm_active_target.
---
 gcc/config/arm/arm.c | 58 +---
 1 file changed, 41 insertions(+), 17 deletions(-)


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index deab528..a4d370c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2319,9 +2319,7 @@ static const struct processors all_architectures[] =
 
 /* These are populated as commandline arguments are processed, or NULL
if not specified.  */
-static const struct processors *arm_selected_arch;
 static const struct processors *arm_selected_cpu;
-static const struct processors *arm_selected_tune;
 
 /* The name of the preprocessor macro to define for this architecture.  PROFILE
is replaced by the architecture name (eg. 8A) in arm_option_override () and
@@ -3068,9 +3066,10 @@ arm_configure_build_target (struct arm_build_target *target,
 			struct gcc_options *opts_set,
 			bool warn_compatible)
 {
-  arm_selected_arch = NULL;
+  const struct processors *arm_selected_tune = NULL;
+  const struct processors *arm_selected_arch = NULL;
+
   arm_selected_cpu = NULL;
-  arm_selected_tune = NULL;
 
   bitmap_clear (target->isa);
   target->core_name = NULL;
@@ -3119,17 +3118,24 @@ arm_configure_build_target (struct arm_build_target *target,
 		 Prefer the CPU setting.  */
 	  arm_selected_arch = NULL;
 	}
+
+	  target->core_name = arm_selected_cpu->name;
 	}
   else
 	{
 	  /* Pick a CPU based on the architecture.  */
 	  arm_selected_cpu = arm_selected_arch;
 	  target->arch_name = arm_selected_arch->name;
+	  /* Note: target->core_name is left unset in this path.  */
 	}
 }
-
+  else if (arm_selected_cpu)
+    {
+      target->core_name = arm_selected_cpu->name;
+      arm_initialize_isa (target->isa, arm_selected_cpu->isa_bits);
+    }
   /* If the user did not specify a processor, choose one for them.  */
-  if (!arm_selected_cpu)
+  else
 {
   const struct processors * sel;
   auto_sbitmap sought_isa (isa_num_bits);
@@ -3229,16 +3235,27 @@ arm_configure_build_target (struct arm_build_target *target,
 	}
 	  arm_selected_cpu = sel;
 	}
+
+  /* Now we know the CPU, we can finally initialize the target
+	 structure.  */
+  target->core_name = arm_selected_cpu->name;
+  arm_initialize_isa (target->isa, arm_selected_cpu->isa_bits);
 }
 
   gcc_assert (arm_selected_cpu);
+
   /* The selected cpu may be an architecture, so lookup tuning by core ID.  */
   if (!arm_selected_tune)
 arm_selected_tune = &all_cores[arm_selected_cpu->core];
 
+  /* Finish initializing the target structure.  */
   target->arch_pp_name = arm_selected_cpu->arch;
+  target->base_arch = arm_selected_cpu->base_arch;
+  target->arch_core = arm_selected_cpu->core;
+
   target->tune_flags = arm_selected_tune->tune_flags;
   target->tune = arm_selected_tune->tune;
+  target->tune_core = arm_selected_tune->core;
 }
 
 /* Fix up any incompatible options that the user has specified.  */
@@ -3263,9 +3280,9 @@ arm_option_override (void)
   insn_flags = arm_selected_cpu->flags;
   arm_base_arch = arm_selected_cpu->base_arch;
 
-  arm_tune = arm_selected_tune->core;
-  tune_flags = arm_selected_tune->tune_flags;
-  current_tune = arm_selected_tune->tune;
+  arm_tune = arm_active_target.tune_core;
+  tune_flags = arm_active_target.tune_flags;
+  current_tune = arm_active_target.tune;
 
   /* TBD: Dwarf info for apcs frame is not handled yet.  */
   if (TARGET_APCS_FRAME)
@@ -25957,10 +25974,16 @@ arm_file_start (void)
 
   if (TARGET_BPABI)
 {
-  if (arm_selected_arch)
+  /* We don't have a specified CPU.  Use the architecture to
+	 generate the tags.
+
+	 Note: it might be better to do this unconditionally, then the
+	 assembler would not need to know about all new CPU names as
+	 they are added.  */
+  if (!arm_active_target.core_name)
 {
 	  /* armv7ve doesn't support any extensions.  */
-	  if (strcmp (arm_selected_arch->name, "armv7ve") == 0)
+	  if (strcmp (arm_active_target.arch_name, "armv7ve") == 0)
 	{
 	  /* Keep backward compatability for assemblers
 		 which don't support armv7ve.  */
@@ -25972,20 +25995,21 @@ arm_file_start (void)
 	}
 	  else
 	{
-	  const char* pos = strchr (arm_selected_arch->name, '+');
+	  const char* pos = strchr (arm_active_target.arch_name, '+');
 	  if (pos)
 		{
 		  char buf[32];
-		  gcc_assert (strlen (ar

[PATCH 02/21] [arm] Add new isa bits method

2016-12-15 Thread Richard Earnshaw (lists)

This patch adds the new ISA data structures.  The idea is to use an
sbitmap for carrying these around internally.  We don't make much use
of this yet, but will increasingly migrate over to this in the
following patches.  For now, all cores and architectures carry both
the old and the new encodings.

For simplicity and clarity we introduce internally the concept of
ARMv7ve.  It doesn't change any visible behaviour.

There's also a bit of tidying up of the various supported cores,
sorting them by profile.
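The static `ISA_FEAT(...)` lists are terminated by `isa_nobit` and turned into a bitmap by `arm_initialize_isa` (added in patch 03). A minimal sketch of that conversion, assuming a made-up subset of feature names (some borrowed from the patch, values invented) and a fixed-size word array in place of `sbitmap`; it also assumes `isa_nobit` is the zero-valued first enumerator, so it never names a real feature.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical feature list; isa_nobit (zero) only terminates lists.  */
enum isa_feature
{
  isa_nobit = 0,
  isa_bit_notm,
  isa_bit_thumb,
  isa_bit_mode26,
  isa_bit_iwmmxt,
  isa_num_bits
};

#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define ISA_WORDS ((isa_num_bits + WORD_BITS - 1) / WORD_BITS)

/* Fixed-size stand-in for GCC's sbitmap.  */
typedef struct
{
  unsigned long word[ISA_WORDS];
} isa_set;

static void
isa_set_bit (isa_set *s, enum isa_feature f)
{
  s->word[f / WORD_BITS] |= 1ul << (f % WORD_BITS);
}

static bool
isa_bit_p (const isa_set *s, enum isa_feature f)
{
  return (s->word[f / WORD_BITS] >> (f % WORD_BITS)) & 1ul;
}

/* Model of arm_initialize_isa: clear S, then set every bit named in the
   isa_nobit-terminated list BITS.  */
static void
initialize_isa (isa_set *s, const enum isa_feature *bits)
{
  assert (s != NULL && bits != NULL);
  memset (s, 0, sizeof *s);
  while (*bits != isa_nobit)
    isa_set_bit (s, *bits++);
}
```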

* arm-isa.h: New file.
* arm-protos.h: Include it.
* arm-arches.def: Add new ISA field to all entries.  Drop bogus
armv8.1-a+crc architecture.
* arm-cores.def: Similarly.  Group ARMv8 cores by profile.
* arm-opts.h (enum processor_type): Adjust for new field.
* arm.c (struct processors): New field 'isa_bits'.
(all_cores, all_architectures): Initialize new field.
* arm-tables.opt: Regenerated.
* arm-tune.md: Regenerated.
---
 gcc/common/config/arm/arm-common.c |   4 +-
 gcc/config/arm/arm-arches.def  |  78 ++--
 gcc/config/arm/arm-cores.def   | 235 +++--
 gcc/config/arm/arm-isa.h   | 127 
 gcc/config/arm/arm-opts.h  |   2 +-
 gcc/config/arm/arm-protos.h|   1 +
 gcc/config/arm/arm-tables.opt  |  29 ++---
 gcc/config/arm/arm-tune.md |   8 +-
 gcc/config/arm/arm.c   |  15 ++-
 9 files changed, 315 insertions(+), 184 deletions(-)
 create mode 100644 gcc/config/arm/arm-isa.h


diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index 93a13c8..79e3f1f 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -107,12 +107,12 @@ struct arm_arch_core_flag
 static const struct arm_arch_core_flag arm_arch_core_flags[] =
 {
 #undef ARM_CORE
-#define ARM_CORE(NAME, X, IDENT, TUNE_FLAGS, ARCH, FLAGS, COSTS) \
+#define ARM_CORE(NAME, X, IDENT, TUNE_FLAGS, ARCH, ISA, FLAGS, COSTS)	\
   {NAME, FLAGS},
 #include "config/arm/arm-cores.def"
 #undef ARM_CORE
 #undef ARM_ARCH
-#define ARM_ARCH(NAME, CORE, TUNE_FLAGS, ARCH, FLAGS)	\
+#define ARM_ARCH(NAME, CORE, TUNE_FLAGS, ARCH, ISA, FLAGS)	\
   {NAME, FLAGS},
 #include "config/arm/arm-arches.def"
 #undef ARM_ARCH
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index d81a471..02ece42 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -19,50 +19,50 @@
 
 /* Before using #include to read this file, define a macro:
 
-  ARM_ARCH(NAME, CORE, TUNE_FLAGS, ARCH, FLAGS)
+  ARM_ARCH(NAME, CORE, TUNE_FLAGS, ARCH, ISA, FLAGS)
 
The NAME is the name of the architecture, represented as a string
constant.  The CORE is the identifier for a core representative of
-   this architecture.  ARCH is the architecture revision.  FLAGS is
-   the set of feature flags implied by the architecture.
+   this architecture.  ARCH is the architecture revision.  ISA is the
+   detailed architectural capabilities of the core (see arm-isa.h).
+   FLAGS is the set of feature flags implied by the architecture.
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_ARCH("armv2",   arm2,   (TF_CO_PROC | TF_NO_MODE32), 2,	ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH2))
-ARM_ARCH("armv2a",  arm2,   (TF_CO_PROC | TF_NO_MODE32), 2,	ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH2))
-ARM_ARCH("armv3",   arm6,   TF_CO_PROC, 3,	ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH3))
-ARM_ARCH("armv3m",  arm7m,  TF_CO_PROC, 3M,	ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH3M))
-ARM_ARCH("armv4",   arm7tdmi,   TF_CO_PROC, 4,	ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH4))
+ARM_ARCH("armv2",   arm2,   (TF_CO_PROC | TF_NO_MODE32), 2,		ISA_FEAT(ISA_ARMv2) ISA_FEAT(isa_bit_mode26), ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH("armv2a",  arm2,   (TF_CO_PROC | TF_NO_MODE32), 2,		ISA_FEAT(ISA_ARMv2) ISA_FEAT(isa_bit_mode26), ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH("armv3",   arm6,   TF_CO_PROC,   		 3,		ISA_FEAT(ISA_ARMv3) ISA_FEAT(isa_bit_mode26), ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH3))
+ARM_ARCH("armv3m",  arm7m,  TF_CO_PROC, 		 3M,	ISA_FEAT(ISA_ARMv3m) ISA_FEAT(isa_bit_mode26), ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH3M))
+ARM_ARCH("armv4",   arm7tdmi,   TF_CO_PROC, 		 4,		ISA_FEAT(ISA_ARMv4) ISA_FEAT(isa_bit_mode26), ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH4))
 /* Strictly, FL_MODE26 is a permitted option for v4t, but there are no
implementations that support it, so we will leave it out for now.  */
-ARM_ARCH("armv4t",  arm7tdmi,   TF_CO_PROC, 4T,	ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH4T))
-ARM_ARCH("armv5",   arm10tdmi,  TF_CO_PROC, 5,	ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH5))
-ARM_ARCH("armv5t",  arm10tdmi,  TF_CO_PROC, 5T,	ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH5T))
-ARM_ARCH("armv5e",  arm1026ejs, TF_CO_PROC, 5E,	

[PATCH 05/21] [arm] Reduce usage of arm_selected_cpu.

2016-12-15 Thread Richard Earnshaw (lists)

Make more use of the new data structure for initializing existing
variables.

* arm.c (arm_option_override): Use arm_active_target as source of
information for arm_base_arch and arm_arch_name.
(arm_file_start): Use arm_active_target for core name.
---
 gcc/config/arm/arm.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a4d370c..3806226 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3276,9 +3276,9 @@ arm_option_override (void)
   SUBTARGET_OVERRIDE_OPTIONS;
 #endif
 
-  sprintf (arm_arch_name, "__ARM_ARCH_%s__", arm_selected_cpu->arch);
+  sprintf (arm_arch_name, "__ARM_ARCH_%s__", arm_active_target.arch_pp_name);
   insn_flags = arm_selected_cpu->flags;
-  arm_base_arch = arm_selected_cpu->base_arch;
+  arm_base_arch = arm_active_target.base_arch;
 
   arm_tune = arm_active_target.tune_core;
   tune_flags = arm_active_target.tune_flags;
@@ -26012,12 +26012,13 @@ arm_file_start (void)
 			 arm_active_target.arch_name);
 	}
 }
-  else if (strncmp (arm_selected_cpu->name, "generic", 7) == 0)
-	asm_fprintf (asm_out_file, "\t.arch %s\n", arm_selected_cpu->name + 8);
+  else if (strncmp (arm_active_target.core_name, "generic", 7) == 0)
+	asm_fprintf (asm_out_file, "\t.arch %s\n",
+		 arm_active_target.core_name + 8);
   else
 	{
 	  const char* truncated_name
-	= arm_rewrite_selected_cpu (arm_selected_cpu->name);
+	= arm_rewrite_selected_cpu (arm_active_target.core_name);
 	  asm_fprintf (asm_out_file, "\t.cpu %s\n", truncated_name);
 	}
 



[PATCH 06/21] [arm] Add new isa quirk bit for Cortex-M3 ldrd issue.

2016-12-15 Thread Richard Earnshaw (lists)

With the new data structures it is trivial to add a new feature bit,
and we aren't (too) limited as to how many we can have.  This patch adds
a new bit to describe the need for a particular compiler behaviour
modification: in this case, a quirk in the Cortex-M3.
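A minimal sketch of the two uses of the quirk bit, with `uint32_t` masks standing in for sbitmaps and hypothetical bit assignments. The CPU/architecture comparison XORs the two feature sets and then clears quirk and FPU bits, as the `bitmap_xor`/`bitmap_and_compl` sequence in the patch does; `-mfix-cortex-m3-ldrd` defaults on only when the option was not given (value 2) and the quirk bit is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments; uint32_t masks stand in for sbitmaps.  */
#define ISA_BIT_THUMB2        (UINT32_C (1) << 0)
#define ISA_BIT_DIV           (UINT32_C (1) << 1)
#define ISA_BIT_VFPV3         (UINT32_C (1) << 2)  /* Set by -mfpu.  */
#define QUIRK_NO_VOLATILE_CE  (UINT32_C (1) << 3)
#define QUIRK_CM3_LDRD        (UINT32_C (1) << 4)

#define ALL_QUIRKS (QUIRK_NO_VOLATILE_CE | QUIRK_CM3_LDRD)
#define ALL_FPU    ISA_BIT_VFPV3

/* A CPU matches an architecture when the two differ only in quirk or
   FPU bits: xor, clear both masks, then test for empty.  */
static bool
cpu_matches_arch (uint32_t cpu_isa, uint32_t arch_isa)
{
  uint32_t diff = cpu_isa ^ arch_isa;
  diff &= ~ALL_QUIRKS;
  diff &= ~ALL_FPU;
  return diff == 0;
}

/* FIX_OPTION is 2 when -m[no-]fix-cortex-m3-ldrd was not given; in that
   case the quirk bit decides.  An explicit 0 or 1 is kept.  */
static int
resolve_fix_cm3_ldrd (int fix_option, uint32_t cpu_isa)
{
  assert (fix_option >= 0 && fix_option <= 2);
  if (fix_option == 2)
    return (cpu_isa & QUIRK_CM3_LDRD) != 0;
  return fix_option;
}
```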

* arm-isa.h (enum isa_feature): Add isa_quirk_cm3_ldrd.
(ISA_ALL_QUIRKS): New macro.
* arm-cores.def (cortex-m3): Add isa_quirk_cm3_ldrd to isa feature list.
* arm.c (isa_quirkbits): New feature-list bitmap.
(arm_configure_build_target): Ignore quirk bits when comparing an
architecture feature list with a CPU feature list.
(arm_option_override): Initialize isa_quirkbits.  If the user has
not specified -m[no-]fix-cortex-m3-ldrd, automatically enable the
feature if isa_quirk_cm3_ldrd appears in the isa feature list.
---
 gcc/config/arm/arm-cores.def | 2 +-
 gcc/config/arm/arm-isa.h | 9 -
 gcc/config/arm/arm.c | 9 -
 3 files changed, 17 insertions(+), 3 deletions(-)


diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 7c951f3..7f64a1f 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -160,7 +160,7 @@ ARM_CORE("cortex-r7",		cortexr7, cortexr7,		TF_LDSCHED, 	  7R,	ISA_FEAT(ISA_ARMv
 ARM_CORE("cortex-r8",		cortexr8, cortexr7,		TF_LDSCHED, 	  7R,	ISA_FEAT(ISA_ARMv7r) ISA_FEAT(isa_bit_adiv), ARM_FSET_MAKE_CPU1 (FL_ARM_DIV | FL_FOR_ARCH7R), cortex)
 ARM_CORE("cortex-m7",		cortexm7, cortexm7,		TF_LDSCHED, 	  7EM,	ISA_FEAT(ISA_ARMv7em) ISA_FEAT(isa_quirk_no_volatile_ce), ARM_FSET_MAKE_CPU1 (FL_NO_VOLATILE_CE | FL_FOR_ARCH7EM), cortex_m7)
 ARM_CORE("cortex-m4",		cortexm4, cortexm4,		TF_LDSCHED, 	  7EM,	ISA_FEAT(ISA_ARMv7em), ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH7EM), v7m)
-ARM_CORE("cortex-m3",		cortexm3, cortexm3,		TF_LDSCHED, 	  7M,	ISA_FEAT(ISA_ARMv7m), ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH7M), v7m)
+ARM_CORE("cortex-m3",		cortexm3, cortexm3,		TF_LDSCHED, 	  7M,	ISA_FEAT(ISA_ARMv7m) ISA_FEAT(isa_quirk_cm3_ldrd), ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH7M), v7m)
 ARM_CORE("marvell-pj4",		marvell_pj4, marvell_pj4,	TF_LDSCHED, 	  7A,	ISA_FEAT(ISA_ARMv7a), ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH7A), marvell_pj4)
 
 /* V7 big.LITTLE implementations */
diff --git a/gcc/config/arm/arm-isa.h b/gcc/config/arm/arm-isa.h
index 15eb6e1..2d47c1b 100644
--- a/gcc/config/arm/arm-isa.h
+++ b/gcc/config/arm/arm-isa.h
@@ -58,9 +58,11 @@ enum isa_feature
 isa_bit_neon,	/* Advanced SIMD instructions.  */
 isa_bit_fp16,	/* FP16 extension (half-precision float).  */
 
-/* ISA Quirks (errata?).  */
+/* ISA Quirks (errata?).  Don't forget to add this to the list of
+   all quirks below.  */
 isa_quirk_no_volatile_ce,	/* No volatile memory in IT blocks.  */
 isa_quirk_ARMv6kz,		/* Previously mis-identified by GCC.  */
+isa_quirk_cm3_ldrd,		/* Cortex-M3 LDRD quirk.  */
 
 /* Aren't currently, but probably should be tuning bits.  */
 isa_bit_smallmul,	/* Slow multiply operations.  */
@@ -120,6 +122,11 @@ enum isa_feature
default.  */
 #define ISA_ALL_FPU	isa_bit_VFPv2, isa_bit_VFPv3, isa_bit_neon
 
+/* List of all quirk bits to strip out when comparing CPU features with
+   architectures.  */
+#define ISA_ALL_QUIRKS	isa_quirk_no_volatile_ce, isa_quirk_ARMv6kz,	\
+			   isa_quirk_cm3_ldrd
+
 /* Helper macro so that we can concatenate multiple features together
with arm-*.def files, since macro substitution can't have commas within an
argument that lacks parenthesis.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3806226..c6be4d8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3056,6 +3056,7 @@ arm_initialize_isa (sbitmap isa, const enum isa_feature *isa_bits)
 }
 
 static sbitmap isa_fpubits;
+static sbitmap isa_quirkbits;
 
 /* Configure a build target TARGET from the user-specified options OPTS and
OPTS_SET.  If WARN_COMPATIBLE, emit a diagnostic if both the CPU and
@@ -3097,6 +3098,8 @@ arm_configure_build_target (struct arm_build_target *target,
 
 	  arm_initialize_isa (cpu_isa, arm_selected_cpu->isa_bits);
 	  bitmap_xor (cpu_isa, cpu_isa, target->isa);
+	  /* Ignore any bits that are quirk bits.  */
+	  bitmap_and_compl (cpu_isa, cpu_isa, isa_quirkbits);
 	  /* Ignore (for now) any bits that might be set by -mfpu.  */
 	  bitmap_and_compl (cpu_isa, cpu_isa, isa_fpubits);
 
@@ -3263,6 +3266,10 @@ static void
 arm_option_override (void)
 {
   static const enum isa_feature fpu_bitlist[] = { ISA_ALL_FPU, isa_nobit };
+  static const enum isa_feature quirk_bitlist[] = { ISA_ALL_QUIRKS, isa_nobit};
+
+  isa_quirkbits = sbitmap_alloc (isa_num_bits);
+  arm_initialize_isa (isa_quirkbits, quirk_bitlist);
 
   isa_fpubits = sbitmap_alloc (isa_num_bits);
   arm_initialize_isa (isa_fpubits, fpu_bitlist);
@@ -3510,7 +3517,7 @@ arm_option_override (void)
   /* Enable -mfix-cortex-m3-ldrd by default for Cortex-M3 cores.  */
   if (fix_cm3_ldrd == 2)
 {
-  if (arm_selected_cpu-

[PATCH 07/21] [arm] Use arm_active_target when configuring builtins

2016-12-15 Thread Richard Earnshaw (lists)

This patch uses the new ISA data structure to determine which builtins
to add.  It entirely eliminates the need for insn_flags to be a global
variable, but we're about to delete that in the following patches, so
for now we leave it as a global.
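The gating test in the new `def_mbuiltin` reduces to "no requirement, or the required bit is set in the active ISA". A hypothetical standalone model of that check: `builtin_desc`, `active_isa`, `builtin_available_p` and `count_registered` are illustrative names, a `bool` array replaces `arm_active_target.isa`, and the builtin names used in the test are only examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature numbering; isa_nobit means "always available".  */
enum isa_feature { isa_nobit = 0, isa_bit_iwmmxt, isa_bit_iwmmxt2, isa_num_bits };

/* Simplified builtin_description: one feature instead of a feature set.  */
struct builtin_desc
{
  enum isa_feature feature;
  const char *name;
};

/* Stand-in for arm_active_target.isa: one flag per feature.  */
static bool active_isa[isa_num_bits];

/* Same test as the new def_mbuiltin: no requirement, or the bit is set.  */
static bool
builtin_available_p (const struct builtin_desc *d)
{
  return d->feature == isa_nobit || active_isa[d->feature];
}

/* Count how many of the N builtins in TABLE would be registered.  */
static size_t
count_registered (const struct builtin_desc *table, size_t n)
{
  assert (table != NULL || n == 0);
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (builtin_available_p (&table[i]))
      count++;
  return count;
}
```

A single enum per entry is smaller than a full feature set, and `isa_nobit` takes over the "no requirement" role that `ARM_FSET_EMPTY` used to play.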

* arm-builtins.c: Include sbitmap.h.
(def_mbuiltin): Change first parameter to a flag bit.  Use it to test
available features in the current target.
(struct builtin_description): Change type of feature field.
(IWMMXT_BUILTIN): Use the isa_features types.
(IWMMXT2_BUILTIN): Likewise.
(IWMMXT_BUILTIN2): Likewise.
(IWMMXT2_BUILTIN2): Likewise.
(CRC32_BUILTIN): Likewise.
(CRYPTO_BUILTIN): Likewise.
(iwmmx_mbuiltin): Likewise.
(iwmmx2_mbuiltin): Likewise.
(arm_init_iwmmxt_builtins): Check for specific feature bits.
---
 gcc/config/arm/arm-builtins.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 120..80d3b67 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -37,6 +37,7 @@
 #include "expr.h"
 #include "langhooks.h"
 #include "case-cfn-macros.h"
+#include "sbitmap.h"
 
 #define SIMD_MAX_BUILTIN_ARGS 5
 
@@ -1154,11 +1155,11 @@ arm_init_crypto_builtins (void)
 #undef NUM_DREG_TYPES
 #undef NUM_QREG_TYPES
 
-#define def_mbuiltin(FLAGS, NAME, TYPE, CODE)\
+#define def_mbuiltin(FLAG, NAME, TYPE, CODE)\
   do	\
 {	\
-  const arm_feature_set flags = FLAGS;\
-  if (ARM_FSET_CPU_SUBSET (flags, insn_flags))			\
+  if (FLAG == isa_nobit		\
+	  || bitmap_bit_p (arm_active_target.isa, FLAG))		\
 	{\
 	  tree bdecl;			\
 	  bdecl = add_builtin_function ((NAME), (TYPE), (CODE),		\
@@ -1170,7 +1171,7 @@ arm_init_crypto_builtins (void)
 
 struct builtin_description
 {
-  const arm_feature_set    features;
+  const enum isa_feature   feature;
   const enum insn_code icode;
   const char * const   name;
   const enum arm_builtins  code;
@@ -1181,12 +1182,12 @@ struct builtin_description
 static const struct builtin_description bdesc_2arg[] =
 {
 #define IWMMXT_BUILTIN(code, string, builtin) \
-  { ARM_FSET_MAKE_CPU1 (FL_IWMMXT), CODE_FOR_##code, \
+  { isa_bit_iwmmxt, CODE_FOR_##code, \
 "__builtin_arm_" string,			 \
 ARM_BUILTIN_##builtin, UNKNOWN, 0 },
 
 #define IWMMXT2_BUILTIN(code, string, builtin) \
-  { ARM_FSET_MAKE_CPU1 (FL_IWMMXT2), CODE_FOR_##code, \
+  { isa_bit_iwmmxt2, CODE_FOR_##code, \
 "__builtin_arm_" string,			  \
 ARM_BUILTIN_##builtin, UNKNOWN, 0 },
 
@@ -1270,11 +1271,11 @@ static const struct builtin_description bdesc_2arg[] =
   IWMMXT_BUILTIN (iwmmxt_walignr3, "walignr3", WALIGNR3)
 
 #define IWMMXT_BUILTIN2(code, builtin) \
-  { ARM_FSET_MAKE_CPU1 (FL_IWMMXT), CODE_FOR_##code, NULL, \
+  { isa_bit_iwmmxt, CODE_FOR_##code, NULL, \
 ARM_BUILTIN_##builtin, UNKNOWN, 0 },
 
 #define IWMMXT2_BUILTIN2(code, builtin) \
-  { ARM_FSET_MAKE_CPU2 (FL_IWMMXT2), CODE_FOR_##code, NULL, \
+  { isa_bit_iwmmxt2, CODE_FOR_##code, NULL, \
 ARM_BUILTIN_##builtin, UNKNOWN, 0 },
 
   IWMMXT2_BUILTIN2 (iwmmxt_waddbhusm, WADDBHUSM)
@@ -1290,7 +1291,7 @@ static const struct builtin_description bdesc_2arg[] =
 
 
 #define FP_BUILTIN(L, U) \
-  {ARM_FSET_EMPTY, CODE_FOR_##L, "__builtin_arm_"#L, ARM_BUILTIN_##U, \
+  {isa_nobit, CODE_FOR_##L, "__builtin_arm_"#L, ARM_BUILTIN_##U, \
UNKNOWN, 0},
 
   FP_BUILTIN (get_fpscr, GET_FPSCR)
@@ -1298,7 +1299,7 @@ static const struct builtin_description bdesc_2arg[] =
 #undef FP_BUILTIN
 
 #define CRC32_BUILTIN(L, U) \
-  {ARM_FSET_EMPTY, CODE_FOR_##L, "__builtin_arm_"#L, \
+  {isa_nobit, CODE_FOR_##L, "__builtin_arm_"#L, \
ARM_BUILTIN_##U, UNKNOWN, 0},
CRC32_BUILTIN (crc32b, CRC32B)
CRC32_BUILTIN (crc32h, CRC32H)
@@ -1310,7 +1311,7 @@ static const struct builtin_description bdesc_2arg[] =
 
 
 #define CRYPTO_BUILTIN(L, U)	   \
-  {ARM_FSET_EMPTY, CODE_FOR_crypto_##L,	"__builtin_arm_crypto_"#L, \
+  {isa_nobit, CODE_FOR_crypto_##L,	"__builtin_arm_crypto_"#L, \
ARM_BUILTIN_CRYPTO_##U, UNKNOWN, 0},
 #undef CRYPTO1
 #undef CRYPTO2
@@ -1567,9 +1568,9 @@ arm_init_iwmmxt_builtins (void)
   machine_mode mode;
   tree type;
 
-  if (d->name == 0 ||
-	  !(ARM_FSET_HAS_CPU1 (d->features, FL_IWMMXT) ||
-	ARM_FSET_HAS_CPU1 (d->features, FL_IWMMXT2)))
+  if (d->name == 0
+	  || !(d->feature == isa_bit_iwmmxt
+	   || d->feature == isa_bit_iwmmxt2))
 	continue;
 
   mode = insn_data[d->icode].operand[1].mode;
@@ -1593,16 +1594,16 @@ arm_init_iwmmxt_builtins (void)
 	  gcc_unreachable ();
 	}
 
-  def_mbuiltin (d->features, d->name, type, d->code);
+  def_mbuiltin (d->feature, d->name, type, d->code);
 }
 
   /* Add the remaining MMX insns with somewhat more complicated types.  */
 #define iwmmx_mbuiltin(NAME, TYPE, CODE)			\
-  def_mbuiltin (ARM_FSET_

[PATCH 11/21] [arm] Delete unused arm_fp_model.

2016-12-15 Thread Richard Earnshaw (lists)

The arm_fp_model enumeration type has only had one useful value since
the FPA support was removed, and it's no longer used anywhere.  This
patch just cleans that up by removing it.

* arm.h (arm_fp_model): Delete.
---
 gcc/config/arm/arm.h | 8 
 1 file changed, 8 deletions(-)


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 6661314..7690e70 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -340,14 +340,6 @@ typedef unsigned long arm_fpu_feature_set;
 #define FPU_FL_FP16	(1 << 1)	/* Half-precision.  */
 #define FPU_FL_CRYPTO	(1 << 2)	/* Crypto extensions.  */
 
-/* Which floating point model to use.  */
-enum arm_fp_model
-{
-  ARM_FP_MODEL_UNKNOWN,
-  /* VFP floating point model.  */
-  ARM_FP_MODEL_VFP
-};
-
 enum vfp_reg_type
 {
   VFP_NONE = 0,



[PATCH 08/21] [arm] Remove insn_flags.

2016-12-15 Thread Richard Earnshaw (lists)

This patch finishes the job of removing insn_flags and moves the logic
over to using the new data structures.  I've added a new boolean
variable to detect when we have ARMv7ve-like capabilities, and thus
64-bit atomic operations, since that would be a complex and expensive
query to do in full.  It might be better to add a specific bit to
the ISA data structures to indicate this capability directly.
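A minimal sketch of the "compute once, query cheaply" approach, assuming a made-up set of required feature bits (the real condition used to set `arm_arch7ve` may differ). The subset check runs once during option processing, and later queries, such as the `TARGET_HAVE_LPAE` macro mentioned in the ChangeLog, just read the cached flag. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; the real ARMv7ve test may differ.  */
#define F_ARMV7  (UINT64_C (1) << 0)
#define F_ADIV   (UINT64_C (1) << 1)
#define F_LPAE   (UINT64_C (1) << 2)
#define F_MP     (UINT64_C (1) << 3)
#define F_SEC    (UINT64_C (1) << 4)
#define F_NEON   (UINT64_C (1) << 5)

#define ARMV7VE_REQUIRED (F_ARMV7 | F_ADIV | F_LPAE | F_MP | F_SEC)

/* Cached result, analogous to arm_arch7ve.  */
static int arch7ve_flag;

/* True when every bit in REQUIRED is also set in ISA.  */
static bool
isa_subset_p (uint64_t required, uint64_t isa)
{
  assert (required != 0);   /* An empty requirement would match anything.  */
  return (isa & required) == required;
}

/* Run once during option processing.  */
static void
set_arch7ve_flag (uint64_t active_isa)
{
  arch7ve_flag = isa_subset_p (ARMV7VE_REQUIRED, active_isa);
}

/* Later queries just read the cached flag.  */
static bool
have_64bit_atomics (void)
{
  return arch7ve_flag != 0;
}
```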

* arm-protos.h (insn_flags): Delete declaration.
(arm_arch7ve): Declare.
* arm.c (insn_flags): Delete.
(arm_arch7ve): New variable.
(arm_selected_cpu): Delete.
(arm_option_check_internal): Use new ISA bitmap.
(arm_option_override_internal): Likewise.
(arm_configure_build_target): Declare arm_selected_cpu locally.
(arm_option_override): Use new ISA bitmap.  Initialize arm_arch7ve.
Rearrange variable initialization by general function.
* arm.h (TARGET_HAVE_LPAE): Use arm_arch7ve.
---
 gcc/config/arm/arm-protos.h |   7 ++-
 gcc/config/arm/arm.c        | 103 +++-
 gcc/config/arm/arm.h        |   3 +-
 3 files changed, 57 insertions(+), 56 deletions(-)


diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 7673e3a..659959b 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -353,10 +353,6 @@ extern void arm_cpu_cpp_builtins (struct cpp_reader *);
 
 extern bool arm_is_constant_pool_ref (rtx);
 
-/* The bits in this mask specify which
-   instructions we are allowed to generate.  */
-extern arm_feature_set insn_flags;
-
 /* The bits in this mask specify which instruction scheduling options should
be used.  */
 extern unsigned int tune_flags;
@@ -391,6 +387,9 @@ extern int arm_arch6m;
 /* Nonzero if this chip supports the ARM 7 extensions.  */
 extern int arm_arch7;
 
+/* Nonzero if this chip supports the ARM 7ve extensions.  */
+extern int arm_arch7ve;
+
 /* Nonzero if instructions not present in the 'M' profile can be used.  */
 extern int arm_arch_notm;
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c6be4d8..0b82714 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -779,10 +779,6 @@ int arm_fpu_attr;
 rtx thumb_call_via_label[14];
 static int thumb_call_reg_needed;
 
-/* The bits in this mask specify which
-   instructions we are allowed to generate.  */
-arm_feature_set insn_flags = ARM_FSET_EMPTY;
-
 /* The bits in this mask specify which instruction scheduling options should
be used.  */
 unsigned int tune_flags = 0;
@@ -828,6 +824,9 @@ int arm_arch6m = 0;
 /* Nonzero if this chip supports the ARM 7 extensions.  */
 int arm_arch7 = 0;
 
+/* Nonzero if this chip supports the ARM 7ve extensions.  */
+int arm_arch7ve = 0;
+
 /* Nonzero if instructions not present in the 'M' profile can be used.  */
 int arm_arch_notm = 0;
 
@@ -2316,11 +2315,6 @@ static const struct processors all_architectures[] =
   {NULL, TARGET_CPU_arm_none, 0, NULL, BASE_ARCH_0, {isa_nobit}, ARM_FSET_EMPTY, NULL}
 };
 
-
-/* These are populated as commandline arguments are processed, or NULL
-   if not specified.  */
-static const struct processors *arm_selected_cpu;
-
 /* The name of the preprocessor macro to define for this architecture.  PROFILE
is replaced by the architecture name (eg. 8A) in arm_option_override () and
is thus chosen to be big enough to hold the longest architecture name.  */
@@ -2821,13 +2815,14 @@ arm_option_check_internal (struct gcc_options *opts)
   const struct arm_fpu_desc *fpu_desc = &all_fpus[opts->x_arm_fpu_index];
 
   /* iWMMXt and NEON are incompatible.  */
-if (TARGET_IWMMXT
-	&& ARM_FPU_FSET_HAS (fpu_desc->features, FPU_FL_NEON))
+  if (TARGET_IWMMXT
+  && ARM_FPU_FSET_HAS (fpu_desc->features, FPU_FL_NEON))
 error ("iWMMXt and NEON are incompatible");
 
   /* Make sure that the processor choice does not conflict with any of the
  other command line choices.  */
-  if (TARGET_ARM_P (flags) && !ARM_FSET_HAS_CPU1 (insn_flags, FL_NOTM))
+  if (TARGET_ARM_P (flags)
+  && !bitmap_bit_p (arm_active_target.isa, isa_bit_notm))
 error ("target CPU does not support ARM mode");
 
   /* TARGET_BACKTRACE calls leaf_function_p, which causes a crash if done
@@ -2949,7 +2944,7 @@ arm_option_override_internal (struct gcc_options *opts,
 {
   arm_override_options_after_change_1 (opts);
 
-  if (TARGET_INTERWORK && !ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB))
+  if (TARGET_INTERWORK && !bitmap_bit_p (arm_active_target.isa, isa_bit_thumb))
 {
   /* The default is to enable interworking, so this warning message would
 	 be confusing to users who have just compiled with, eg, -march=armv3.  */
@@ -2958,7 +2953,7 @@ arm_option_override_internal (struct gcc_options *opts,
 }
 
   if (TARGET_THUMB_P (opts->x_target_flags)
-  && !(ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB)))
+  && !bitmap_bit_p (arm_active_target.isa, isa_bit_thumb))
 {
   warning (0, "target CPU 

[PATCH 09/21] [arm] Rework arm-common to use new feature bits.

2016-12-15 Thread Richard Earnshaw (lists)

This converts the recently added implicit -mthumb support code to use
the new data structures.  Since we have a very simple query and no
initialized copies of the sbitmaps, for now we simply scan the list of
features to look for the one of interest.
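A minimal standalone sketch of that linear scan follows. It assumes a cut-down, hypothetical isa_feature enum (the real one in arm-isa.h has many more members) and a hypothetical helper, implicit_thumb_option, that mirrors the driver's decision: a core whose feature list lacks the notm bit is treated as Thumb-only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of enum isa_feature.  isa_nobit (0) terminates
   every feature list, as in the {ISA isa_nobit} initializers.  */
enum isa_feature
{
  isa_nobit = 0,
  isa_bit_notm,
  isa_bit_thumb,
  isa_bit_thumb2,
  isa_num_bits
};

/* Linear scan of a terminated feature list: O(n) in the list length,
   but needs no bitmap to be allocated or initialized first.  */
static bool
check_isa_bits_for (const enum isa_feature *bits, enum isa_feature bit)
{
  while (*bits != isa_nobit)
    if (*bits++ == bit)
      return true;
  return false;
}

/* Hypothetical feature lists: the M-profile-like core lacks notm.  */
static const enum isa_feature m_profile_bits[] =
  { isa_bit_thumb, isa_bit_thumb2, isa_nobit };
static const enum isa_feature a_profile_bits[] =
  { isa_bit_notm, isa_bit_thumb, isa_bit_thumb2, isa_nobit };

/* No notm bit means the target is Thumb-only, so imply -mthumb.  */
static const char *
implicit_thumb_option (const enum isa_feature *bits)
{
  return check_isa_bits_for (bits, isa_bit_notm) ? NULL : "-mthumb";
}
```

The scan stops at the first isa_nobit, so every list must be terminated; the {ISA isa_nobit} initializer pattern guarantees that.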

* arm-opts.h (struct arm_arch_core_flag): Add new field ISA.
Initialize it.
(arm_arch_core_flag): Delete flags field.
(arm_arch_core_flags): Don't initialize flags field.
* common/config/arm/arm-common.c (check_isa_bits_for): New function.
(arm_target_thumb_only): Use new isa bits arrays.
---
 gcc/common/config/arm/arm-common.c | 23 +++
 gcc/config/arm/arm-opts.h  |  1 +
 2 files changed, 20 insertions(+), 4 deletions(-)


diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index 79e3f1f..dca3682 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -101,23 +101,37 @@ arm_rewrite_mcpu (int argc, const char **argv)
 struct arm_arch_core_flag
 {
   const char *const name;
-  const arm_feature_set flags;
+  const enum isa_feature isa_bits[isa_num_bits];
 };
 
 static const struct arm_arch_core_flag arm_arch_core_flags[] =
 {
 #undef ARM_CORE
 #define ARM_CORE(NAME, X, IDENT, TUNE_FLAGS, ARCH, ISA, FLAGS, COSTS)	\
-  {NAME, FLAGS},
+  {NAME, {ISA isa_nobit}},
 #include "config/arm/arm-cores.def"
 #undef ARM_CORE
 #undef ARM_ARCH
 #define ARM_ARCH(NAME, CORE, TUNE_FLAGS, ARCH, ISA, FLAGS)	\
-  {NAME, FLAGS},
+  {NAME, {ISA isa_nobit}},
 #include "config/arm/arm-arches.def"
 #undef ARM_ARCH
 };
 
+/* Scan over a raw feature array BITS checking for BIT being present.
+   This is slower than the normal bitmask checks, but we would spend longer
+   initializing that than doing the check this way.  Returns true iff
+   BIT is found.  */
+static bool
+check_isa_bits_for (const enum isa_feature* bits, enum isa_feature bit)
+{
+  while (*bits != isa_nobit)
+    if (*bits++ == bit)
+      return true;
+
+  return false;
+}
+
 /* Called by the driver to check whether the target denoted by current
command line options is a Thumb-only target.  ARGV is an array of
-march and -mcpu values (ie. it contains the rhs after the equal
@@ -132,7 +146,8 @@ arm_target_thumb_only (int argc, const char **argv)
 {
   for (opt = 0; opt < (ARRAY_SIZE (arm_arch_core_flags)); opt++)
 	if ((strcmp (argv[argc - 1], arm_arch_core_flags[opt].name) == 0)
-	&& !ARM_FSET_HAS_CPU1(arm_arch_core_flags[opt].flags, FL_NOTM))
+	&& !check_isa_bits_for (arm_arch_core_flags[opt].isa_bits,
+				      isa_bit_notm))
 	  return "-mthumb";
 
   return NULL;
diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h
index a62ac46..52c69e9 100644
--- a/gcc/config/arm/arm-opts.h
+++ b/gcc/config/arm/arm-opts.h
@@ -26,6 +26,7 @@
 #define ARM_OPTS_H
 
 #include "arm-flags.h"
+#include "arm-isa.h"
 
 /* The various ARM cores.  */
 enum processor_type



[PATCH 10/21] [arm] Remove remaining references to arm feature sets.

2016-12-15 Thread Richard Earnshaw (lists)

Nothing uses the old feature sets now, so we can delete them entirely.

* arm-cores.def: Remove FLAGS field from all core definitions.
* arm-arches.def: Likewise.
* arm-opts.h (enum processor_type): Remove FLAGS parameter from
ARM_CORE macro.
(arm_arch_core_flags): Likewise, plus ARM_ARCH macro.
* arm-protos.h (FL_*): Delete.
(arm_feature_set): Delete.
(ARM_FSET_*): Delete.
* arm.c (struct processors): Delete flags field.
(all_cores): Delete FLAGS parameter from macro, don't initialize flags.
(all_architectures): Likewise.
---
 gcc/common/config/arm/arm-common.c |   4 +-
 gcc/config/arm/arm-arches.def      |  75 ++---
 gcc/config/arm/arm-cores.def       | 224 ++---
 gcc/config/arm/arm-flags.h         | 185 --
 gcc/config/arm/arm-opts.h          |   2 +-
 gcc/config/arm/arm.c               |  14 +--
 6 files changed, 157 insertions(+), 347 deletions(-)


diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index dca3682..611675b 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -107,12 +107,12 @@ struct arm_arch_core_flag
 static const struct arm_arch_core_flag arm_arch_core_flags[] =
 {
 #undef ARM_CORE
-#define ARM_CORE(NAME, X, IDENT, TUNE_FLAGS, ARCH, ISA, FLAGS, COSTS)	\
+#define ARM_CORE(NAME, X, IDENT, TUNE_FLAGS, ARCH, ISA, COSTS)	\
   {NAME, {ISA isa_nobit}},
 #include "config/arm/arm-cores.def"
 #undef ARM_CORE
 #undef ARM_ARCH
-#define ARM_ARCH(NAME, CORE, TUNE_FLAGS, ARCH, ISA, FLAGS)	\
+#define ARM_ARCH(NAME, CORE, TUNE_FLAGS, ARCH, ISA)	\
   {NAME, {ISA isa_nobit}},
 #include "config/arm/arm-arches.def"
 #undef ARM_ARCH
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 02ece42..ed6b0b6 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -19,50 +19,49 @@
 
 /* Before using #include to read this file, define a macro:
 
-  ARM_ARCH(NAME, CORE, TUNE_FLAGS, ARCH, ISA, FLAGS)
+  ARM_ARCH(NAME, CORE, TUNE_FLAGS, ARCH, ISA)
 
The NAME is the name of the architecture, represented as a string
constant.  The CORE is the identifier for a core representative of
this architecture.  ARCH is the architecture revision.  ISA is the
detailed architectural capabilities of the core (see arm-isa.h).
-   FLAGS is the set of feature flags implied by the architecture.
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_ARCH("armv2",   arm2,   (TF_CO_PROC | TF_NO_MODE32), 2,		ISA_FEAT(ISA_ARMv2) ISA_FEAT(isa_bit_mode26), ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH2))
-ARM_ARCH("armv2a",  arm2,   (TF_CO_PROC | TF_NO_MODE32), 2,		ISA_FEAT(ISA_ARMv2) ISA_FEAT(isa_bit_mode26), ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH2))
-ARM_ARCH("armv3",   arm6,   TF_CO_PROC,   		 3,		ISA_FEAT(ISA_ARMv3) ISA_FEAT(isa_bit_mode26), ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH3))
-ARM_ARCH("armv3m",  arm7m,  TF_CO_PROC, 		 3M,	ISA_FEAT(ISA_ARMv3m) ISA_FEAT(isa_bit_mode26), ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH3M))
-ARM_ARCH("armv4",   arm7tdmi,   TF_CO_PROC, 		 4,		ISA_FEAT(ISA_ARMv4) ISA_FEAT(isa_bit_mode26), ARM_FSET_MAKE_CPU1 (FL_MODE26 | FL_FOR_ARCH4))
-/* Strictly, FL_MODE26 is a permitted option for v4t, but there are no
+ARM_ARCH("armv2",   arm2,	(TF_CO_PROC | TF_NO_MODE32), 2,		ISA_FEAT(ISA_ARMv2) ISA_FEAT(isa_bit_mode26))
+ARM_ARCH("armv2a",  arm2,	(TF_CO_PROC | TF_NO_MODE32), 2,		ISA_FEAT(ISA_ARMv2) ISA_FEAT(isa_bit_mode26))
+ARM_ARCH("armv3",   arm6,	TF_CO_PROC,		 3,		ISA_FEAT(ISA_ARMv3) ISA_FEAT(isa_bit_mode26))
+ARM_ARCH("armv3m",  arm7m,	TF_CO_PROC,		 3M,	ISA_FEAT(ISA_ARMv3m) ISA_FEAT(isa_bit_mode26))
+ARM_ARCH("armv4",   arm7tdmi,	TF_CO_PROC,		 4,		ISA_FEAT(ISA_ARMv4) ISA_FEAT(isa_bit_mode26))
+/* Strictly, isa_bit_mode26 is a permitted option for v4t, but there are no
implementations that support it, so we will leave it out for now.  */
-ARM_ARCH("armv4t",  arm7tdmi,   TF_CO_PROC,		 4T,	ISA_FEAT(ISA_ARMv4t), ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH4T))
-ARM_ARCH("armv5",   arm10tdmi,  TF_CO_PROC, 		 5,		ISA_FEAT(ISA_ARMv5), ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH5))
-ARM_ARCH("armv5t",  arm10tdmi,  TF_CO_PROC, 		 5T,	ISA_FEAT(ISA_ARMv5t), ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH5T))
-ARM_ARCH("armv5e",  arm1026ejs, TF_CO_PROC, 		 5E,	ISA_FEAT(ISA_ARMv5e), ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH5E))
-ARM_ARCH("armv5te", arm1026ejs, TF_CO_PROC, 		 5TE,	ISA_FEAT(ISA_ARMv5te), ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH5TE))
-ARM_ARCH("armv6",   arm1136js,  TF_CO_PROC, 		 6,		ISA_FEAT(ISA_ARMv6), ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH6))
-ARM_ARCH("armv6j",  arm1136js,  TF_CO_PROC, 		 6J,	ISA_FEAT(ISA_ARMv6j), ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH6J))
-ARM_ARCH("armv6k",  mpcore,	TF_CO_PROC, 		 6K,	ISA_FEAT(ISA_ARMv6k), ARM_FSET_MAKE_CPU1 (FL_FOR_ARCH6K))
-ARM_AR

[PATCH 15/21] [arm] Initialize fpu capability bits in arm_active_target.

2016-12-15 Thread Richard Earnshaw (lists)

Now that we can describe the FPU with the standard ISA bits, we need to
initialize them.  However, the FPU settings can be changed with target build
attributes, so we also need to reset them if things change.  This requires
a bit of juggling about with the existing code to ensure that the active
target is reconfigured after each change to the target options.
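The core of the reconfiguration is clearing every FPU-related bit before merging in the bits of the newly selected FPU. A minimal sketch, assuming plain 64-bit masks in place of GCC's sbitmaps and hypothetical FEAT_* bit names: the and-complement step mirrors the bitmap_and_compl call in arm_configure_build_target, while the final OR is an assumption about how the selected FPU's bits are merged back in.

```c
#include <stdint.h>

/* Hypothetical bit positions; GCC uses an sbitmap indexed by
   enum isa_feature rather than a single 64-bit word.  */
#define FEAT_THUMB   (UINT64_C(1) << 0)
#define FEAT_VFPv2   (UINT64_C(1) << 1)
#define FEAT_VFPv3   (UINT64_C(1) << 2)
#define FEAT_FP_DBL  (UINT64_C(1) << 3)
#define FEAT_NEON    (UINT64_C(1) << 4)

/* Every bit an FPU description may set (cf. isa_all_fpubits).  */
#define ALL_FPU_BITS (FEAT_VFPv2 | FEAT_VFPv3 | FEAT_FP_DBL | FEAT_NEON)

/* Replace the FPU part of ISA with FPU_BITS, leaving CPU bits alone.
   Clearing first matters: otherwise bits from a previous FPU selection
   (e.g. NEON) would survive a switch to a smaller FPU.  */
static uint64_t
reconfigure_fpu (uint64_t isa, uint64_t fpu_bits)
{
  isa &= ~ALL_FPU_BITS;                    /* like bitmap_and_compl */
  return isa | (fpu_bits & ALL_FPU_BITS);  /* assumed merge step */
}
```

This is why a target attribute or pragma that selects a different FPU must trigger a full reconfiguration rather than just setting additional bits.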

* arm-protos.h: Include sbitmap.h
(arm_configure_build_target): Make public.
* arm.c (arm_configure_build_target): Now not static.
(arm_valid_target_attribute_rec): Move internal option check to...
(arm_valid_target_attribute_tree): ... here.  Also reconfigure the
active target.
(arm_override_options_after_change): Call arm_configure_build_target.
(isa_all_fpubits): Renamed from isa_fpubits.
(arm_option_restore): New function.
(TARGET_OPTION_RESTORE): Register it.
(arm_configure_build_target): Initialize the FPU capability bits in
the isa.
(arm_option_override): Move the code that forces the setting of the
FPU option before the call to arm_configure_build_target.
* arm.opt (march): Mark as Save.
(mcpu, mtune): Likewise.
* arm-c.c (arm_pragma_target_parse): Reconfigure the build target
after pragmas change the target options.
---
 gcc/config/arm/arm-c.c      |  2 ++
 gcc/config/arm/arm-protos.h |  4 +++
 gcc/config/arm/arm.c        | 69 ++---
 gcc/config/arm/arm.opt      |  6 ++--
 4 files changed, 55 insertions(+), 26 deletions(-)


diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index b592134..9dd9a8d 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -243,6 +243,8 @@ arm_pragma_target_parse (tree args, tree pop_target)
   /* handle_pragma_pop_options and handle_pragma_reset_options will set
target_option_current_node, but not handle_pragma_target.  */
   target_option_current_node = cur_tree;
+  arm_configure_build_target (&arm_active_target, &global_options,
+  &global_options_set, false);
 }
 
   /* Update macros if target_node changes. The global state will be restored
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 659959b..da3484f 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -24,6 +24,7 @@
 
 #include "arm-flags.h"
 #include "arm-isa.h"
+#include "sbitmap.h"
 
 extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *);
 extern int use_return_insn (int, rtx);
@@ -223,6 +224,9 @@ extern bool arm_change_mode_p (tree);
 
 extern tree arm_valid_target_attribute_tree (tree, struct gcc_options *,
 	 struct gcc_options *);
+extern void arm_configure_build_target (struct arm_build_target *,
+	struct gcc_options *,
+	struct gcc_options *, bool);
 extern void arm_pr_long_calls (struct cpp_reader *);
 extern void arm_pr_no_long_calls (struct cpp_reader *);
 extern void arm_pr_long_calls_off (struct cpp_reader *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index bc246c9..437ee2d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -231,6 +231,8 @@ static tree arm_build_builtin_va_list (void);
 static void arm_expand_builtin_va_start (tree, rtx);
 static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
 static void arm_option_override (void);
+static void arm_option_restore (struct gcc_options *,
+struct cl_target_option *);
 static void arm_override_options_after_change (void);
 static void arm_option_print (FILE *, int, struct cl_target_option *);
 static void arm_set_current_function (tree);
@@ -408,6 +410,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE
 #define TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE arm_override_options_after_change
 
+#undef TARGET_OPTION_RESTORE
+#define TARGET_OPTION_RESTORE arm_option_restore
+
 #undef TARGET_OPTION_PRINT
 #define TARGET_OPTION_PRINT arm_option_print
 
@@ -2932,9 +2937,19 @@ arm_override_options_after_change_1 (struct gcc_options *opts)
 static void
 arm_override_options_after_change (void)
 {
+  arm_configure_build_target (&arm_active_target, &global_options,
+			  &global_options_set, false);
+
   arm_override_options_after_change_1 (&global_options);
 }
 
+static void
+arm_option_restore (struct gcc_options *opts, struct cl_target_option *ptr)
+{
+  arm_configure_build_target (&arm_active_target, opts, &global_options_set,
+			  false);
+}
+
 /* Reset options between modes that the user has specified.  */
 static void
 arm_option_override_internal (struct gcc_options *opts,
@@ -3048,13 +3063,13 @@ arm_initialize_isa (sbitmap isa, const enum isa_feature *isa_bits)
 bitmap_set_bit (isa, *(isa_bits++));
 }
 
-static sbitmap isa_fpubits;
+static sbitmap isa_all_fpubits;
 static sbitmap isa_quirkbits;
 
 /* Configure a build target TARGET from the user-specified opti

[PATCH 14/21] [arm] Add isa features to FPU descriptions

2016-12-15 Thread Richard Earnshaw (lists)

Similar to the new CPU and architecture ISA feature lists, we now add
ISA feature lists to each FPU description.  We don't use these yet;
that will come in later patches.  These follow the same style as the
newly modified flag sets, but use slightly different defaults that
more accurately reflect the ISA specifications.
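The FPU entries reuse the table pattern already used for cores and architectures: ISA_FEAT lists are juxtaposed and the initializer closes them with isa_nobit. A standalone sketch, assuming hypothetical definitions of ISA_FEAT and the ISA_* helpers (the real ones in arm-isa.h may differ in detail) and an inline FPU_LIST in place of arm-fpus.def:

```c
/* Hypothetical subset of enum isa_feature.  */
enum isa_feature
{
  isa_nobit = 0,
  isa_bit_VFPv2,
  isa_bit_VFPv3,
  isa_bit_fp_dbl,
  isa_bit_fp_d32,
  isa_bit_neon,
  isa_num_bits
};

/* Each ISA_* helper names one or more bits; ISA_FEAT appends a comma
   so that several lists can simply be written side by side.  */
#define ISA_FEAT(X) X,
#define ISA_VFPv2  isa_bit_VFPv2
#define ISA_VFPv3  ISA_VFPv2, isa_bit_VFPv3
#define ISA_FP_DBL isa_bit_fp_dbl
#define ISA_FP_D32 ISA_FP_DBL, isa_bit_fp_d32
#define ISA_NEON   ISA_FP_D32, isa_bit_neon

struct fpu_desc
{
  const char *name;
  enum isa_feature isa_bits[isa_num_bits];
};

/* Inline stand-in for arm-fpus.def entries: ARM_FPU (NAME, ISA).  */
#define FPU_LIST \
  ARM_FPU ("vfpv2", ISA_FEAT (ISA_VFPv2) ISA_FEAT (ISA_FP_DBL)) \
  ARM_FPU ("vfpv3", ISA_FEAT (ISA_VFPv3) ISA_FEAT (ISA_FP_D32)) \
  ARM_FPU ("neon",  ISA_FEAT (ISA_VFPv3) ISA_FEAT (ISA_NEON))

/* Close each entry's list with isa_nobit, as in {ISA isa_nobit}.  */
#define ARM_FPU(NAME, ISA) { NAME, { ISA isa_nobit } },
static const struct fpu_desc fpus[] = { FPU_LIST };
#undef ARM_FPU

/* Number of features before the isa_nobit terminator.  */
static int
count_bits (const enum isa_feature *bits)
{
  int n = 0;
  while (*bits++ != isa_nobit)
    n++;
  return n;
}
```

Because helpers like ISA_NEON expand to their prerequisites, "neon" ends up with the double-precision and D32 bits without listing them explicitly.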

* arm-isa.h (isa_feature): Add bits for VFPv4, FPv5, fp16conv,
fp_dbl, fp_d32 and fp_crypto.
(ISA_ALL_FPU): Add all the new bits.
(ISA_VFPv2, ISA_VFPv3, ISA_VFPv4, ISA_FPv5): New macros.
(ISA_FP_ARMv8, ISA_FP_DBL, ISA_FP_D32, ISA_NEON, ISA_CRYPTO): Likewise.
* arm-fpus.def: Add ISA features to all FPUs.
* arm.h (arm_fpu_desc): Add new field for ISA bits.
* arm.c (all_fpus): Initialize it.
* arm-tables.opt: Regenerated.
---
 gcc/config/arm/arm-fpus.def   | 44 +--
 gcc/config/arm/arm-isa.h      | 30 +
 gcc/config/arm/arm-tables.opt | 10 +-
 gcc/config/arm/arm.c          |  4 ++--
 gcc/config/arm/arm.h          |  1 +
 5 files changed, 56 insertions(+), 33 deletions(-)


diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def
index 25e2ebd..1be718f 100644
--- a/gcc/config/arm/arm-fpus.def
+++ b/gcc/config/arm/arm-fpus.def
@@ -19,31 +19,31 @@
 
 /* Before using #include to read this file, define a macro:
 
-  ARM_FPU(NAME, FEATURES)
+  ARM_FPU(NAME, ISA, FEATURES)
 
The arguments are the fields of struct arm_fpu_desc.
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_FPU("vfp",			FPU_VFPv2 | FPU_DBL)
-ARM_FPU("vfpv2",		FPU_VFPv2 | FPU_DBL)
-ARM_FPU("vfpv3",		FPU_VFPv3 | FPU_D32)
-ARM_FPU("vfpv3-fp16",		FPU_VFPv3 | FPU_D32 | FPU_FP16)
-ARM_FPU("vfpv3-d16",		FPU_VFPv3 | FPU_DBL)
-ARM_FPU("vfpv3-d16-fp16", 	FPU_VFPv3 | FPU_DBL | FPU_FP16)
-ARM_FPU("vfpv3xd",		FPU_VFPv3)
-ARM_FPU("vfpv3xd-fp16",		FPU_VFPv3 | FPU_FP16)
-ARM_FPU("neon",			FPU_VFPv3 | FPU_NEON)
-ARM_FPU("neon-vfpv3",		FPU_VFPv3 | FPU_NEON)
-ARM_FPU("neon-fp16",		FPU_VFPv3 | FPU_NEON | FPU_FP16)
-ARM_FPU("vfpv4",		FPU_VFPv4 | FPU_D32 | FPU_FP16)
-ARM_FPU("vfpv4-d16",		FPU_VFPv4 | FPU_DBL | FPU_FP16)
-ARM_FPU("fpv4-sp-d16",		FPU_VFPv4 | FPU_FP16)
-ARM_FPU("fpv5-sp-d16",		FPU_VFPv5 | FPU_FP16)
-ARM_FPU("fpv5-d16",		FPU_VFPv5 | FPU_DBL | FPU_FP16)
-ARM_FPU("neon-vfpv4",		FPU_VFPv4 | FPU_NEON | FPU_FP16)
-ARM_FPU("fp-armv8",		FPU_ARMv8 | FPU_D32 | FPU_FP16)
-ARM_FPU("neon-fp-armv8", 	FPU_ARMv8 | FPU_NEON | FPU_FP16)
-ARM_FPU("crypto-neon-fp-armv8", FPU_ARMv8 | FPU_CRYPTO | FPU_FP16)
+ARM_FPU("vfp",			ISA_FEAT(ISA_VFPv2) ISA_FEAT(ISA_FP_DBL),			 FPU_VFPv2 | FPU_DBL)
+ARM_FPU("vfpv2",		ISA_FEAT(ISA_VFPv2) ISA_FEAT(ISA_FP_DBL),			 FPU_VFPv2 | FPU_DBL)
+ARM_FPU("vfpv3",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_D32),			 FPU_VFPv3 | FPU_D32)
+ARM_FPU("vfpv3-fp16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_D32) ISA_FEAT(isa_bit_fp16conv), FPU_VFPv3 | FPU_D32 | FPU_FP16)
+ARM_FPU("vfpv3-d16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_DBL),			 FPU_VFPv3 | FPU_DBL)
+ARM_FPU("vfpv3-d16-fp16",	ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_DBL) ISA_FEAT(isa_bit_fp16conv), FPU_VFPv3 | FPU_DBL | FPU_FP16)
+ARM_FPU("vfpv3xd",		ISA_FEAT(ISA_VFPv3),		 FPU_VFPv3)
+ARM_FPU("vfpv3xd-fp16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(isa_bit_fp16conv),			 FPU_VFPv3 | FPU_FP16)
+ARM_FPU("neon",			ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_NEON), FPU_VFPv3 | FPU_NEON)
+ARM_FPU("neon-vfpv3",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_NEON), FPU_VFPv3 | FPU_NEON)
+ARM_FPU("neon-fp16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_NEON) ISA_FEAT(isa_bit_fp16conv),   FPU_VFPv3 | FPU_NEON | FPU_FP16)
+ARM_FPU("vfpv4",		ISA_FEAT(ISA_VFPv4) ISA_FEAT(ISA_FP_D32),			 FPU_VFPv4 | FPU_D32 | FPU_FP16)
+ARM_FPU("neon-vfpv4",		ISA_FEAT(ISA_VFPv4) ISA_FEAT(ISA_NEON), FPU_VFPv4 | FPU_NEON | FPU_FP16)
+ARM_FPU("vfpv4-d16",		ISA_FEAT(ISA_VFPv4) ISA_FEAT(ISA_FP_DBL),			 FPU_VFPv4 | FPU_DBL | FPU_FP16)
+ARM_FPU("fpv4-sp-d16",		ISA_FEAT(ISA_VFPv4),		 FPU_VFPv4 | FPU_FP16)
+ARM_FPU("fpv5-sp-d16",		ISA_FEAT(ISA_FPv5),		 FPU_VFPv5 | FPU_FP16)
+ARM_FPU("fpv5-d16",		ISA_FEAT(ISA_FPv5) ISA_FEAT(ISA_FP_DBL),			 FPU_VFPv5 | FPU_DBL | FPU_FP16)
+ARM_FPU("fp-armv8",		ISA_FEAT(ISA_FP_ARMv8) ISA_FEAT(ISA_FP_D32),			 FPU_ARMv8 | FPU_D32 | FPU_FP16)
+ARM_FPU("neon-fp-armv8",	ISA_FEAT(ISA_FP_ARMv8) ISA_FEAT(ISA_NEON),			 FPU_ARMv8 | FPU_NEON | FPU_FP16)
+ARM_FPU("crypto-neon-fp-armv8", ISA_FEAT(ISA_FP_ARMv8) ISA_FEAT(ISA_CRYPTO),			 FPU_ARMv8 | FPU_CRYPTO | FPU_FP16)
 /* Compatibility aliases.  */
-ARM_FPU("vfp3",			FPU_VFPv3 | FPU_D32)
+ARM_FPU("vfp3",			ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_D32),			 FPU_VFPv3 | FPU_D32)
diff --git a/gcc/config/arm/arm-isa.h b/gcc/config/arm/arm-isa.h
index 2d47c1b..25182e52 100644
--- a/gcc/config/arm/arm-isa.h
+++ b/gcc/config/arm/arm-isa.h
@@ -53,10 +53,18 @@ enum isa_feature
 isa_bit_ARMv8_2,	/* Architecutre rel 8.2.  */

[PATCH 16/21] [arm] Eliminate TARGET_FPU_NAME.

2016-12-15 Thread Richard Earnshaw (lists)

Rather than assuming a specific fpu name has been selected, we work
out the FPU from the ISA properties.  This is necessary since once we
have default FPUs selected by the processor, there will be no explicit
entry in the table of fpus to refer to.

This also fixes a bug with the code I added recently to permit new
aliases for existing FPU names: the new names cannot be passed to the
assembler since it does not recognize them.  By mapping the ISA
features back to the canonical names we avoid having to teach the
assembler about the new names.
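A simplified sketch of the canonicalization, assuming 32-bit masks instead of sbitmaps and a hypothetical cut-down table: masking out non-FPU bits and returning the first exact match means an alias listed later in the table (here "vfp3" after "vfpv3", and "neon-vfpv3" after "neon", as in arm-fpus.def) never reaches the assembler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; GCC itself uses sbitmaps.  */
#define F_VFPv2  (UINT32_C(1) << 0)
#define F_VFPv3  (UINT32_C(1) << 1)
#define F_FP_DBL (UINT32_C(1) << 2)
#define F_FP_D32 (UINT32_C(1) << 3)
#define F_NEON   (UINT32_C(1) << 4)
#define F_THUMB2 (UINT32_C(1) << 8)   /* a CPU bit, not an FPU bit */

#define ALL_FPU_BITS (F_VFPv2 | F_VFPv3 | F_FP_DBL | F_FP_D32 | F_NEON)

struct fpu_entry { const char *name; uint32_t bits; };

/* Table order matters: canonical names come before their aliases.  */
static const struct fpu_entry fpu_table[] =
{
  { "vfpv2",      F_VFPv2 | F_FP_DBL },
  { "vfpv3",      F_VFPv2 | F_VFPv3 | F_FP_DBL | F_FP_D32 },
  { "neon",       F_VFPv2 | F_VFPv3 | F_FP_DBL | F_FP_D32 | F_NEON },
  { "neon-vfpv3", F_VFPv2 | F_VFPv3 | F_FP_DBL | F_FP_D32 | F_NEON },
  { "vfp3",       F_VFPv2 | F_VFPv3 | F_FP_DBL | F_FP_D32 },
};

/* Mask off non-FPU bits, then return the first exact match.  */
static const char *
identify_fpu (uint32_t isa)
{
  uint32_t fpubits = isa & ALL_FPU_BITS;
  for (size_t i = 0; i < sizeof fpu_table / sizeof fpu_table[0]; i++)
    if (fpu_table[i].bits == fpubits)
      return fpu_table[i].name;
  return NULL;  /* The patch calls gcc_unreachable () here instead.  */
}
```

Comparing for equality rather than a subset test is what makes the result unambiguous: an FPU with NEON never matches the plain "vfpv3" entry.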

* arm.h (TARGET_FPU_NAME): Delete.
* arm.c (arm_identify_fpu_from_isa): New function.
(arm_declare_function_name): Use it to get the name for the FPU.
---
 gcc/config/arm/arm.c | 26 --
 gcc/config/arm/arm.h |  1 -
 2 files changed, 24 insertions(+), 3 deletions(-)


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 437ee2d..df7a3ea 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3256,7 +3256,7 @@ arm_configure_build_target (struct arm_build_target *target,
   gcc_assert (arm_selected_cpu);
 
   arm_selected_fpu = &all_fpus[opts->x_arm_fpu_index];
-  auto_sbitmap fpu_bits(isa_num_bits);
+  auto_sbitmap fpu_bits (isa_num_bits);
 
   arm_initialize_isa (fpu_bits, arm_selected_fpu->isa_bits);
   bitmap_and_compl (target->isa, target->isa, isa_all_fpubits);
@@ -30433,6 +30433,26 @@ arm_valid_target_attribute_p (tree fndecl, tree ARG_UNUSED (name),
   return ret;
 }
 
+/* Match an ISA feature bitmap to a named FPU.  We always use the
+   first entry that exactly matches the feature set, so that we
+   effectively canonicalize the FPU name for the assembler.  */
+static const char*
+arm_identify_fpu_from_isa (sbitmap isa)
+{
+  auto_sbitmap fpubits (isa_num_bits);
+  auto_sbitmap cand_fpubits (isa_num_bits);
+
+  bitmap_and (fpubits, isa, isa_all_fpubits);
+  for (unsigned int i = 0; i < ARRAY_SIZE (all_fpus); i++)
+    {
+      arm_initialize_isa (cand_fpubits, all_fpus[i].isa_bits);
+      if (bitmap_equal_p (fpubits, cand_fpubits))
+	return all_fpus[i].name;
+    }
+  /* We must find an entry, or things have gone wrong.  */
+  gcc_unreachable ();
+}
+
 void
 arm_declare_function_name (FILE *stream, const char *name, tree decl)
 {
@@ -30454,7 +30474,9 @@ arm_declare_function_name (FILE *stream, const char *name, tree decl)
 fprintf (stream, "\t.arm\n");
 
   asm_fprintf (asm_out_file, "\t.fpu %s\n",
-	   TARGET_SOFT_FLOAT ? "softvfp" : TARGET_FPU_NAME);
+	   (TARGET_SOFT_FLOAT
+		? "softvfp"
+		: arm_identify_fpu_from_isa (arm_active_target.isa)));
 
   if (TARGET_POKE_FUNCTION_NAME)
 arm_poke_function_name (stream, (const char *) name);
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 908e763..980bb74 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -369,7 +369,6 @@ extern const struct arm_fpu_desc
 
 /* Accessors.  */
 
-#define TARGET_FPU_NAME (all_fpus[arm_fpu_index].name)
 #define TARGET_FPU_FEATURES (all_fpus[arm_fpu_index].features)
 
 /* Which floating point hardware to schedule for.  */



[PATCH 17/21] [arm] Use arm_active_target for most FP feature tests.

2016-12-15 Thread Richard Earnshaw (lists)

Now that the isa feature bits are all available in arm_active_target,
we can use that for most of the feature tests that are needed.
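A minimal sketch of the new style of feature test, assuming a hypothetical byte-array bitmap with set_bit and bit_p helpers in place of sbitmap and bitmap_bit_p, and a hypothetical active_isa in place of arm_active_target.isa. Note how TARGET_VFP_SINGLE becomes simply the negation of TARGET_VFP_DOUBLE, and TARGET_NEON_FP16 requires two bits at once.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical subset of enum isa_feature.  */
enum isa_feature
{
  isa_nobit = 0,
  isa_bit_fp_dbl,
  isa_bit_fp16conv,
  isa_bit_neon,
  isa_num_bits
};

/* Minimal fixed-size bitmap standing in for GCC's sbitmap.  */
struct isa_bitmap
{
  unsigned char word[(isa_num_bits + CHAR_BIT - 1) / CHAR_BIT];
};

static void
set_bit (struct isa_bitmap *map, enum isa_feature bit)
{
  map->word[bit / CHAR_BIT] |= (unsigned char) (1u << (bit % CHAR_BIT));
}

static bool
bit_p (const struct isa_bitmap *map, enum isa_feature bit)
{
  return (map->word[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1u;
}

/* Stand-in for arm_active_target.isa; starts with no bits set.  */
static struct isa_bitmap active_isa;

/* Feature tests in the style of the rewritten arm.h macros.  */
#define TARGET_VFP_DOUBLE (bit_p (&active_isa, isa_bit_fp_dbl))
#define TARGET_VFP_SINGLE (!TARGET_VFP_DOUBLE)
#define TARGET_NEON_FP16 \
  (bit_p (&active_isa, isa_bit_neon) && bit_p (&active_isa, isa_bit_fp16conv))
```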

* arm.h (TARGET_VFPD32): Use arm_active_target.
(TARGET_VFP3): Likewise.
(TARGET_VFP5): Likewise.
(TARGET_VFP_SINGLE): Likewise.
(TARGET_VFP_DOUBLE): Likewise.
(TARGET_NEON_FP16): Likewise.
(TARGET_FP16): Likewise.
(TARGET_FMA): Likewise.
(TARGET_FPU_ARMV8): Likewise.
(TARGET_CRYPTO): Likewise.
(TARGET_NEON): Likewise.
(TARGET_FPU_FEATURES): Delete.
* arm.c (arm_option_check_internal): Check for iwmmxt conflict with
Neon using arm_active_target.
---
 gcc/config/arm/arm.c |  3 +--
 gcc/config/arm/arm.h | 33 ++---
 2 files changed, 15 insertions(+), 21 deletions(-)


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index df7a3ea..676c78b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2815,11 +2815,10 @@ static void
 arm_option_check_internal (struct gcc_options *opts)
 {
   int flags = opts->x_target_flags;
-  const struct arm_fpu_desc *fpu_desc = &all_fpus[opts->x_arm_fpu_index];
 
   /* iWMMXt and NEON are incompatible.  */
   if (TARGET_IWMMXT
-  && ARM_FPU_FSET_HAS (fpu_desc->features, FPU_FL_NEON))
+  && bitmap_bit_p (arm_active_target.isa, isa_bit_neon))
 error ("iWMMXt and NEON are incompatible");
 
   /* Make sure that the processor choice does not conflict with any of the
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 980bb74..17f030b 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -161,28 +161,27 @@ extern tree arm_fp16_type_node;
to be more careful with TARGET_NEON as noted below.  */
 
 /* FPU is has the full VFPv3/NEON register file of 32 D registers.  */
-#define TARGET_VFPD32 (TARGET_FPU_FEATURES & FPU_FL_D32)
+#define TARGET_VFPD32 (bitmap_bit_p (arm_active_target.isa, isa_bit_fp_d32))
 
 /* FPU supports VFPv3 instructions.  */
-#define TARGET_VFP3 (TARGET_FPU_FEATURES & FPU_FL_VFPv3)
+#define TARGET_VFP3 (bitmap_bit_p (arm_active_target.isa, isa_bit_VFPv3))
 
 /* FPU supports FPv5 instructions.  */
-#define TARGET_VFP5 (TARGET_FPU_FEATURES & FPU_FL_VFPv5)
+#define TARGET_VFP5 (bitmap_bit_p (arm_active_target.isa, isa_bit_FPv5))
 
 /* FPU only supports VFP single-precision instructions.  */
-#define TARGET_VFP_SINGLE ((TARGET_FPU_FEATURES & FPU_FL_DBL) == 0)
+#define TARGET_VFP_SINGLE (!TARGET_VFP_DOUBLE)
 
 /* FPU supports VFP double-precision instructions.  */
-#define TARGET_VFP_DOUBLE (TARGET_FPU_FEATURES & FPU_FL_DBL)
+#define TARGET_VFP_DOUBLE (bitmap_bit_p (arm_active_target.isa, isa_bit_fp_dbl))
 
 /* FPU supports half-precision floating-point with NEON element load/store.  */
 #define TARGET_NEON_FP16	\
-  (ARM_FPU_FSET_HAS (TARGET_FPU_FEATURES, FPU_FL_NEON)		\
-   && ARM_FPU_FSET_HAS (TARGET_FPU_FEATURES, FPU_FL_FP16))
+  (bitmap_bit_p (arm_active_target.isa, isa_bit_neon)		\
+   && bitmap_bit_p (arm_active_target.isa, isa_bit_fp16conv))
 
-/* FPU supports VFP half-precision floating-point.  */
-#define TARGET_FP16			\
-  (ARM_FPU_FSET_HAS (TARGET_FPU_FEATURES, FPU_FL_FP16))
+/* FPU supports VFP half-precision floating-point conversions.  */
+#define TARGET_FP16 (bitmap_bit_p (arm_active_target.isa, isa_bit_fp16conv))
 
 /* FPU supports converting between HFmode and DFmode in a single hardware
step.  */
@@ -190,14 +189,14 @@ extern tree arm_fp16_type_node;
   (TARGET_HARD_FLOAT && (TARGET_FP16 && TARGET_VFP5))
 
 /* FPU supports fused-multiply-add operations.  */
-#define TARGET_FMA (TARGET_FPU_FEATURES & FPU_FL_VFPv4)
+#define TARGET_FMA (bitmap_bit_p (arm_active_target.isa, isa_bit_VFPv4))
 
 /* FPU is ARMv8 compatible.  */
-#define TARGET_FPU_ARMV8 (TARGET_FPU_FEATURES & FPU_FL_ARMv8)
+#define TARGET_FPU_ARMV8	\
+  (bitmap_bit_p (arm_active_target.isa, isa_bit_FP_ARMv8))
 
 /* FPU supports Crypto extensions.  */
-#define TARGET_CRYPTO			\
-  (ARM_FPU_FSET_HAS (TARGET_FPU_FEATURES, FPU_FL_CRYPTO))
+#define TARGET_CRYPTO (bitmap_bit_p (arm_active_target.isa, isa_bit_crypto))
 
 /* FPU supports Neon instructions.  The setting of this macro gets
revealed via __ARM_NEON__ so we add extra guards upon TARGET_32BIT
@@ -205,7 +204,7 @@ extern tree arm_fp16_type_node;
available.  */
 #define TARGET_NEON			\
   (TARGET_32BIT && TARGET_HARD_FLOAT	\
-   && ARM_FPU_FSET_HAS (TARGET_FPU_FEATURES, FPU_FL_NEON))
+   && bitmap_bit_p (arm_active_target.isa, isa_bit_neon))
 
 /* FPU supports ARMv8.1 Adv.SIMD extensions.  */
 #define TARGET_NEON_RDMA (TARGET_NEON && arm_arch8_1)
@@ -367,10 +366,6 @@ extern const struct arm_fpu_desc
   arm_fpu_feature_set features;
 } all_fpus[];
 
-/* Accessors.  */
-
-#define TARGET_FPU_FEATURES (all_fpus[arm_fpu_index].features)
-
 /* Which floating point hardware to schedule for.  */
 extern int arm_fpu_attr;
 



[PATCH 13/21] [arm] Remove FPU rev field

2016-12-15 Thread Richard Earnshaw (lists)

Similar to the main ISA, we convert the FPU revision into a set of feature
bits.  This permits a more complex set of capability relationships to be
expressed more easily.  For now we continue to use the traditional bitmasks.
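The sketch below illustrates how composite helper macros can keep the old "rev >= N" semantics: if each FPU_VFPvN helper includes all of its predecessor's bits, a single bit test answers the same question. The bit values and the exact composition are assumptions inferred from the table rewrite (for example, vfpv3 moves from FPU_FL_D32 | FPU_FL_DBL to FPU_VFPv3 | FPU_D32, so FPU_D32 presumably implies FPU_FL_DBL), not copied from arm.h.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real FPU_FL_* constants live in arm.h.  */
#define FPU_FL_DBL    (1u << 0)
#define FPU_FL_D32    (1u << 1)
#define FPU_FL_VFPv2  (1u << 2)
#define FPU_FL_VFPv3  (1u << 3)
#define FPU_FL_VFPv4  (1u << 4)
#define FPU_FL_VFPv5  (1u << 5)
#define FPU_FL_ARMv8  (1u << 6)

/* Assumed composition: each helper includes everything its predecessor
   provides, so one bit test replaces the old "rev >= N" comparison.  */
#define FPU_VFPv2 (FPU_FL_VFPv2)
#define FPU_VFPv3 (FPU_VFPv2 | FPU_FL_VFPv3)
#define FPU_VFPv4 (FPU_VFPv3 | FPU_FL_VFPv4)
#define FPU_VFPv5 (FPU_VFPv4 | FPU_FL_VFPv5)
#define FPU_ARMv8 (FPU_VFPv5 | FPU_FL_ARMv8)
#define FPU_DBL   (FPU_FL_DBL)
#define FPU_D32   (FPU_DBL | FPU_FL_D32)

/* Equivalents of TARGET_VFP3 and TARGET_VFP5 on a features word.  */
static bool has_vfp3 (unsigned int features) { return (features & FPU_FL_VFPv3) != 0; }
static bool has_vfp5 (unsigned int features) { return (features & FPU_FL_VFPv5) != 0; }
```

Unlike a revision number, the bit sets can also express capabilities that do not form a single chain, which is the motivation given above.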

* arm.h (FPU_FL_VFPv2): New feature bit.
(FPU_FL_VFPv3, FPU_FL_VFPv4, FPU_FL_VFPv5, FPU_FL_ARMv8): Likewise.
(FPU_VFPv2, FPU_VFPv3, FPU_VFPv4, FPU_VFPv5, FPU_ARMv8): New helper
macros.
(FPU_DBL, FPU_D32, FPU_NEON, FPU_CRYPTO, FPU_FP16): Likewise.
(TARGET_FPU_REV): Delete.
(TARGET_VFP3): Use feature bits.
(TARGET_VFP5): Likewise.
(TARGET_FMA): Likewise.
(TARGET_FPU_ARMV8): Likewise.
(struct arm_fpu_desc): Delete rev field.
* arm-fpus.def: Delete REV entry, use new feature bits and macros.
* arm.c (all_fpus): Delete rev field.
---
 gcc/config/arm/arm-fpus.def | 44 ++--
 gcc/config/arm/arm.c        |  4 ++--
 gcc/config/arm/arm.h        | 28 ++--
 3 files changed, 46 insertions(+), 30 deletions(-)


diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def
index eca03bb..25e2ebd 100644
--- a/gcc/config/arm/arm-fpus.def
+++ b/gcc/config/arm/arm-fpus.def
@@ -19,31 +19,31 @@
 
 /* Before using #include to read this file, define a macro:
 
-  ARM_FPU(NAME, REV, FEATURES)
+  ARM_FPU(NAME, FEATURES)
 
The arguments are the fields of struct arm_fpu_desc.
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_FPU("vfp",			2, FPU_FL_DBL)
-ARM_FPU("vfpv2",		2, FPU_FL_DBL)
-ARM_FPU("vfpv3",		3, FPU_FL_D32 | FPU_FL_DBL)
-ARM_FPU("vfpv3-fp16",		3, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_FP16)
-ARM_FPU("vfpv3-d16",		3, FPU_FL_DBL)
-ARM_FPU("vfpv3-d16-fp16", 	3, FPU_FL_DBL | FPU_FL_FP16)
-ARM_FPU("vfpv3xd",		3, FPU_FL_NONE)
-ARM_FPU("vfpv3xd-fp16",		3, FPU_FL_FP16)
-ARM_FPU("neon",			3, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_NEON)
-ARM_FPU("neon-vfpv3",		3, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_NEON)
-ARM_FPU("neon-fp16",		3, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_NEON | FPU_FL_FP16)
-ARM_FPU("vfpv4",		4, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_FP16)
-ARM_FPU("vfpv4-d16",		4, FPU_FL_DBL | FPU_FL_FP16)
-ARM_FPU("fpv4-sp-d16",		4, FPU_FL_FP16)
-ARM_FPU("fpv5-sp-d16",		5, FPU_FL_FP16)
-ARM_FPU("fpv5-d16",		5, FPU_FL_DBL | FPU_FL_FP16)
-ARM_FPU("neon-vfpv4",		4, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_NEON | FPU_FL_FP16)
-ARM_FPU("fp-armv8",		8, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_FP16)
-ARM_FPU("neon-fp-armv8", 	8, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_NEON | FPU_FL_FP16)
-ARM_FPU("crypto-neon-fp-armv8", 8, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO)
+ARM_FPU("vfp",			FPU_VFPv2 | FPU_DBL)
+ARM_FPU("vfpv2",		FPU_VFPv2 | FPU_DBL)
+ARM_FPU("vfpv3",		FPU_VFPv3 | FPU_D32)
+ARM_FPU("vfpv3-fp16",		FPU_VFPv3 | FPU_D32 | FPU_FP16)
+ARM_FPU("vfpv3-d16",		FPU_VFPv3 | FPU_DBL)
+ARM_FPU("vfpv3-d16-fp16", 	FPU_VFPv3 | FPU_DBL | FPU_FP16)
+ARM_FPU("vfpv3xd",		FPU_VFPv3)
+ARM_FPU("vfpv3xd-fp16",		FPU_VFPv3 | FPU_FP16)
+ARM_FPU("neon",			FPU_VFPv3 | FPU_NEON)
+ARM_FPU("neon-vfpv3",		FPU_VFPv3 | FPU_NEON)
+ARM_FPU("neon-fp16",		FPU_VFPv3 | FPU_NEON | FPU_FP16)
+ARM_FPU("vfpv4",		FPU_VFPv4 | FPU_D32 | FPU_FP16)
+ARM_FPU("vfpv4-d16",		FPU_VFPv4 | FPU_DBL | FPU_FP16)
+ARM_FPU("fpv4-sp-d16",		FPU_VFPv4 | FPU_FP16)
+ARM_FPU("fpv5-sp-d16",		FPU_VFPv5 | FPU_FP16)
+ARM_FPU("fpv5-d16",		FPU_VFPv5 | FPU_DBL | FPU_FP16)
+ARM_FPU("neon-vfpv4",		FPU_VFPv4 | FPU_NEON | FPU_FP16)
+ARM_FPU("fp-armv8",		FPU_ARMv8 | FPU_D32 | FPU_FP16)
+ARM_FPU("neon-fp-armv8", 	FPU_ARMv8 | FPU_NEON | FPU_FP16)
+ARM_FPU("crypto-neon-fp-armv8", FPU_ARMv8 | FPU_CRYPTO | FPU_FP16)
 /* Compatibility aliases.  */
-ARM_FPU("vfp3",			3, FPU_FL_D32 | FPU_FL_DBL)
+ARM_FPU("vfp3",			FPU_VFPv3 | FPU_D32)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 820a6ab..e555cf6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2323,8 +2323,8 @@ char arm_arch_name[] = "__ARM_ARCH_PROFILE__";
 
 const struct arm_fpu_desc all_fpus[] =
 {
-#define ARM_FPU(NAME, REV, FEATURES) \
-  { NAME, REV, FEATURES },
+#define ARM_FPU(NAME, FEATURES) \
+  { NAME, FEATURES },
 #include "arm-fpus.def"
 #undef ARM_FPU
 };
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index a412fb1..332f0fa 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -164,10 +164,10 @@ extern tree arm_fp16_type_node;
 #define TARGET_VFPD32 (TARGET_FPU_FEATURES & FPU_FL_D32)
 
 /* FPU supports VFPv3 instructions.  */
-#define TARGET_VFP3 (TARGET_FPU_REV >= 3)
+#define TARGET_VFP3 (TARGET_FPU_FEATURES & FPU_FL_VFPv3)
 
 /* FPU supports FPv5 instructions.  */
-#define TARGET_VFP5 (TARGET_FPU_REV >= 5)
+#define TARGET_VFP5 (TARGET_FPU_FEATURES & FPU_FL_VFPv5)
 
 /* FPU only supports VFP single-precision instructions.  */
 #define TARGET_VFP_SINGLE ((TARGET_FPU_FEATURES & FPU_FL_DBL) == 0)
@@ -190,10 +190,10 @@ extern tree arm_fp16_type_node;
   (TARGET_HARD_FLOAT && (TARGET_FP16

[PATCH 18/21] [arm] Use cl_target_options for configuring the active target.

2016-12-15 Thread Richard Earnshaw (lists)

It now becomes apparent that it would be better to use the
cl_target_options as the basis for calling arm_configure_build_target;
it already contains exactly the same fields that we need.  I chose not
to rewrite the earlier patches as that would make the progression of
changes seem less logical than it currently is, with several early
changes having no immediate justification.

* arm-protos.h (arm_configure_build_target): Change second argument
to cl_target_options.
* arm.c (arm_configure_build_target): Likewise.
(arm_option_restore): Update accordingly.
(arm_option_override): Create the target_option_default_node before
calling arm_configure_build_target.  Use it in call of latter.
Resynchronize after all other overrides have been calculated.
(arm_valid_target_attribute_tree): Use the target options for
reconfiguration.  Resynchronize after performing override checks.
* arm-c.c (arm_pragma_target_parse): Use target options from cur_tree
to reconfigure the build target.
---
 gcc/config/arm/arm-c.c  |  3 ++-
 gcc/config/arm/arm-protos.h |  2 +-
 gcc/config/arm/arm.c| 36 
 3 files changed, 27 insertions(+), 14 deletions(-)


diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 9dd9a8d..b57af69 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -243,7 +243,8 @@ arm_pragma_target_parse (tree args, tree pop_target)
   /* handle_pragma_pop_options and handle_pragma_reset_options will set
target_option_current_node, but not handle_pragma_target.  */
   target_option_current_node = cur_tree;
-  arm_configure_build_target (&arm_active_target, &global_options,
+  arm_configure_build_target (&arm_active_target,
+  TREE_TARGET_OPTION (cur_tree),
   &global_options_set, false);
 }
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index da3484f..d418ca9 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -225,7 +225,7 @@ extern bool arm_change_mode_p (tree);
 extern tree arm_valid_target_attribute_tree (tree, struct gcc_options *,
 	 struct gcc_options *);
 extern void arm_configure_build_target (struct arm_build_target *,
-	struct gcc_options *,
+	struct cl_target_option *,
 	struct gcc_options *, bool);
 extern void arm_pr_long_calls (struct cpp_reader *);
 extern void arm_pr_no_long_calls (struct cpp_reader *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 676c78b..df520e5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2936,16 +2936,17 @@ arm_override_options_after_change_1 (struct gcc_options *opts)
 static void
 arm_override_options_after_change (void)
 {
-  arm_configure_build_target (&arm_active_target, &global_options,
+  arm_configure_build_target (&arm_active_target,
+			  TREE_TARGET_OPTION (target_option_default_node),
 			  &global_options_set, false);
 
   arm_override_options_after_change_1 (&global_options);
 }
 
 static void
-arm_option_restore (struct gcc_options *opts, struct cl_target_option *ptr)
+arm_option_restore (struct gcc_options *, struct cl_target_option *ptr)
 {
-  arm_configure_build_target (&arm_active_target, opts, &global_options_set,
+  arm_configure_build_target (&arm_active_target, ptr, &global_options_set,
 			  false);
 }
 
@@ -3070,7 +3071,7 @@ static sbitmap isa_quirkbits;
architecture have been specified, but the two are not identical.  */
 void
 arm_configure_build_target (struct arm_build_target *target,
-			struct gcc_options *opts,
+			struct cl_target_option *opts,
 			struct gcc_options *opts_set,
 			bool warn_compatible)
 {
@@ -3306,7 +3307,13 @@ arm_option_override (void)
   gcc_assert (ok);
 }
 
-  arm_configure_build_target (&arm_active_target, &global_options,
+  /* Create the default target_options structure.  We need this early
+ to configure the overall build target.  */
+  target_option_default_node = target_option_current_node
+= build_target_option_node (&global_options);
+
+  arm_configure_build_target (&arm_active_target,
+			  TREE_TARGET_OPTION (target_option_default_node),
 			  &global_options_set, true);
 
 #ifdef SUBTARGET_OVERRIDE_OPTIONS
@@ -3657,14 +3664,12 @@ arm_option_override (void)
   arm_option_check_internal (&global_options);
   arm_option_params_internal ();
 
+  /* Resynchronize the saved target options.  */
+  cl_target_option_save (TREE_TARGET_OPTION (target_option_default_node),
+			 &global_options);
   /* Register global variables with the garbage collector.  */
   arm_add_gc_roots ();
 
-  /* Save the initial options in case the user does function specific
- options or #pragma target.  */
-  target_option_default_node = target_option_current_node
-= build_target_option_node (&global_options);
-
   /* Init initial mode for testing.  */
   thumb_flipper = TARGET_THUMB

[PATCH 19/21] [arm] Use ISA feature sets for determining inlinability.

2016-12-15 Thread Richard Earnshaw (lists)

Now that we can construct the build target ISA from the cl_target_options
data, we can use this to determine inlinability.  This eliminates the
final remaining use of the FPU features field.
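As a rough illustration of the subset test that arm_can_inline_p now performs with bitmap_subset_p, here is a standalone C sketch. The fixed-size isa_set type and the helper names are hypothetical stand-ins for GCC's sbitmap machinery, not its API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size ISA feature set: one bit per feature, packed
   into 64-bit words (GCC itself uses a dynamically sized sbitmap).  */
#define ISA_WORDS 2

typedef struct { uint64_t w[ISA_WORDS]; } isa_set;

static void isa_set_bit (isa_set *s, unsigned bit)
{
  s->w[bit / 64] |= (uint64_t) 1 << (bit % 64);
}

/* True if every feature present in CALLEE is also present in CALLER.  */
static bool isa_subset_p (const isa_set *callee, const isa_set *caller)
{
  for (size_t i = 0; i < ISA_WORDS; i++)
    if (callee->w[i] & ~caller->w[i])
      return false;
  return true;
}

/* Inlining is allowed only if the callee needs nothing the caller lacks.  */
static bool can_inline_p (const isa_set *caller, const isa_set *callee)
{
  return isa_subset_p (callee, caller);
}
```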

* arm.c (arm_can_inline_p): Use ISA features for determining
inlinability.
---
 gcc/config/arm/arm.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index df520e5..1d3bb89 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -30227,6 +30227,7 @@ arm_can_inline_p (tree caller, tree callee)
 {
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
+  bool can_inline = true;
 
   struct cl_target_option *caller_opts
 	= TREE_TARGET_OPTION (caller_tree ? caller_tree
@@ -30236,19 +30237,29 @@ arm_can_inline_p (tree caller, tree callee)
 	= TREE_TARGET_OPTION (callee_tree ? callee_tree
 	   : target_option_default_node);
 
-  const struct arm_fpu_desc *caller_fpu
-= &all_fpus[caller_opts->x_arm_fpu_index];
-  const struct arm_fpu_desc *callee_fpu
-= &all_fpus[callee_opts->x_arm_fpu_index];
+  if (callee_opts == caller_opts)
+return true;
 
-  /* Callee's fpu features should be a subset of the caller's.  */
-  if ((caller_fpu->features & callee_fpu->features) != callee_fpu->features)
-return false;
+  /* Callee's ISA features should be a subset of the caller's.  */
+  struct arm_build_target caller_target;
+  struct arm_build_target callee_target;
+  caller_target.isa = sbitmap_alloc (isa_num_bits);
+  callee_target.isa = sbitmap_alloc (isa_num_bits);
+
+  arm_configure_build_target (&caller_target, caller_opts, &global_options_set,
+			  false);
+  arm_configure_build_target (&callee_target, callee_opts, &global_options_set,
+			  false);
+  if (!bitmap_subset_p (callee_target.isa, caller_target.isa))
+can_inline = false;
+
+  sbitmap_free (caller_target.isa);
+  sbitmap_free (callee_target.isa);
 
   /* OK to inline between different modes.
  Function with mode specific instructions, e.g using asm,
  must be explicitly protected with noinline.  */
-  return true;
+  return can_inline;
 }
 
 /* Hook to fix function's alignment affected by target attribute.  */



[PATCH 21/21] [arm] Permit 'auto' in -mfpu.

2016-12-15 Thread Richard Earnshaw (lists)

Now that we finally have the infrastructure in place, we can derive
details of the FPU from a CPU entry.  This patch enables this for the
existing cores that already have an explicit FPU in their product names.
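One way to picture deriving the FPU from a CPU entry is to mask the CPU's ISA bits down to the floating-point ones and look for an FPU entry with exactly that set. This C sketch assumes exact matching and uses made-up bit values and a hand-simplified table. It is not GCC's arm_identify_fpu_from_isa.

```c
#include <stddef.h>

/* Hypothetical FP-related ISA bits (GCC uses isa_bit_* enumerators).  */
#define ISA_VFPv2  0x01u
#define ISA_VFPv3  0x02u
#define ISA_FP_DBL 0x04u
#define ISA_FP_D32 0x08u
#define ISA_NEON   0x10u
#define ISA_ALL_FP (ISA_VFPv2 | ISA_VFPv3 | ISA_FP_DBL | ISA_FP_D32 | ISA_NEON)

struct fpu_entry { const char *name; unsigned isa; };

/* A small, hand-simplified subset of arm-fpus.def.  */
static const struct fpu_entry fpus[] = {
  { "vfpv2", ISA_VFPv2 | ISA_FP_DBL },
  { "vfpv3", ISA_VFPv3 | ISA_FP_DBL | ISA_FP_D32 },
  { "neon",  ISA_VFPv3 | ISA_FP_DBL | ISA_FP_D32 | ISA_NEON },
};

/* Return the name of the FPU whose feature bits exactly match the FP bits
   of CPU_ISA.  Return NULL if the CPU has no FP bits (effective
   soft-float) or if no table entry matches.  */
static const char *identify_fpu (unsigned cpu_isa)
{
  unsigned fp = cpu_isa & ISA_ALL_FP;
  if (fp == 0)
    return NULL;
  for (size_t i = 0; i < sizeof fpus / sizeof fpus[0]; i++)
    if (fpus[i].isa == fp)
      return fpus[i].name;
  return NULL;
}
```

Under exact matching, a core that listed only the VFPv2 bit would match no entry here. That may be why this patch also adds ISA_FP_DBL to arm1136jf-s and the other VFPv2 cores.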

* arm-fpus.def: Add CNAME field to all FPU definitions.
* genopt.sh: Use explicit enumeration tags for FPU entries.
* arm-tables.opt: Regenerated.
* arm.opt (mfpu): Provide initial value.
* arm-opts.h (enum fpu_type): Build the enumeration from the list of
available FPUs.  Add 'auto' entry on the end.
* arm.c (arm_configure_build_target): Only do explicit configuration
of the FPU features if the selected FPU is not 'auto'.
(arm_option_override): Adjust initialization of arm_fpu_index.
Emit an error if we have a hard float ABI request, but the processor
does not support floating-point.
(arm_option_print): Handle -mfpu=auto.
(arm_valid_target_attribute_rec): Don't permit fpu=auto in pragmas
or function attributes.
(arm_identify_fpu_from_isa): Handle effective soft-float when
the FPU is automatically detected.
* arm-cores.def (arm1136jf-s): Add feature ISA_FP_DBL.
(arm1176jzf-s): Likewise.
(mpcore): Likewise.
(arm1156t2f-s): Likewise.
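The enum fpu_type change follows the usual X-macro pattern for .def files. In this self-contained C sketch, an inline FPU_LIST stands in for arm-fpus.def. It keeps only three entries and omits the ISA field. The TARGET_FPU_ prefix for the enumerators is an assumption.

```c
/* Stand-in for arm-fpus.def: in GCC the list lives in a separate file
   pulled in with #include; here it is an inline list macro.  */
#define FPU_LIST(X)            \
  X ("vfpv2",    vfpv2)        \
  X ("neon",     neon)         \
  X ("fp-armv8", fp_armv8)

/* Build the enumeration from the list, then append the 'auto' entry.  */
enum fpu_type
{
#define ARM_FPU(NAME, CNAME) TARGET_FPU_##CNAME,
  FPU_LIST (ARM_FPU)
#undef ARM_FPU
  TARGET_FPU_auto
};

/* The same list yields a parallel table of user-visible option names,
   indexed by the enumerators above.  */
static const char *const fpu_names[] =
{
#define ARM_FPU(NAME, CNAME) NAME,
  FPU_LIST (ARM_FPU)
#undef ARM_FPU
};
```

Because 'auto' is appended after the generated entries, it never collides with a real FPU index. It also equals the number of real FPUs.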
---
 gcc/config/arm/arm-cores.def  |  8 +++
 gcc/config/arm/arm-fpus.def   | 48 +++--
 gcc/config/arm/arm-opts.h | 10 
 gcc/config/arm/arm-tables.opt | 46 +++-
 gcc/config/arm/arm.c  | 55 ++-
 gcc/config/arm/arm.opt|  2 +-
 gcc/config/arm/genopt.sh  | 15 +++-
 7 files changed, 117 insertions(+), 67 deletions(-)


diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index a232d37..544579c 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -124,13 +124,13 @@ ARM_CORE("arm1026ej-s",	arm1026ejs, arm1026ejs,	TF_LDSCHED,			  5TEJ,	ISA_FEAT(I
 
 /* V6 Architecture Processors */
 ARM_CORE("arm1136j-s",		arm1136js, arm1136js,		TF_LDSCHED,	  6J,	ISA_FEAT(ISA_ARMv6j), 9e)
-ARM_CORE("arm1136jf-s",		arm1136jfs, arm1136jfs,		TF_LDSCHED,	  6J,	ISA_FEAT(ISA_ARMv6j) ISA_FEAT(isa_bit_VFPv2), 9e)
+ARM_CORE("arm1136jf-s",		arm1136jfs, arm1136jfs,		TF_LDSCHED,	  6J,	ISA_FEAT(ISA_ARMv6j) ISA_FEAT(ISA_VFPv2) ISA_FEAT(ISA_FP_DBL), 9e)
 ARM_CORE("arm1176jz-s",		arm1176jzs, arm1176jzs,		TF_LDSCHED,	  6KZ,	ISA_FEAT(ISA_ARMv6kz), 9e)
-ARM_CORE("arm1176jzf-s",	arm1176jzfs, arm1176jzfs,	TF_LDSCHED,	  6KZ,	ISA_FEAT(ISA_ARMv6kz) ISA_FEAT(isa_bit_VFPv2), 9e)
+ARM_CORE("arm1176jzf-s",	arm1176jzfs, arm1176jzfs,	TF_LDSCHED,	  6KZ,	ISA_FEAT(ISA_ARMv6kz) ISA_FEAT(ISA_VFPv2) ISA_FEAT(ISA_FP_DBL), 9e)
 ARM_CORE("mpcorenovfp",		mpcorenovfp, mpcorenovfp,	TF_LDSCHED,	  6K,	ISA_FEAT(ISA_ARMv6k), 9e)
-ARM_CORE("mpcore",		mpcore, mpcore,			TF_LDSCHED,	  6K,	ISA_FEAT(ISA_ARMv6k) ISA_FEAT(isa_bit_VFPv2), 9e)
+ARM_CORE("mpcore",		mpcore, mpcore,			TF_LDSCHED,	  6K,	ISA_FEAT(ISA_ARMv6k) ISA_FEAT(ISA_VFPv2) ISA_FEAT(ISA_FP_DBL), 9e)
 ARM_CORE("arm1156t2-s",		arm1156t2s, arm1156t2s,		TF_LDSCHED,	  6T2,	ISA_FEAT(ISA_ARMv6t2), v6t2)
-ARM_CORE("arm1156t2f-s",	arm1156t2fs, arm1156t2fs,	TF_LDSCHED,	  6T2,	ISA_FEAT(ISA_ARMv6t2) ISA_FEAT(isa_bit_VFPv2), v6t2)
+ARM_CORE("arm1156t2f-s",	arm1156t2fs, arm1156t2fs,	TF_LDSCHED,	  6T2,	ISA_FEAT(ISA_ARMv6t2) ISA_FEAT(ISA_VFPv2) ISA_FEAT(ISA_FP_DBL), v6t2)
 
 /* V6M Architecture Processors */
 ARM_CORE("cortex-m1",		cortexm1, cortexm1,		TF_LDSCHED,	  6M,	ISA_FEAT(ISA_ARMv6m), v6m)
diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def
index ae8197d..f07711c 100644
--- a/gcc/config/arm/arm-fpus.def
+++ b/gcc/config/arm/arm-fpus.def
@@ -19,31 +19,33 @@
 
 /* Before using #include to read this file, define a macro:
 
-  ARM_FPU(NAME, ISA)
+  ARM_FPU(NAME, CNAME, ISA)
 
-   The arguments are the fields of struct arm_fpu_desc.
+   NAME is the publicly visible option name.
+   CNAME is a C-compatible variable name substring.
+   ISA is the list of feature bits that this FPU provides.
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_FPU("vfp",			ISA_FEAT(ISA_VFPv2) ISA_FEAT(ISA_FP_DBL))
-ARM_FPU("vfpv2",		ISA_FEAT(ISA_VFPv2) ISA_FEAT(ISA_FP_DBL))
-ARM_FPU("vfpv3",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_D32))
-ARM_FPU("vfpv3-fp16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_D32) ISA_FEAT(isa_bit_fp16conv))
-ARM_FPU("vfpv3-d16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_DBL))
-ARM_FPU("vfpv3-d16-fp16",	ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_DBL) ISA_FEAT(isa_bit_fp16conv))
-ARM_FPU("vfpv3xd",		ISA_FEAT(ISA_VFPv3))
-ARM_FPU("vfpv3xd-fp16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(isa_bit_fp16conv))
-ARM_FPU("neon",			ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_NEON))
-ARM_FPU("neon-vfpv3",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_NEON))
-ARM_FPU("neon-fp16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_NEON) ISA_FEAT(isa_bit_fp16conv))
-ARM_FPU(

[PATCH 20/21] [arm] Remove FEATURES field from FPU descriptions.

2016-12-15 Thread Richard Earnshaw (lists)

Now that everything uses the new ISA features, we can remove the
FEATURES field from the FPU descriptions, along with all the macros
and definitions associated with it.

* arm-fpus.def (ARM_FPU): Remove features field from all definitions.
* arm.h (arm_fpu_feature_set): Delete typedef.
(FPU_FL_NONE): Delete.
(FPU_FL_NEON): Delete.
(FPU_FL_FP16): Delete.
(FPU_FL_CRYPTO): Delete.
(FPU_FL_DBL): Delete.
(FPU_FL_D32): Delete.
(FPU_FL_VFPv2): Delete.
(FPU_FL_VFPv3): Delete.
(FPU_FL_VFPv4): Delete.
(FPU_FL_VFPv5): Delete.
(FPU_FL_ARMv8): Delete.
(FPU_VFPv2): Delete.
(FPU_VFPv3): Delete.
(FPU_VFPv4): Delete.
(FPU_VFPv5): Delete.
(FPU_ARMv8): Delete.
(FPU_DBL): Delete.
(FPU_D32): Delete.
(FPU_NEON): Delete.
(FPU_CRYPTO): Delete.
(FPU_FP16): Delete.
(arm_fpu_desc): Delete features field.
* arm.c (all_fpus): Don't initialize feature field.
---
 gcc/config/arm/arm-fpus.def | 44 ++--
 gcc/config/arm/arm.c|  4 ++--
 gcc/config/arm/arm.h| 34 --
 3 files changed, 24 insertions(+), 58 deletions(-)


diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def
index 1be718f..ae8197d 100644
--- a/gcc/config/arm/arm-fpus.def
+++ b/gcc/config/arm/arm-fpus.def
@@ -19,31 +19,31 @@
 
 /* Before using #include to read this file, define a macro:
 
-  ARM_FPU(NAME, ISA, FEATURES)
+  ARM_FPU(NAME, ISA)
 
The arguments are the fields of struct arm_fpu_desc.
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_FPU("vfp",			ISA_FEAT(ISA_VFPv2) ISA_FEAT(ISA_FP_DBL),			 FPU_VFPv2 | FPU_DBL)
-ARM_FPU("vfpv2",		ISA_FEAT(ISA_VFPv2) ISA_FEAT(ISA_FP_DBL),			 FPU_VFPv2 | FPU_DBL)
-ARM_FPU("vfpv3",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_D32),			 FPU_VFPv3 | FPU_D32)
-ARM_FPU("vfpv3-fp16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_D32) ISA_FEAT(isa_bit_fp16conv), FPU_VFPv3 | FPU_D32 | FPU_FP16)
-ARM_FPU("vfpv3-d16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_DBL),			 FPU_VFPv3 | FPU_DBL)
-ARM_FPU("vfpv3-d16-fp16",	ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_DBL) ISA_FEAT(isa_bit_fp16conv), FPU_VFPv3 | FPU_DBL | FPU_FP16)
-ARM_FPU("vfpv3xd",		ISA_FEAT(ISA_VFPv3),		 FPU_VFPv3)
-ARM_FPU("vfpv3xd-fp16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(isa_bit_fp16conv),			 FPU_VFPv3 | FPU_FP16)
-ARM_FPU("neon",			ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_NEON), FPU_VFPv3 | FPU_NEON)
-ARM_FPU("neon-vfpv3",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_NEON), FPU_VFPv3 | FPU_NEON)
-ARM_FPU("neon-fp16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_NEON) ISA_FEAT(isa_bit_fp16conv),   FPU_VFPv3 | FPU_NEON | FPU_FP16)
-ARM_FPU("vfpv4",		ISA_FEAT(ISA_VFPv4) ISA_FEAT(ISA_FP_D32),			 FPU_VFPv4 | FPU_D32 | FPU_FP16)
-ARM_FPU("neon-vfpv4",		ISA_FEAT(ISA_VFPv4) ISA_FEAT(ISA_NEON), FPU_VFPv4 | FPU_NEON | FPU_FP16)
-ARM_FPU("vfpv4-d16",		ISA_FEAT(ISA_VFPv4) ISA_FEAT(ISA_FP_DBL),			 FPU_VFPv4 | FPU_DBL | FPU_FP16)
-ARM_FPU("fpv4-sp-d16",		ISA_FEAT(ISA_VFPv4),		 FPU_VFPv4 | FPU_FP16)
-ARM_FPU("fpv5-sp-d16",		ISA_FEAT(ISA_FPv5),		 FPU_VFPv5 | FPU_FP16)
-ARM_FPU("fpv5-d16",		ISA_FEAT(ISA_FPv5) ISA_FEAT(ISA_FP_DBL),			 FPU_VFPv5 | FPU_DBL | FPU_FP16)
-ARM_FPU("fp-armv8",		ISA_FEAT(ISA_FP_ARMv8) ISA_FEAT(ISA_FP_D32),			 FPU_ARMv8 | FPU_D32 | FPU_FP16)
-ARM_FPU("neon-fp-armv8",	ISA_FEAT(ISA_FP_ARMv8) ISA_FEAT(ISA_NEON),			 FPU_ARMv8 | FPU_NEON | FPU_FP16)
-ARM_FPU("crypto-neon-fp-armv8", ISA_FEAT(ISA_FP_ARMv8) ISA_FEAT(ISA_CRYPTO),			 FPU_ARMv8 | FPU_CRYPTO | FPU_FP16)
+ARM_FPU("vfp",			ISA_FEAT(ISA_VFPv2) ISA_FEAT(ISA_FP_DBL))
+ARM_FPU("vfpv2",		ISA_FEAT(ISA_VFPv2) ISA_FEAT(ISA_FP_DBL))
+ARM_FPU("vfpv3",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_D32))
+ARM_FPU("vfpv3-fp16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_D32) ISA_FEAT(isa_bit_fp16conv))
+ARM_FPU("vfpv3-d16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_DBL))
+ARM_FPU("vfpv3-d16-fp16",	ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_FP_DBL) ISA_FEAT(isa_bit_fp16conv))
+ARM_FPU("vfpv3xd",		ISA_FEAT(ISA_VFPv3))
+ARM_FPU("vfpv3xd-fp16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(isa_bit_fp16conv))
+ARM_FPU("neon",			ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_NEON))
+ARM_FPU("neon-vfpv3",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_NEON))
+ARM_FPU("neon-fp16",		ISA_FEAT(ISA_VFPv3) ISA_FEAT(ISA_NEON) ISA_FEAT(isa_bit_fp16conv))
+ARM_FPU("vfpv4",		ISA_FEAT(ISA_VFPv4) ISA_FEAT(ISA_FP_D32))
+ARM_FPU("neon-vfpv4",		ISA_FEAT(ISA_VFPv4) ISA_FEAT(ISA_NEON))
+ARM_FPU("vfpv4-d16",		ISA_FEAT(ISA_VFPv4) ISA_FEAT(ISA_FP_DBL))
+ARM_FPU("fpv4-sp-d16",		ISA_FEAT(ISA_VFPv4))
+ARM_FPU("fpv5-sp-d16",		ISA_FEAT(ISA_FPv5))
+ARM_FPU("fpv5-d16",		ISA_FEAT(ISA_FPv5) ISA_FEAT(ISA_FP_DBL))
+ARM_FPU("fp-armv8",		ISA_FEAT(ISA_FP_ARMv8) ISA_FEAT(ISA_FP_D32))
+ARM_FPU("neon-fp-armv8",	ISA_FEAT(ISA_FP_ARMv8) ISA_FEAT(ISA_NEON))
+ARM_FPU("crypto-neon-fp-a

[PATCH 12/21] [arm] Eliminate vfp_reg_type

2016-12-15 Thread Richard Earnshaw (lists)

Remove the VFP_REGS field by converting its meanings into flag
attributes.  The new flag attributes build on each other describing
increasing capabilities.  This allows us to do a better job when
inlining functions with differing requirements on the FPU environment:
we can now inline A into B if B has at least the same register set
properties as A (previously we required identical register set
properties).
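A minimal C sketch of why cumulative flags make the inlining test easier. The FL_DBL and FL_D32 values and the helper names are hypothetical. The real FPU_FL_* definitions are in arm.h.

```c
#include <stdbool.h>

/* Illustrative bit values only.  */
#define FL_DBL 0x1u   /* double-precision registers D0-D15 */
#define FL_D32 0x2u   /* additionally D16-D31 */

/* Register-set capabilities expressed as cumulative flags: each larger
   register set includes the bits of the smaller ones.  */
enum { REGS_SINGLE = 0, REGS_D16 = FL_DBL, REGS_D32 = FL_DBL | FL_D32 };

/* Old rule: register sets had to match exactly.  */
static bool old_can_inline (unsigned caller, unsigned callee)
{
  return caller == callee;
}

/* New rule: the caller must provide every capability the callee uses.  */
static bool new_can_inline (unsigned caller, unsigned callee)
{
  return (caller & callee) == callee;
}
```

With the flags nested this way, the same subset test used for the other FPU features also covers register sets.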

* arm.h (vfp_reg_type): Delete.
(TARGET_FPU_REGS): Delete.
(arm_fpu_desc): Delete regs field.
(FPU_FL_NONE, FPU_FL_NEON, FPU_FL_FP16, FPU_FL_CRYPTO): Use unsigned
values.
(FPU_FL_DBL, FPU_FL_D32): Define.
(TARGET_VFPD32): Use feature test.
(TARGET_VFP_SINGLE): Likewise.
(TARGET_VFP_DOUBLE): Likewise.
* arm-fpus.def: Update all entries for new feature bits.
* arm.c (all_fpus): Update initializer macro.
(arm_can_inline_p): Remove test on fpu regs.
---
 gcc/config/arm/arm-fpus.def | 44 ++--
 gcc/config/arm/arm.c|  8 ++--
 gcc/config/arm/arm.h| 26 +-
 3 files changed, 33 insertions(+), 45 deletions(-)


diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def
index 04b2ef1..eca03bb 100644
--- a/gcc/config/arm/arm-fpus.def
+++ b/gcc/config/arm/arm-fpus.def
@@ -19,31 +19,31 @@
 
 /* Before using #include to read this file, define a macro:
 
-  ARM_FPU(NAME, REV, VFP_REGS, FEATURES)
+  ARM_FPU(NAME, REV, FEATURES)
 
The arguments are the fields of struct arm_fpu_desc.
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_FPU("vfp",		2, VFP_REG_D16, FPU_FL_NONE)
-ARM_FPU("vfpv2",	2, VFP_REG_D16, FPU_FL_NONE)
-ARM_FPU("vfpv3",	3, VFP_REG_D32, FPU_FL_NONE)
-ARM_FPU("vfpv3-fp16",	3, VFP_REG_D32, FPU_FL_FP16)
-ARM_FPU("vfpv3-d16",	3, VFP_REG_D16, FPU_FL_NONE)
-ARM_FPU("vfpv3-d16-fp16", 3, VFP_REG_D16, FPU_FL_FP16)
-ARM_FPU("vfpv3xd",	3, VFP_REG_SINGLE, FPU_FL_NONE)
-ARM_FPU("vfpv3xd-fp16",	3, VFP_REG_SINGLE, FPU_FL_FP16)
-ARM_FPU("neon",		3, VFP_REG_D32, FPU_FL_NEON)
-ARM_FPU("neon-vfpv3",	3, VFP_REG_D32, FPU_FL_NEON)
-ARM_FPU("neon-fp16",	3, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
-ARM_FPU("vfpv4",	4, VFP_REG_D32, FPU_FL_FP16)
-ARM_FPU("vfpv4-d16",	4, VFP_REG_D16, FPU_FL_FP16)
-ARM_FPU("fpv4-sp-d16",	4, VFP_REG_SINGLE, FPU_FL_FP16)
-ARM_FPU("fpv5-sp-d16",	5, VFP_REG_SINGLE, FPU_FL_FP16)
-ARM_FPU("fpv5-d16",	5, VFP_REG_D16, FPU_FL_FP16)
-ARM_FPU("neon-vfpv4",	4, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
-ARM_FPU("fp-armv8",	8, VFP_REG_D32, FPU_FL_FP16)
-ARM_FPU("neon-fp-armv8", 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
-ARM_FPU("crypto-neon-fp-armv8", 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO)
+ARM_FPU("vfp",			2, FPU_FL_DBL)
+ARM_FPU("vfpv2",		2, FPU_FL_DBL)
+ARM_FPU("vfpv3",		3, FPU_FL_D32 | FPU_FL_DBL)
+ARM_FPU("vfpv3-fp16",		3, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_FP16)
+ARM_FPU("vfpv3-d16",		3, FPU_FL_DBL)
+ARM_FPU("vfpv3-d16-fp16", 	3, FPU_FL_DBL | FPU_FL_FP16)
+ARM_FPU("vfpv3xd",		3, FPU_FL_NONE)
+ARM_FPU("vfpv3xd-fp16",		3, FPU_FL_FP16)
+ARM_FPU("neon",			3, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_NEON)
+ARM_FPU("neon-vfpv3",		3, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_NEON)
+ARM_FPU("neon-fp16",		3, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_NEON | FPU_FL_FP16)
+ARM_FPU("vfpv4",		4, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_FP16)
+ARM_FPU("vfpv4-d16",		4, FPU_FL_DBL | FPU_FL_FP16)
+ARM_FPU("fpv4-sp-d16",		4, FPU_FL_FP16)
+ARM_FPU("fpv5-sp-d16",		5, FPU_FL_FP16)
+ARM_FPU("fpv5-d16",		5, FPU_FL_DBL | FPU_FL_FP16)
+ARM_FPU("neon-vfpv4",		4, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_NEON | FPU_FL_FP16)
+ARM_FPU("fp-armv8",		8, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_FP16)
+ARM_FPU("neon-fp-armv8", 	8, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_NEON | FPU_FL_FP16)
+ARM_FPU("crypto-neon-fp-armv8", 8, FPU_FL_D32 | FPU_FL_DBL | FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO)
 /* Compatibility aliases.  */
-ARM_FPU("vfp3",		3, VFP_REG_D32, FPU_FL_NONE)
+ARM_FPU("vfp3",			3, FPU_FL_D32 | FPU_FL_DBL)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 822ef14..820a6ab 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2323,8 +2323,8 @@ char arm_arch_name[] = "__ARM_ARCH_PROFILE__";
 
 const struct arm_fpu_desc all_fpus[] =
 {
-#define ARM_FPU(NAME, REV, VFP_REGS, FEATURES) \
-  { NAME, REV, VFP_REGS, FEATURES },
+#define ARM_FPU(NAME, REV, FEATURES) \
+  { NAME, REV, FEATURES },
 #include "arm-fpus.def"
 #undef ARM_FPU
 };
@@ -30218,10 +30218,6 @@ arm_can_inline_p (tree caller, tree callee)
   if ((caller_fpu->features & callee_fpu->features) != callee_fpu->features)
 return false;
 
-  /* Need same FPU regs.  */
-  if (callee_fpu->regs != callee_fpu->regs)
-return false;
-
   /* OK to inline between different modes.
  Function with mode specific instructions, e.g using asm,
  must be explicitly protected with noinline.  */
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 7690e70..a

Re: [PATCH] improve string find algorithm

2016-12-15 Thread Aditya K

Ping.




From: Aditya Kumar 
Sent: Wednesday, December 7, 2016 11:46 AM
To: libstd...@gcc.gnu.org
Cc: gcc-patches@gcc.gnu.org; hiradi...@msn.com; Aditya Kumar
Subject: [PATCH] improve string find algorithm
    
Here is an improved version of basic_string::find. The idea is to
split the string find in two parts:
1. search for the next occurrence of the pattern's first character by using
   traits_type::find (this gets converted to memchr for x86)
2. check whether the rest of the pattern matches there (this gets converted
   to memcmp for x86)
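A C sketch of the two-step search described above, calling memchr and memcmp directly instead of going through char_traits. The name find_sketch and the (size_t)-1 "not found" value, which mirrors npos, are illustrative. This is not the libstdc++ patch itself.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the first occurrence of NEEDLE (length N) in
   HAYSTACK (length LEN) at or after POS, or (size_t)-1 if absent.  */
static size_t find_sketch (const char *haystack, size_t len,
                           const char *needle, size_t n, size_t pos)
{
  if (n == 0)
    return pos <= len ? pos : (size_t) -1;
  if (pos > len || n > len - pos)
    return (size_t) -1;

  const char *p = haystack + pos;
  /* Last position at which a full match could still start.  */
  const char *last = haystack + (len - n);
  while (p <= last)
    {
      /* Step 1: jump to the next candidate first character.  */
      p = memchr (p, needle[0], (size_t) (last - p) + 1);
      if (p == NULL)
        return (size_t) -1;
      /* Step 2: compare the remaining N - 1 characters.  */
      if (memcmp (p + 1, needle + 1, n - 1) == 0)
        return (size_t) (p - haystack);
      ++p;
    }
  return (size_t) -1;
}
```

The memchr call skips long runs of non-candidate characters in one library call. This explains why the NoMatch benchmarks improve so much.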

Passes bootstrap on x86-64.

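The patch results in large improvements on a synthetic test case I wrote using
google-benchmark (for example, BM_StringFindNoMatch/128k drops from 89745 ns
to 4091 ns, and BM_StringFindMatch1/1 from 44430 ns to 2062 ns).
The results follow.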


Branch: master without patch
$ ./bin/string.libcxx.out
Run on (24 X 1899.12 MHz CPU s)
2016-12-06 16:41:55
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Benchmark   Time   CPU Iterations
-
BM_StringFindNoMatch/10 8 ns  8 ns   81880927
BM_StringFindNoMatch/64    52 ns 52 ns   13235018
BM_StringFindNoMatch/512  355 ns    355 ns    1962488
BM_StringFindNoMatch/4k  2769 ns   2772 ns 249090
BM_StringFindNoMatch/32k    22598 ns  22619 ns  30984
BM_StringFindNoMatch/128k   89745 ns  89830 ns   7996
BM_StringFindAllMatch/1 7 ns  7 ns  102893835
BM_StringFindAllMatch/8 9 ns  9 ns   75403364
BM_StringFindAllMatch/64   12 ns 12 ns   60766893
BM_StringFindAllMatch/512  31 ns 31 ns   23163999
BM_StringFindAllMatch/4k  141 ns    141 ns    4980386
BM_StringFindAllMatch/32k    1402 ns   1403 ns 483581
BM_StringFindAllMatch/128k   5604 ns   5609 ns 126123
BM_StringFindMatch1/1   44430 ns  44473 ns  15804
BM_StringFindMatch1/8   44315 ns  44357 ns  15741
BM_StringFindMatch1/64  44689 ns  44731 ns  15712
BM_StringFindMatch1/512 44247 ns  44290 ns  15724
BM_StringFindMatch1/4k  45010 ns  45053 ns  15678
BM_StringFindMatch1/32k 45717 ns  45761 ns  15278
BM_StringFindMatch2/1   44307 ns  44349 ns  15730
BM_StringFindMatch2/8   44631 ns  44674 ns  15721
BM_StringFindMatch2/64  44300 ns  44342 ns  15750
BM_StringFindMatch2/512 44239 ns  44281 ns  15713
BM_StringFindMatch2/4k  44886 ns  44928 ns  15787

Branch: master with patch
$ ./bin/string.libcxx.out
Run on (24 X 2892.28 MHz CPU s)
2016-12-06 18:51:38
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Benchmark   Time   CPU Iterations
-
BM_StringFindNoMatch/10    11 ns 11 ns   63049677
BM_StringFindNoMatch/64    12 ns 12 ns   57259381
BM_StringFindNoMatch/512   27 ns 27 ns   25495432
BM_StringFindNoMatch/4k   130 ns    130 ns    5301301
BM_StringFindNoMatch/32k  858 ns    859 ns 824048
BM_StringFindNoMatch/128k    4091 ns   4095 ns 171493
BM_StringFindAllMatch/1    14 ns 14 ns   53023977
BM_StringFindAllMatch/8    14 ns 14 ns   51516536
BM_StringFindAllMatch/64   17 ns 17 ns   40992668
BM_StringFindAllMatch/512  37 ns 37 ns   18503267
BM_StringFindAllMatch/4k  153 ns    153 ns    4494458
BM_StringFindAllMatch/32k    1460 ns   1461 ns 483380
BM_StringFindAllMatch/128k   5801 ns   5806 ns 120680
BM_StringFindMatch1/1    2062 ns   2064 ns 333144
BM_StringFindMatch1/8    2057 ns   2059 ns 335496
BM_StringFindMatch1/64   2083 ns   2085 ns 341469
BM_StringFindMatch1/512  2134 ns   2136 ns 336880
BM_StringFindMatch1/4k   2309 ns   2312 ns 308745
BM_StringFindMatch1/32k  3413 ns   3417 ns 208206
BM_StringFindMatch2/1    2053 ns   2055 ns 341523
BM_StringFindMatch2/8    2061 ns   2063 ns 343999
BM_StringFindMatch2/64   2075 ns   2077 ns 338479
BM_StringFindMatch2/512  2102 ns   2104 ns 332276
BM_StringFindMatch2/4k   2286 ns   2288 ns 300416
BM_StringFindMatch2/32k  3385 ns   3388 ns 204158


ChangeLog:

2016-12-07  Aditya Kumar 
   * include/bits/basic_string.tcc (find(const _CharT* __s, size_type
   __pos, size_type __n) const): Improve the algorithm.


---
 libstdc++-v3/include/bits/basic_string.tcc | 31 ++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/bits/basic_string.tcc b/libstdc++-v3/include/bits/basic_string.t

[PATCH] builtin expansion of strncmp for rs6000

2016-12-15 Thread Aaron Sawdey
This patch adds a cmpstrnsi pattern for rs6000 target to provide
builtin expansion of strncmp(). Perf tests on a power8 system show that
it is 3-10x faster than the glibc strncmp on RHEL7 for lengths under 64
bytes.

Bootstrap/regtest has passed on powerpc64le, in progress for powerpc64,
ok for trunk if no new regressions?

2016-11-17  Aaron Sawdey  

* config/rs6000/rs6000-protos.h (expand_strn_compare): Declare.
* config/rs6000/rs6000.md (UNSPEC_CMPB): New unspec.
(cmpb3): pattern for generating cmpb.
(cmpstrnsi): pattern to expand strncmp ().
* config/rs6000/rs6000.opt (mstring-compare-inline-limit): Add a new
target option for controlling how much code inline expansion of
strncmp() will be allowed to generate.
* config/rs6000/rs6000.c (expand_strncmp_align_check): generate code
for runtime page crossing check of strncmp () args.
(expand_strn_compare): Function to do builtin expansion of strncmp ().
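The runtime page-crossing guard that expand_strncmp_align_check emits can be written out in C as follows. The sketch assumes 4 KiB pages, as the 0xfff mask in the patch does. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size, matching the 0xfff mask and 4096 constant in
   expand_strncmp_align_check.  */
#define PAGE_SIZE_ASSUMED 4096u

/* True if reading BYTES bytes starting at ADDR is known not to cross a
   page boundary, using the patch's comparison
   (addr & 0xfff) < 4096 - bytes.  When this is false, the expansion
   branches to the library strncmp call instead.  */
static bool strncmp_inline_safe (uintptr_t addr, unsigned bytes)
{
  /* Guard against unsigned wrap-around; the inline limit keeps BYTES
     small in practice.  */
  if (bytes >= PAGE_SIZE_ASSUMED)
    return false;
  uintptr_t offset = addr & (PAGE_SIZE_ASSUMED - 1);
  return offset < PAGE_SIZE_ASSUMED - bytes;
}
```

The strict comparison also sends the exact-fit case, offset == 4096 - bytes, to the library call. That read would still stay within the page, so the check errs on the side of safety.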


-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h	(revision 243658)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -78,6 +78,7 @@
 extern int expand_block_clear (rtx[]);
 extern int expand_block_move (rtx[]);
 extern bool expand_block_compare (rtx[]);
+extern bool expand_strn_compare (rtx[]);
 extern const char * rs6000_output_load_multiple (rtx[]);
 extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode);
 extern bool rs6000_is_valid_and_mask (rtx, machine_mode);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 243658)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -19382,7 +19382,387 @@
   return true;
 }
 
+/* Generate alignment check and branch code to set up for
+   strncmp when we don't have DI alignment.
+   STRNCMP_LABEL is the label to branch if there is a page crossing.
+   SRC is the string pointer to be examined.
+   BYTES is the max number of bytes to compare.  */
+static void
+expand_strncmp_align_check (rtx strncmp_label, rtx src, HOST_WIDE_INT bytes)
+{
+  rtx lab_ref = gen_rtx_LABEL_REF (VOIDmode, strncmp_label);
+  rtx src_check = copy_addr_to_reg (XEXP (src, 0));
+  if (GET_MODE (src_check) == SImode)
+emit_insn (gen_andsi3_mask (src_check, src_check, GEN_INT(0xfff)));
+  else
+emit_insn (gen_anddi3_mask (src_check, src_check, GEN_INT(0xfff)));
+  rtx cond = gen_reg_rtx (CCmode);
+  emit_move_insn (cond, gen_rtx_COMPARE (CCmode, src_check,
+	 GEN_INT (4096-bytes)));
 
+  rtx cmp_rtx = gen_rtx_LT (VOIDmode, cond, const0_rtx);
+
+  rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx,
+ pc_rtx, lab_ref);
+  rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+  JUMP_LABEL (j) = strncmp_label;
+  LABEL_NUSES (strncmp_label) += 1;
+}
+
+/* Expand a string compare operation with length, and return 
+   true if successful. Return false if we should let the 
+   compiler generate normal code, probably a strncmp call.
+
+   OPERANDS[0] is the target (result).
+   OPERANDS[1] is the first source.
+   OPERANDS[2] is the second source.
+   OPERANDS[3] is the length.
+   OPERANDS[4] is the alignment in bytes.  */
+bool
+expand_strn_compare (rtx operands[])
+{
+  rtx target = operands[0];
+  rtx orig_src1 = operands[1];
+  rtx orig_src2 = operands[2];
+  rtx bytes_rtx = operands[3];
+  rtx align_rtx = operands[4];
+  HOST_WIDE_INT cmp_bytes = 0;
+  rtx src1 = orig_src1;
+  rtx src2 = orig_src2;
+
+  /* If this is not a fixed size compare, just call strncmp.  */
+  if (!CONST_INT_P (bytes_rtx))
+return false;
+
+  /* This must be a fixed size alignment.  */
+  if (!CONST_INT_P (align_rtx))
+return false;
+
+  int base_align = INTVAL (align_rtx);
+  int align1 = MEM_ALIGN (orig_src1) / BITS_PER_UNIT;
+  int align2 = MEM_ALIGN (orig_src2) / BITS_PER_UNIT;
+
+  /* SLOW_UNALIGNED_ACCESS -- don't do unaligned stuff.  */
+  if (SLOW_UNALIGNED_ACCESS (word_mode, align1)
+  || SLOW_UNALIGNED_ACCESS (word_mode, align2))
+return false;
+
+  gcc_assert (GET_MODE (target) == SImode);
+
+  HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
+
+  /* If we have an LE target without ldbrx and word_mode is DImode,
+ then we must avoid using word_mode.  */
+  int word_mode_ok = !(!BYTES_BIG_ENDIAN && !TARGET_LDBRX
+		   && word_mode == DImode);
+
+  int word_mode_size = GET_MODE_SIZE (word_mode);
+
+  int offset = 0;
+  machine_mode load_mode =
+select_block_compare_mode (offset, bytes, base_align, word_mode_ok);
+  int load_mode_size = GET_MODE_SIZE (load_mode);
+
+  /* We don't want to generate too much code.  */
+  if (ROUND_UP (bytes, load_mode_size) / load_mode_size
+  > rs6000_string_compare_inline_limit)
+return false;
+
+  rtx result_r

Re: [PATCH][ARM] PR target/71436: Restrict *load_multiple pattern till after LRA

2016-12-15 Thread Kyrill Tkachov


On 15/12/16 09:55, Richard Earnshaw (lists) wrote:

On 30/11/16 16:47, Kyrill Tkachov wrote:

Hi all,

In this awkward ICE we have a *load_multiple pattern that is being
transformed in reload from:
(insn 55 67 151 3 (parallel [
 (set (reg:SI 0 r0)
 (mem/u/c:SI (reg/f:SI 147) [2 c+0 S4 A32]))
 (set (reg:SI 158 [ c+4 ])
 (mem/u/c:SI (plus:SI (reg/f:SI 147)
 (const_int 4 [0x4])) [2 c+4 S4 A32]))
 ]) arm-crash.c:25 393 {*load_multiple}
  (expr_list:REG_UNUSED (reg:SI 0 r0)
 (nil)))


into the invalid:
(insn 55 67 70 3 (parallel [
 (set (reg:SI 0 r0)
 (mem/u/c:SI (reg/f:SI 5 r5 [147]) [2 c+0 S4 A32]))
 (set (mem/c:SI (plus:SI (reg/f:SI 102 sfp)
 (const_int -4 [0xfffc])) [4 %sfp+-12
S4 A32])
 (mem/u/c:SI (plus:SI (reg/f:SI 5 r5 [147])
 (const_int 4 [0x4])) [2 c+4 S4 A32]))
 ]) arm-crash.c:25 393 {*load_multiple}
  (nil))

The operands of *load_multiple are not validated through constraints,
as LRA expects, but rather through a match_parallel predicate which ends
up calling ldm_stm_operation_p to validate the multiple sets.
This means that LRA cannot reason about the constraints properly.
This two-register load should not have used *load_multiple anyway; it
should have used *ldm2_ from ldmstm.md, and indeed it did until the
loop2_invariant pass copied the ldm2_ pattern:
(insn 27 23 28 4 (parallel [
 (set (reg:SI 0 r0)
 (mem/u/c:SI (reg/f:SI 147) [2 c+0 S4 A32]))
 (set (reg:SI 1 r1)
 (mem/u/c:SI (plus:SI (reg/f:SI 147)
 (const_int 4 [0x4])) [2 c+4 S4 A32]))
 ]) "ldm.c":25 385 {*ldm2_}
  (nil))

into:
(insn 55 19 67 3 (parallel [
 (set (reg:SI 0 r0)
 (mem/u/c:SI (reg/f:SI 147) [2 c+0 S4 A32]))
 (set (reg:SI 158)
 (mem/u/c:SI (plus:SI (reg/f:SI 147)
 (const_int 4 [0x4])) [2 c+4 S4 A32]))
 ]) "ldm.c":25 404 {*load_multiple}
  (expr_list:REG_UNUSED (reg:SI 0 r0)
 (nil)))

Note that it now got recognised as load_multiple because the second
register is not a hard register but the pseudo 158.
In any case, the solution suggested in the PR (and I agree with it) is
to restrict *load_multiple to after reload.
The similar pattern *load_multiple_with_writeback also has a similar
condition and the comment above *load_multiple says that
it's used to generate epilogues, which is done after reload anyway. For
pre-reload load-multiples the patterns in ldmstm.md
should do just fine.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?


I don't think this is right.  Firstly, these patterns look to me like
the ones used for memcpy expansion, so not recognizing them could lead
to compiler aborts.


I think the other patterns in ldmstm.md would catch these anyway but...



Secondly, the bug is when we generate

  (insn 55 67 70 3 (parallel [
  (set (reg:SI 0 r0)
  (mem/u/c:SI (reg/f:SI 5 r5 [147]) [2 c+0 S4 A32]))
  (set (mem/c:SI (plus:SI (reg/f:SI 102 sfp)
  (const_int -4 [0xfffc])) [4 %sfp+-12
  S4 A32])
  (mem/u/c:SI (plus:SI (reg/f:SI 5 r5 [147])
  (const_int 4 [0x4])) [2 c+4 S4 A32]))
  ]) arm-crash.c:25 393 {*load_multiple}
   (nil))

These patterns are supposed to enforce that the load (store) target
register is a hard register that is higher numbered than the hard
register in the memory slot that precedes it (thus satisfying the
constraints for a normal ldm/stm instruction).

The real question is why did ldm_stm_operation_p permit this modified
pattern through, or was the pattern validation bypassed incorrectly by
the loop2 invariant code when it copied the insn and made changes?


... ldm_stm_operation_p doesn't check that the registers are hard registers,
which is an implicit requirement since it validates their ascending order.
So the safe easy fix is to require hard registers.

This patch does that.
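The register-number part of that requirement can be sketched in plain C. This is a simplified model, not the actual ldm_stm_operation_p, which also validates the base register and memory offsets. `ldm_reglist_ok` and `SKETCH_FIRST_PSEUDO_REGISTER` are hypothetical names. The value 104 is assumed only so that hard register 0 is accepted and pseudo 158 from the dump above is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical first pseudo register number for this sketch; in GCC the
   real value comes from the target's FIRST_PSEUDO_REGISTER.  */
#define SKETCH_FIRST_PSEUDO_REGISTER 104

/* Return true if REGNOS[0..N-1] can serve as an ldm/stm register list:
   the numbers must be strictly ascending and all must be hard registers.
   Because the list is ascending, checking the last entry is enough for
   the hard-register test.  */
static bool
ldm_reglist_ok (const unsigned *regnos, size_t n)
{
  assert (regnos != NULL && n > 0);
  for (size_t i = 1; i < n; i++)
    if (regnos[i] <= regnos[i - 1])
      return false;
  return regnos[n - 1] < SKETCH_FIRST_PSEUDO_REGISTER;
}
```

Checking only the last register mirrors the ChangeLog entry: once the order is verified, every earlier entry has a smaller number and is therefore also a hard register.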
Bootstrapped and tested on arm-none-linux-gnueabihf.
Ok for trunk?

Thanks,
Kyrill

2016-12-15  Kyrylo Tkachov  

PR target/71436
* config/arm/arm.c (ldm_stm_operation_p): Check that last register
in the list is a hard reg.

2016-12-15  Kyrylo Tkachov  

PR target/71436
* gcc.c-torture/compile/pr71436.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 437da6fe3d34978e7a3a72f7ec39dc76a54d6408..b6c5fdf42ee359a0c0b0fab8de4d374ff39cd35b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12709,6 +12709,13 @@ ldm_stm_operation_p (rtx op, bool load, machine_mode mode,
 addr_reg_in_reglist = true;
 }
 
+  /* The ascending register number requirement only makes sense when dealing
+ with hard reg

Re: [Patch Doc] Update documentation for __fp16 type

2016-12-15 Thread Sandra Loosemore

On 12/15/2016 03:27 AM, Jakub Jelinek wrote:

On Thu, Dec 15, 2016 at 11:23:14AM +0100, Andreas Schwab wrote:

On Dez 15 2016, Jakub Jelinek  wrote:


So, shall we also change the first 3?


Yes, I'd think so.


So here it is in patch form.  Is this ok for trunk?

2016-12-15  Jakub Jelinek  

* doc/extend.texi: Clean up @xref{...} uses.
* doc/invoke.texi: Likewise.

--- gcc/doc/extend.texi.jj  2016-12-14 20:28:12.0 +0100
+++ gcc/doc/extend.texi 2016-12-15 11:26:07.867736292 +0100
@@ -1057,7 +1057,7 @@ implements conversions between @code{__f
  calls.

  It is recommended that portable code use the @code{_Float16} type defined
-by ISO/IEC TS 18661-3:2015 (@xref{Floating Types}).
+by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.

  @node Decimal Float
  @section Decimal Floating Types
@@ -2089,7 +2089,7 @@ union foo f = @{ .d = 4 @};
  converts 4 to a @code{double} to store it in the union using
  the second element.  By contrast, casting 4 to type @code{union foo}
  stores it into the union as the integer @code{i}, since it is
-an integer.  (@xref{Cast to Union}.)
+an integer.  @xref{Cast to Union}.

  You can combine this technique of naming elements with ordinary C
  initialization of successive elements.  Each initializer element that
@@ -2181,7 +2181,7 @@ specified is a union type.  You can spec
  @code{union} keyword or with a @code{typedef} name that refers to
  a union.  A cast to a union actually creates a compound literal and
  yields an lvalue, not an rvalue like true casts do.
-(@xref{Compound Literals}.)
+@xref{Compound Literals}.

  The types that may be cast to the union type are those of the members
  of the union.  Thus, given the following union and variables:
--- gcc/doc/invoke.texi.jj  2016-12-15 10:26:15.0 +0100
+++ gcc/doc/invoke.texi 2016-12-15 11:25:19.226386092 +0100
@@ -7262,8 +7262,8 @@ release to an another.
  @opindex fno-keep-inline-dllexport
  This is a more fine-grained version of @option{-fkeep-inline-functions},
  which applies only to functions that are declared using the @code{dllexport}
-attribute or declspec (@xref{Function Attributes,,Declaring Attributes of
-Functions}.)
+attribute or declspec.  @xref{Function Attributes,,Declaring Attributes of
+Functions}.

  @item -fkeep-inline-functions
  @opindex fkeep-inline-functions



This is OK, but FYI it would have been simpler just to do

s/@xref/@pxref/

in the one instance that was causing a diagnostic.  Sorry I missed that 
in the original patch review.  :-(


-Sandra



[PATCH] Optimize aggregate a = b = c = {} (PR c/78408, take 2)

2016-12-15 Thread Jakub Jelinek
On Wed, Dec 14, 2016 at 01:27:56PM +0100, Richard Biener wrote:
> Ah, ok.  But then the size of the memset shouldn't be compared
> against the get_ref_base_and_extent size from src2 but to the
> size of the access of SRC/DEST (clearly looking at the "size" of
> the ADDR_EXPR argument is somewhat bogus).
> And as you compare src and dest
> with operand_equal_p there is no need to reject ssize != max_size
> either (you'd of course not see memset (&a[i].x, 0, 400) because
> &a[i].x is not invariant, you'd need to lookup the SSA def for a pointer
> here).
> 
> You can get at the size of an actual access simpler than by
> the full-blown get_ref_base_and_extent (just outline a
> get_ref_size () from the head part of it).
> 
> I still think that using get_addr_base_and_unit_offset on the
> address is better and passing decomposed (base, offset, size)
> triplets to optimize_memcpy would also save you the
> MEM[(char * {ref-all})&b] = MEM[(char * {ref-all})&a]; special-case.

Here is an updated patch that does that (i.e. always work with base, offset,
length triplets) and drops the alias check for dest vs. src overlap.
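The containment test at the heart of this approach can be sketched in C over (base, offset, length) triplets. The `byte_range` struct and `range_covered_p` are hypothetical. Base equality is modeled with pointer comparison where the patch uses operand_equal_p. Plain 64-bit arithmetic stands in for the patch's widest_int computation, so this sketch ignores overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A (base, offset, length) triplet describing a run of bytes.  */
struct byte_range
{
  const void *base;  /* Stand-in for the decomposed base object.  */
  int64_t offset;    /* Byte offset from BASE.  */
  int64_t len;       /* Length in bytes.  */
};

/* Return true if every byte of USE lies within DEF, i.e.
   [use.offset, use.offset + use.len - 1] is a subset of
   [def.offset, def.offset + def.len - 1] on the same base.  */
static bool
range_covered_p (struct byte_range def, struct byte_range use)
{
  assert (def.len >= 0 && use.len >= 0);
  if (def.base != use.base || use.offset < def.offset)
    return false;
  return use.len + (use.offset - def.offset) <= def.len;
}
```

Suppose DEF is the range written by an earlier `a = {}` or `memset (&a, 0, ...)`, and USE is the range read by the later copy. A true result then means the copied bytes are known to equal the stored value, which is the condition the patch checks before rewriting the copy.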

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-12-15  Jakub Jelinek  

PR c/78408
* tree-ssa-ccp.c: Include tree-dfa.h.
(optimize_memcpy): New function.
(pass_fold_builtins::execute): Use it.  Remove useless conditional
break after BUILT_IN_VA_*.

* gcc.dg/pr78408-1.c: New test.
* gcc.dg/pr78408-2.c: New test.

--- gcc/tree-ssa-ccp.c.jj   2016-11-28 16:19:11.0 +0100
+++ gcc/tree-ssa-ccp.c  2016-12-15 15:09:03.180993745 +0100
@@ -143,6 +143,7 @@ along with GCC; see the file COPYING3.
 #include "stor-layout.h"
 #include "optabs-query.h"
 #include "tree-ssa-ccp.h"
+#include "tree-dfa.h"
 
 /* Possible lattice values.  */
 typedef enum
@@ -2933,6 +2934,113 @@ optimize_atomic_bit_test_and (gimple_stm
   release_ssa_name (lhs);
 }
 
+/* Optimize
+   a = {};
+   b = a;
+   into
+   a = {};
+   b = {};
+   Similarly for memset (&a, ..., sizeof (a)); instead of a = {};
+   and/or memcpy (&b, &a, sizeof (a)); instead of b = a;  */
+
+static void
+optimize_memcpy (gimple_stmt_iterator *gsip, tree dest, tree src, tree len)
+{
+  gimple *stmt = gsi_stmt (*gsip);
+  if (gimple_has_volatile_ops (stmt)
+  || TREE_THIS_VOLATILE (dest)
+  || TREE_THIS_VOLATILE (src))
+return;
+
+  tree vuse = gimple_vuse (stmt);
+  if (vuse == NULL)
+return;
+
+  gimple *defstmt = SSA_NAME_DEF_STMT (vuse);
+  tree src2 = NULL_TREE, len2 = NULL_TREE;
+  HOST_WIDE_INT offset, offset2;
+  tree val = integer_zero_node;
+  if (gimple_store_p (defstmt)
+  && gimple_assign_single_p (defstmt)
+  && TREE_CODE (gimple_assign_rhs1 (defstmt)) == CONSTRUCTOR
+  && CONSTRUCTOR_NELTS (gimple_assign_rhs1 (defstmt)) == 0
+  && !gimple_clobber_p (defstmt))
+src2 = gimple_assign_lhs (defstmt);
+  else if (gimple_call_builtin_p (defstmt, BUILT_IN_MEMSET)
+  && TREE_CODE (gimple_call_arg (defstmt, 0)) == ADDR_EXPR
+  && TREE_CODE (gimple_call_arg (defstmt, 1)) == INTEGER_CST)
+{
+  src2 = TREE_OPERAND (gimple_call_arg (defstmt, 0), 0);
+  len2 = gimple_call_arg (defstmt, 2);
+  val = gimple_call_arg (defstmt, 1);
+  if (!integer_zerop (val) && is_gimple_assign (stmt))
+   src2 = NULL_TREE;
+}
+
+  if (src2 == NULL_TREE)
+return;
+
+  if (len == NULL_TREE)
+len = (TREE_CODE (src) == COMPONENT_REF
+  ? DECL_SIZE_UNIT (TREE_OPERAND (src, 1))
+  : TYPE_SIZE_UNIT (TREE_TYPE (src)));
+  if (len2 == NULL_TREE)
+len2 = (TREE_CODE (src2) == COMPONENT_REF
+   ? DECL_SIZE_UNIT (TREE_OPERAND (src2, 1))
+   : TYPE_SIZE_UNIT (TREE_TYPE (src2)));
+  if (len == NULL_TREE
+  || TREE_CODE (len) != INTEGER_CST
+  || len2 == NULL_TREE
+  || TREE_CODE (len2) != INTEGER_CST)
+return;
+
+  src = get_addr_base_and_unit_offset (src, &offset);
+  src2 = get_addr_base_and_unit_offset (src2, &offset2);
+  if (src == NULL_TREE
+  || src2 == NULL_TREE
+  || offset < offset2)
+return;
+
+  if (!operand_equal_p (src, src2, 0))
+return;
+
+  /* [ src + offset2, src + offset2 + len2 - 1 ] is set to val.
+ Make sure that
+ [ src + offset, src + offset + len - 1 ] is a subset of that.  */
+  if (wi::to_widest (len) + (offset - offset2) > wi::to_widest (len2))
+return;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Simplified\n  ");
+  print_gimple_stmt (dump_file, stmt, 0, dump_flags);
+  fprintf (dump_file, "after previous\n  ");
+  print_gimple_stmt (dump_file, defstmt, 0, dump_flags);
+}
+
+  if (is_gimple_assign (stmt))
+{
+  tree ctor = build_constructor (TREE_TYPE (dest), NULL);
+  gimple_assign_set_rhs_from_tree (gsip, ctor);
+  update_stmt (stmt);
+}
+  else
+{
+  gcall *call = as_a <gcall *> (stmt);
+  tree fndecl = builtin_decl_implicit (BUILT_IN

[C++ PATCH] P0490R0 GB 20: decomposition declaration should commit to tuple interpretation early (take 2)

2016-12-15 Thread Jakub Jelinek
On Thu, Dec 15, 2016 at 07:40:58AM -0500, Nathan Sidwell wrote:
> On 12/15/2016 07:26 AM, Jakub Jelinek wrote:
> 
> > I don't think so.  complete_type (error_mark_node) returns error_mark_node,
> > and COMPLETE_TYPE_P (error_mark_node) is invalid (should fail TYPE_CHECK in
> > checking compiler).
> > 
> > I can write it as
> >   inst = complete_type (inst);
> >   if (inst == error_mark_node || !COMPLETE_TYPE_P (inst))
> > return NULL_TREE;
> 
> that's probably better, because complete_type can return error_mark_node if
> 'something goes horribly wrong'

Ok, here is the updated patch (and also moving the diagnostics to
get_tuple_size caller as written in the other mail as an option),
bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-12-15  Jakub Jelinek  

P0490R0 GB 20: decomposition declaration should commit to tuple
interpretation early
* decl.c (get_tuple_size): Make static.  If inst is error_mark_node
or non-complete type, return NULL_TREE, otherwise if
lookup_qualified_name fails or doesn't fold into INTEGER_CST, return
error_mark_node.
(get_tuple_element_type, get_tuple_decomp_init): Make static.
(cp_finish_decomp): Pass LOC to get_tuple_size.  If it returns
error_mark_node, complain and fail.

* g++.dg/cpp1z/decomp10.C (f1): Adjust expected diagnostics.

--- gcc/cp/decl.c.jj    2016-12-08 23:17:57.256167066 +0100
+++ gcc/cp/decl.c   2016-12-15 13:48:48.087424991 +0100
@@ -7259,7 +7259,7 @@ find_decomp_class_base (location_t loc,
 
 /* Return std::tuple_size::value.  */
 
-tree
+static tree
 get_tuple_size (tree type)
 {
   tree args = make_tree_vec (1);
@@ -7268,6 +7268,9 @@ get_tuple_size (tree type)
 /*in_decl*/NULL_TREE,
 /*context*/std_node,
 /*entering_scope*/false, tf_none);
+  inst = complete_type (inst);
+  if (inst == error_mark_node || !COMPLETE_TYPE_P (inst))
+return NULL_TREE;
   tree val = lookup_qualified_name (inst, get_identifier ("value"),
/*type*/false, /*complain*/false);
   if (TREE_CODE (val) == VAR_DECL || TREE_CODE (val) == CONST_DECL)
@@ -7275,12 +7278,12 @@ get_tuple_size (tree type)
   if (TREE_CODE (val) == INTEGER_CST)
 return val;
   else
-return NULL_TREE;
+return error_mark_node;
 }
 
 /* Return std::tuple_element::type.  */
 
-tree
+static tree
 get_tuple_element_type (tree type, unsigned i)
 {
   tree args = make_tree_vec (2);
@@ -7297,7 +7302,7 @@ get_tuple_element_type (tree type, unsig
 
 /* Return e.get() or get(e).  */
 
-tree
+static tree
 get_tuple_decomp_init (tree decl, unsigned i)
 {
   tree get_id = get_identifier ("get");
@@ -7342,6 +7347,7 @@ store_decomp_type (tree v, tree t)
 decomp_type_table = hash_map::create_ggc (13);
   decomp_type_table->put (v, t);
 }
+
 tree
 lookup_decomp_type (tree v)
 {
@@ -7502,6 +7508,12 @@ cp_finish_decomp (tree decl, tree first,
 }
   else if (tree tsize = get_tuple_size (type))
 {
+  if (tsize == error_mark_node)
+   {
+ error_at (loc, "%::value%> is not an integral "
+"constant expression", type);
+ goto error_out;
+   }
   eltscnt = tree_to_uhwi (tsize);
   if (count != eltscnt)
goto cnt_mismatch;
--- gcc/testsuite/g++.dg/cpp1z/decomp10.C.jj    2016-12-08 23:17:57.477164261 +0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp10.C   2016-12-15 13:47:47.288240503 +0100
@@ -7,7 +7,7 @@ namespace std {
 
 struct A1 { int i,j; } a1;
 template<> struct std::tuple_size {  };
-void f1() { auto [ x ] = a1; } // { dg-error "decomposes into 2" }
+void f1() { auto [ x ] = a1; } // { dg-error "is not an integral constant expression" }
 
 struct A2 { int i,j; } a2;
 template<> struct std::tuple_size { enum { value = 5 }; };


Jakub


Re: [PATCH] Add AVX512 k-mask intrinsics

2016-12-15 Thread Uros Bizjak
On Thu, Dec 15, 2016 at 2:31 PM, Andrew Senkevich
 wrote:
> 2016-12-14 22:55 GMT+03:00 Uros Bizjak :
>> On Wed, Dec 14, 2016 at 8:04 PM, Andrew Senkevich
>>  wrote:
>>
>>> here is the second part of k-mask intrinsics, is it Ok?
>>
>>> --- a/gcc/config/i386/sse.md
>>> +++ b/gcc/config/i386/sse.md
>>> @@ -1309,12 +1309,30 @@
>>>  ;; Mask variant shift mnemonics
>>>  (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
>>>
>>> +(define_expand "kmovb"
>>> +  [(set (match_operand:QI 0 "nonimmediate_operand")
>>> + (match_operand:QI 1 "nonimmediate_operand"))]
>>> +  "TARGET_AVX512DQ
>>> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>> +
>>>  (define_expand "kmovw"
>>>[(set (match_operand:HI 0 "nonimmediate_operand")
>>>   (match_operand:HI 1 "nonimmediate_operand"))]
>>>"TARGET_AVX512F
>>> && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>>
>>> +(define_expand "kmovd"
>>> +  [(set (match_operand:SI 0 "nonimmediate_operand")
>>> + (match_operand:SI 1 "nonimmediate_operand"))]
>>> +  "TARGET_AVX512BW
>>> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>> +
>>> +(define_expand "kmovq"
>>> +  [(set (match_operand:DI 0 "nonimmediate_operand")
>>> + (match_operand:DI 1 "nonimmediate_operand"))]
>>> +  "TARGET_AVX512BW
>>> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>> +
>>>  (define_insn "k"
>>>[(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
>>>   (any_logic:SWI1248_AVX512BW
>>
>> All the above patterns can be macroized with the following patch:
>>
>> --cut here--
>> Index: sse.md
>> ===
>> --- sse.md  (revision 243651)
>> +++ sse.md  (working copy)
>> @@ -1309,9 +1309,9 @@
>>  ;; Mask variant shift mnemonics
>>  (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
>>
>> -(define_expand "kmovw"
>> -  [(set (match_operand:HI 0 "nonimmediate_operand")
>> -   (match_operand:HI 1 "nonimmediate_operand"))]
>> +(define_expand "kmov"
>> +  [(set (match_operand:SWI1248_AVX512BWDQ 0 "nonimmediate_operand")
>> +   (match_operand:SWI1248_AVX512BWDQ 1 "nonimmediate_operand"))]
>>"TARGET_AVX512F
>> && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>
>> --cut here--
>>
>> Please also post ChangeLog entry.
>
> Thanks,
>
> here is with ChangeLogs and renamed internal __builtin_ia32_kmov* to
> match instruction names.
> For __builtin_ia32_kmov16 change I will follow up for update in branches.
>
> Regtested on x86_64-linux-gnu, Ok for trunk?

OK.

Thanks,
Uros.


[PATCH] Formatting and spelling fixes for ipa-cp.c

2016-12-15 Thread Jakub Jelinek
Hi!

When looking at the noipa attribute, I initially started changing
ipa-cp.c and noticed some bad spellings (various functions called *accross*
rather than *across*) and tons of bad formatting, e.g. indentation
by 3 or -1 columns.

This patch fixes what I've found quickly, bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?

2016-12-15  Jakub Jelinek  

* ipa-cp.c (class ipcp_bits_lattice): Formatting fixes.
(print_ipcp_constant_value): Likewise.
(ipcp_cloning_candidate_p): Likewise.
(ipcp_bits_lattice::get_value_and_mask): Likewise.
(ipcp_bits_lattice::meet_with_1): Likewise.
(ipcp_bits_lattice::meet_with): Likewise.
(initialize_node_lattices): Likewise.
(ipcp_lattice::add_value): Likewise.
(propagate_vals_accross_pass_through): Renamed to ...
(propagate_vals_across_pass_through): ... this function.
(propagate_vals_accross_ancestor): Renamed to ...
(propagate_vals_across_ancestor): ... this.
(propagate_scalar_accross_jump_function): Renamed to ...
(propagate_scalar_across_jump_function): ... this.
Adjust calls to above functions.
(propagate_context_accross_jump_function): Renamed to ...
(propagate_context_across_jump_function): ... this.
(propagate_bits_accross_jump_function): Renamed to ...
(propagate_bits_across_jump_function): ... this.  Formatting fixes.
(propagate_vr_accross_jump_function): Renamed to ...
(propagate_vr_across_jump_function): ... this.
(merge_agg_lats_step): Formatting fixes.
(propagate_constants_accross_call): Renamed to ...
(propagate_constants_across_call): ... this.  Adjust calls to above
functions.
(ipa_get_indirect_edge_target_1): Formatting fixes.
(gather_context_independent_values): Likewise.
(estimate_local_effects): Likewise.
(add_all_node_vals_to_toposort): Likewise.
(propagate_constants_topo): Adjust calls to above functions.
(get_replacement_map): Formatting fixes.
(dump_profile_updates): Likewise.
(update_profiling_info): Likewise.
(update_specialized_profile): Likewise.
(create_specialized_node): Likewise.
(find_more_contexts_for_caller_subset): Likewise.
(decide_whether_version_node): Likewise.
(identify_dead_nodes): Likewise.
(ipcp_decision_stage): Likewise.
(ipcp_store_bits_results): Likewise.
(ipcp_store_vr_results): Likewise.
(ipcp_driver): Likewise.

--- gcc/ipa-cp.c.jj 2016-12-15 12:20:11.0 +0100
+++ gcc/ipa-cp.c    2016-12-15 13:11:29.914430138 +0100
@@ -61,7 +61,7 @@ along with GCC; see the file COPYING3.
values:
 
Pass through - the caller's formal parameter is passed as an actual
-  argument, plus an operation on it can be performed.
+ argument, plus an operation on it can be performed.
Constant - a constant is passed as an actual argument.
Unknown - neither of the above.
 
@@ -268,8 +268,8 @@ public:
   bool top_p () { return m_lattice_val == IPA_BITS_UNDEFINED; }
   bool constant_p () { return m_lattice_val == IPA_BITS_CONSTANT; }
   bool set_to_bottom ();
-  bool set_to_constant (widest_int, widest_int); 
- 
+  bool set_to_constant (widest_int, widest_int);
+
   widest_int get_value () { return m_value; }
   widest_int get_mask () { return m_mask; }
 
@@ -288,9 +288,9 @@ private:
  value is known to be constant.  */
   widest_int m_value, m_mask;
 
-  bool meet_with_1 (widest_int, widest_int, unsigned); 
+  bool meet_with_1 (widest_int, widest_int, unsigned);
   void get_value_and_mask (tree, widest_int *, widest_int *);
-}; 
+};
 
 /* Lattice of value ranges.  */
 
@@ -424,7 +424,7 @@ static void
 print_ipcp_constant_value (FILE * f, tree v)
 {
   if (TREE_CODE (v) == ADDR_EXPR
-  && TREE_CODE (TREE_OPERAND (v, 0)) == CONST_DECL)
+  && TREE_CODE (TREE_OPERAND (v, 0)) == CONST_DECL)
 {
   fprintf (f, "& ");
   print_generic_expr (f, DECL_INITIAL (TREE_OPERAND (v, 0)), 0);
@@ -684,18 +684,18 @@ ipcp_cloning_candidate_p (struct cgraph_
   if (!opt_for_fn (node->decl, flag_ipa_cp_clone))
 {
   if (dump_file)
-fprintf (dump_file, "Not considering %s for cloning; "
+   fprintf (dump_file, "Not considering %s for cloning; "
 "-fipa-cp-clone disabled.\n",
-node->name ());
+node->name ());
   return false;
 }
 
   if (node->optimize_for_size_p ())
 {
   if (dump_file)
-fprintf (dump_file, "Not considering %s for cloning; "
+   fprintf (dump_file, "Not considering %s for cloning; "
 "optimizing it for size.\n",
-node->name ());
+node->name ());
   return false;
 }
 
@@ -705,8 +705,8 @@ ipcp_cloning_candidate_p (struct cgraph_
   if (inline_summaries->get (node)

RFC: Make iterator printers fail more gracefully

2016-12-15 Thread Jonathan Wakely

This patch tries to improve the user experience when debugging
container iterators, for cases where some of the typedefs used by the
printers are not in the debuginfo, so gdb.lookup_type() calls fail.
That happens if the iterator's operator*() and operator->() haven't
been instantiated, or if they've been inlined.

Currently this results in an exception:

$1 = Python Exception  Cannot find type std::_List_iterator::_Node: 


If the iterator being printed is part of some other object the whole
thing fails due to the exception.

With this patch the iterator instead prints:

$1 = 

and if it's a subobject the rest of the object is printed, with that
as the value of the iterator.

* python/libstdcxx/v6/printers.py (StdListIteratorPrinter.to_string):
Handle exception from failed type lookup and return user-friendly
string.
(StdRbtreeIteratorPrinter.__init__): Handle exception from failed
type lookup.
(StdRbtreeIteratorPrinter.to_string): Return user-friendly string.

Seem reasonable?

I consider this a stop-gap until we have Xmethods for all our iterator
types, then we'll be able to "print *iter" even without debuginfo for
all the iterator's members, and we can disable these printers.

commit 1794ede023b36df4eb2c7264e2d04d6a86ad456a
Author: Jonathan Wakely 
Date:   Thu Dec 15 16:52:21 2016 +

Make iterator printers fail more gracefully

* python/libstdcxx/v6/printers.py (StdListIteratorPrinter.to_string):
Handle exception from failed type lookup and return user-friendly
string.
(StdRbtreeIteratorPrinter.__init__): Handle exception from failed
type lookup.
(StdRbtreeIteratorPrinter.to_string): Return user-friendly string.

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 86de1ca..60afb52 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -202,10 +202,13 @@ class StdListIteratorPrinter:
     def to_string(self):
         if not self.val['_M_node']:
             return 'non-dereferenceable iterator for std::list'
-        nodetype = find_type(self.val.type, '_Node')
-        nodetype = nodetype.strip_typedefs().pointer()
-        node = self.val['_M_node'].cast(nodetype).dereference()
-        return str(get_value_from_list_node(node))
+        try:
+            nodetype = find_type(self.val.type, '_Node')
+            nodetype = nodetype.strip_typedefs().pointer()
+            node = self.val['_M_node'].cast(nodetype).dereference()
+            return str(get_value_from_list_node(node))
+        except:
+            return ''
 
 class StdSlistPrinter:
     "Print a __gnu_cxx::slist"
@@ -496,12 +499,17 @@ class StdRbtreeIteratorPrinter:
     def __init__ (self, typename, val):
         self.val = val
         valtype = self.val.type.template_argument(0).strip_typedefs()
-        nodetype = gdb.lookup_type('std::_Rb_tree_node<' + str(valtype) + '>')
-        self.link_type = nodetype.strip_typedefs().pointer()
+        try:
+            nodetype = gdb.lookup_type('std::_Rb_tree_node<' + str(valtype) + '>')
+            self.link_type = nodetype.strip_typedefs().pointer()
+        except:
+            self.link_type = None
 
     def to_string (self):
         if not self.val['_M_node']:
             return 'non-dereferenceable iterator for associative container'
+        if self.link_type is None:
+            return ""
         node = self.val['_M_node'].cast(self.link_type).dereference()
         return str(get_value_from_Rb_tree_node(node))
 


Re: RFC: Make iterator printers fail more gracefully

2016-12-15 Thread Ville Voutilainen
On 15 December 2016 at 19:39, Jonathan Wakely  wrote:
> This patch tries to improve the user experience when debugging
> container iterators, for cases where some of the typedefs used by the
> printers are not in the debuginfo, so gdb.lookup_type() calls fail.
> That happens if the iterator's operator*() and operator->() haven't
> been instantiated, or if they've been inlined.
>
> Currently this results in an exception:
>
> $1 = Python Exception  Cannot find type
> std::_List_iterator::_Node:
> If the iterator being printed is part of some other object the whole
> thing fails due to the exception.
>
> With this patch the iterator instead prints:
>
> $1 = 
>
> and if it's a subobject the rest of the object is printed, with that
> as the value of the iterator.
>
> * python/libstdcxx/v6/printers.py
> (StdListIteratorPrinter.to_string):
> Handle exception from failed type lookup and return user-friendly
> string.
> (StdRbtreeIteratorPrinter.__init__): Handle exception from failed
> type lookup.
> (StdRbtreeIteratorPrinter.to_string): Return user-friendly string.
>
> Seem reasonable?

Yes, looks like a good improvement.

> I consider this a stop-gap until we have Xmethods for all our iterator
> types, then we'll be able to "print *iter" even without debuginfo for
> all the iterator's members, and we can disable these printers.


+1.


Re: RFC: Make iterator printers fail more gracefully

2016-12-15 Thread David Malcolm
On Thu, 2016-12-15 at 17:39 +, Jonathan Wakely wrote:
> This patch tries to improve the user experience when debugging
> container iterators, for cases where some of the typedefs used by the
> printers are not in the debuginfo, so gdb.lookup_type() calls fail.
> That happens if the iterator's operator*() and operator->() haven't
> been instantiated, or if they've been inlined.
> 
> Currently this results in an exception:
> 
> $1 = Python Exception  Cannot find type
> std::_List_iterator::_Node: 
> 
> If the iterator being printed is part of some other object the whole
> thing fails due to the exception.
> 
> With this patch the iterator instead prints:
> 
> $1 = 
> 
> and if it's a subobject the rest of the object is printed, with that
> as the value of the iterator.
> 
>   * python/libstdcxx/v6/printers.py (StdListIteratorPrinter.to_string):
>   Handle exception from failed type lookup and return user-friendly
>   string.
>   (StdRbtreeIteratorPrinter.__init__): Handle exception from failed
>   type lookup.
>   (StdRbtreeIteratorPrinter.to_string): Return user-friendly string.
> 
> Seem reasonable?
> 
> I consider this a stop-gap until we have Xmethods for all our iterator
> types, then we'll be able to "print *iter" even without debuginfo for
> all the iterator's members, and we can disable these printers.

BTW, is it always a ValueError exception?

(I'm a little wary of naked "except:" in Python, as it can catch
*anything*, including syntax errors in the try/except-guarded code).


Re: [PATCH] Formatting and spelling fixes for ipa-cp.c

2016-12-15 Thread Martin Jambor
On Thu, Dec 15, 2016 at 05:51:25PM +0100, Jakub Jelinek wrote:
> Hi!
> 
> When looking at the noipa attribute, I've been initially changing
> ipa-cp.c, and noticed some bad spellings (various functions called *accross*
> rather than *across*) and tons of bad formatting, sometimes e.g. indentation
> by 3 or -1 columns etc.
> 
> This patch fixes what I've found quickly, bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?

I can't approve it but I'll be happy to have this committed.

Thanks,

Martin


Re: RFC: Make iterator printers fail more gracefully

2016-12-15 Thread Jonathan Wakely

On 15/12/16 13:11 -0500, David Malcolm wrote:

On Thu, 2016-12-15 at 17:39 +, Jonathan Wakely wrote:

This patch tries to improve the user experience when debugging
container iterators, for cases where some of the typedefs used by the
printers are not in the debuginfo, so gdb.lookup_type() calls fail.
That happens if the iterator's operator*() and operator->() haven't
been instantiated, or if they've been inlined.

Currently this results in an exception:

$1 = Python Exception  Cannot find type
std::_List_iterator::_Node:

If the iterator being printed is part of some other object the whole
thing fails due to the exception.

With this patch the iterator instead prints:

$1 = 

and if it's a subobject the rest of the object is printed, with that
as the value of the iterator.

* python/libstdcxx/v6/printers.py (StdListIteratorPrinter.to_string):
Handle exception from failed type lookup and return user-friendly
string.
(StdRbtreeIteratorPrinter.__init__): Handle exception from failed
type lookup.
(StdRbtreeIteratorPrinter.to_string): Return user-friendly string.

Seem reasonable?

I consider this a stop-gap until we have Xmethods for all our iterator
types, then we'll be able to "print *iter" even without debuginfo for
all the iterator's members, and we can disable these printers.


BTW, is it always a ValueError exception?

(I'm a little wary of naked "except:" in Python, as it can catch
*anything*, including syntax errors in the try/except-guarded code).


Good point. As far as I know, the gdb.lookup_type method will throw a
ValueError in the case I'm trying to fix. If it can throw other things
we can deal with them later by adding other handlers.



Re: [PATCH] Add AVX512 k-mask intrinsics

2016-12-15 Thread Andrew Senkevich
2016-12-15 19:51 GMT+03:00 Uros Bizjak :
> On Thu, Dec 15, 2016 at 2:31 PM, Andrew Senkevich
>  wrote:
>> 2016-12-14 22:55 GMT+03:00 Uros Bizjak :
>>> On Wed, Dec 14, 2016 at 8:04 PM, Andrew Senkevich
>>>  wrote:
>>>
 here is the second part of k-mask intrinsics, is it Ok?
>>>
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -1309,12 +1309,30 @@
  ;; Mask variant shift mnemonics
  (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])

 +(define_expand "kmovb"
 +  [(set (match_operand:QI 0 "nonimmediate_operand")
 + (match_operand:QI 1 "nonimmediate_operand"))]
 +  "TARGET_AVX512DQ
 +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
 +
  (define_expand "kmovw"
[(set (match_operand:HI 0 "nonimmediate_operand")
   (match_operand:HI 1 "nonimmediate_operand"))]
"TARGET_AVX512F
 && !(MEM_P (operands[0]) && MEM_P (operands[1]))")

 +(define_expand "kmovd"
 +  [(set (match_operand:SI 0 "nonimmediate_operand")
 + (match_operand:SI 1 "nonimmediate_operand"))]
 +  "TARGET_AVX512BW
 +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
 +
 +(define_expand "kmovq"
 +  [(set (match_operand:DI 0 "nonimmediate_operand")
 + (match_operand:DI 1 "nonimmediate_operand"))]
 +  "TARGET_AVX512BW
 +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
 +
  (define_insn "k"
[(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
   (any_logic:SWI1248_AVX512BW
>>>
>>> All the above patterns can be macroized with the following patch:
>>>
>>> --cut here--
>>> Index: sse.md
>>> ===
>>> --- sse.md  (revision 243651)
>>> +++ sse.md  (working copy)
>>> @@ -1309,9 +1309,9 @@
>>>  ;; Mask variant shift mnemonics
>>>  (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
>>>
>>> -(define_expand "kmovw"
>>> -  [(set (match_operand:HI 0 "nonimmediate_operand")
>>> -   (match_operand:HI 1 "nonimmediate_operand"))]
>>> +(define_expand "kmov"
>>> +  [(set (match_operand:SWI1248_AVX512BWDQ 0 "nonimmediate_operand")
>>> +   (match_operand:SWI1248_AVX512BWDQ 1 "nonimmediate_operand"))]
>>>"TARGET_AVX512F
>>> && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>>
>>> --cut here--
>>>
>>> Please also post ChangeLog entry.
>>
>> Thanks,
>>
>> here is with ChangeLogs and renamed internal __builtin_ia32_kmov* to
>> match instruction names.
>> For the __builtin_ia32_kmov16 change, I will follow up with an update for the branches.
>>
>> Regtested on x86_64-linux-gnu, Ok for trunk?
>
> OK.

Thanks,

here is one more part for kadd{b,w,d,q}, is it ok?

gcc/
* config/i386/avx512bwintrin.h: Add new k-mask intrinsics.
* config/i386/avx512dqintrin.h: Ditto.
* config/i386/avx512fintrin.h: Ditto.
* config/i386/i386-builtin.def (__builtin_ia32_kaddqi,
__builtin_ia32_kaddhi, __builtin_ia32_kaddsi,
__builtin_ia32_kadddi): New.
* config/i386/sse.md (kadd): New.

gcc/testsuite/
* gcc.target/i386/avx512bw-kaddd-1.c: New test.
* gcc.target/i386/avx512bw-kaddq-1.c: Ditto.
* gcc.target/i386/avx512dq-kaddb-1.c: Ditto.
* gcc.target/i386/avx512f-kaddw-1.c: Ditto.
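The KADD{B,W,D,Q} instructions behind the new _kadd_mask* intrinsics add two mask registers as plain unsigned integers of 8, 16, 32 and 64 bits, matching the widths of the kaddqi/kaddhi/kaddsi/kadddi builtins above. A minimal portable model, which can help when working out expected values for tests without AVX-512 hardware, follows; the function names are hypothetical, and it assumes the wrap-around (modulo 2^N) semantics that Intel documents for KADD.

```cpp
#include <cstdint>

// Hypothetical portable models of the k-mask add: each is ordinary unsigned
// addition truncated to the mask width, so the carry out of the top bit is lost.
constexpr std::uint8_t kadd_mask8_model (std::uint8_t a, std::uint8_t b)
{
  return static_cast<std::uint8_t> (a + b);    // 8-bit wrap (kaddb)
}

constexpr std::uint16_t kadd_mask16_model (std::uint16_t a, std::uint16_t b)
{
  return static_cast<std::uint16_t> (a + b);   // 16-bit wrap (kaddw)
}

constexpr std::uint32_t kadd_mask32_model (std::uint32_t a, std::uint32_t b)
{
  return a + b;                                // 32-bit wrap (kaddd)
}

constexpr std::uint64_t kadd_mask64_model (std::uint64_t a, std::uint64_t b)
{
  return a + b;                                // 64-bit wrap (kaddq)
}
```

For example, adding mask 0xFF and mask 0x01 at 8 bits gives 0, not 0x100.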

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index b35ae2b..e38055c 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -40,6 +40,20 @@ typedef char __v64qi __attribute__ ((__vector_size__ (64)));

 typedef unsigned long long __mmask64;

+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B);
+}
+
 extern __inline unsigned int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _cvtmask32_u32 (__mmask32 __A)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 4db44e4..ccc6a4d 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -34,6 +34,13 @@
 #define __DISABLE_AVX512DQ__
 #endif /* __AVX512DQ__ */

+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kaddqi ((__mmask8) __A, (__mmask8) __B);
+}
+
 extern __inline unsigned int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _cvtmask8_u32 (__mmask8 __A)
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index a889c83..820741c 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -9984,6 +9984,13 @@ _mm5

Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

2016-12-15 Thread Joseph Myers
On Thu, 15 Dec 2016, Tamar Christina wrote:

> > Note that on some systems we even disable 64bit floating point support.
> > I suspect this check needs a little re-thinking as I don't think that
> > checking for a specific UNITS_PER_WORD is correct, nor is checking the
> > width of the type.  I'm not offhand sure what the test should be, just
> > that I think we need something better here.
> 
> I think what I really wanted to test here is whether there was an integer
> mode available with exactly the same width as the floating-point one, so I
> have replaced this with a call to int_mode_for_mode, which is
> probably more correct.

I think an integer mode should always exist - even in the case of TFmode
on 32-bit systems (32-bit sparc / s390, for example, use TFmode long
double for GNU/Linux, and it's supported as _Float128 and __float128 on
32-bit x86).  It just may not be usable for arithmetic or for declaring
variables of that type.

I don't know whether TImode bitwise operations, such as those generated by
this fpclassify work, will get properly lowered to operations on supported
narrower modes, but I hope so (clearly it's simpler if you can write
things straightforwardly and have them cover this case of TFmode on 32-bit 
systems automatically through lowering elsewhere in the compiler, than if 
covering that case would require additional code - the more cases you 
cover, the more opportunity there is for glibc to use the built-in 
functions even with -fsignaling-nans).
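What an integer mode of the same width buys can be seen in a small sketch: once a floating-point value's bits are viewed as an integer, classification needs only masks and comparisons. The following is a hedged C++ illustration for IEEE binary64 (it assumes double is binary64; fpclassify_bits and check_fpclassify_bits are hypothetical names, and this is not the GIMPLE expansion from the patch).

```cpp
#include <cassert>  // assert, used by the self-check below
#include <cmath>    // FP_NAN, FP_INFINITE, FP_ZERO, FP_SUBNORMAL, FP_NORMAL
#include <cstdint>
#include <cstring>  // std::memcpy
#include <limits>

// Classify a binary64 value using only integer operations on its bits.
int fpclassify_bits (double x)
{
  static_assert (sizeof (double) == sizeof (std::uint64_t), "binary64 expected");
  std::uint64_t bits;
  std::memcpy (&bits, &x, sizeof bits);           // bit copy, no aliasing UB
  const std::uint64_t exp_mask  = 0x7FF0000000000000ull;
  const std::uint64_t frac_mask = 0x000FFFFFFFFFFFFFull;
  const std::uint64_t exp  = bits & exp_mask;
  const std::uint64_t frac = bits & frac_mask;
  if (exp == exp_mask)                            // all-ones exponent
    return frac != 0 ? FP_NAN : FP_INFINITE;
  if (exp == 0)                                   // all-zeros exponent
    return frac != 0 ? FP_SUBNORMAL : FP_ZERO;
  return FP_NORMAL;                               // the sign bit never matters
}

// Quick self-check against the standard library's classification.
void check_fpclassify_bits ()
{
  const double samples[] = { 0.0, -0.0, 1.5, -2.0,
                             std::numeric_limits<double>::denorm_min (),
                             std::numeric_limits<double>::infinity (),
                             std::numeric_limits<double>::quiet_NaN () };
  for (double d : samples)
    assert (fpclassify_bits (d) == std::fpclassify (d));
}
```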

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [C++ PATCH] P0490R0 GB 20: decomposition declaration should commit to tuple interpretation early (take 2)

2016-12-15 Thread Jason Merrill
OK.

On Thu, Dec 15, 2016 at 11:44 AM, Jakub Jelinek  wrote:
> On Thu, Dec 15, 2016 at 07:40:58AM -0500, Nathan Sidwell wrote:
>> On 12/15/2016 07:26 AM, Jakub Jelinek wrote:
>>
>> > I don't think so.  complete_type (error_mark_node) returns error_mark_node,
>> > and COMPLETE_TYPE_P (error_mark_node) is invalid (should fail TYPE_CHECK in
>> > checking compiler).
>> >
>> > I can write it as
>> >   inst = complete_type (inst);
>> >   if (inst == error_mark_node || !COMPLETE_TYPE_P (inst))
>> > return NULL_TREE;
>>
>> that's probably better, because complete_type can return error_mark_node if
>> 'something goes horribly wrong'
>
> Ok, here is the updated patch (and also moving the diagnostics to
> get_tuple_size caller as written in the other mail as an option),
> bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-12-15  Jakub Jelinek  
>
> P0490R0 GB 20: decomposition declaration should commit to tuple
> interpretation early
> * decl.c (get_tuple_size): Make static.  If inst is error_mark_node
> or non-complete type, return NULL_TREE, otherwise if
> lookup_qualified_name fails or doesn't fold into INTEGER_CST, return
> error_mark_node.
> (get_tuple_element_type, get_tuple_decomp_init): Make static.
> (cp_finish_decomp): Pass LOC to get_tuple_size.  If it returns
> error_mark_node, complain and fail.
>
> * g++.dg/cpp1z/decomp10.C (f1): Adjust expected diagnostics.
>
> --- gcc/cp/decl.c.jj   2016-12-08 23:17:57.256167066 +0100
> +++ gcc/cp/decl.c   2016-12-15 13:48:48.087424991 +0100
> @@ -7259,7 +7259,7 @@ find_decomp_class_base (location_t loc,
>
>  /* Return std::tuple_size::value.  */
>
> -tree
> +static tree
>  get_tuple_size (tree type)
>  {
>tree args = make_tree_vec (1);
> @@ -7268,6 +7268,9 @@ get_tuple_size (tree type)
>  /*in_decl*/NULL_TREE,
>  /*context*/std_node,
>  /*entering_scope*/false, tf_none);
> +  inst = complete_type (inst);
> +  if (inst == error_mark_node || !COMPLETE_TYPE_P (inst))
> +return NULL_TREE;
>tree val = lookup_qualified_name (inst, get_identifier ("value"),
> /*type*/false, /*complain*/false);
>if (TREE_CODE (val) == VAR_DECL || TREE_CODE (val) == CONST_DECL)
> @@ -7275,12 +7278,12 @@ get_tuple_size (tree type)
>if (TREE_CODE (val) == INTEGER_CST)
>  return val;
>else
> -return NULL_TREE;
> +return error_mark_node;
>  }
>
>  /* Return std::tuple_element::type.  */
>
> -tree
> +static tree
>  get_tuple_element_type (tree type, unsigned i)
>  {
>tree args = make_tree_vec (2);
> @@ -7297,7 +7302,7 @@ get_tuple_element_type (tree type, unsig
>
>  /* Return e.get() or get(e).  */
>
> -tree
> +static tree
>  get_tuple_decomp_init (tree decl, unsigned i)
>  {
>tree get_id = get_identifier ("get");
> @@ -7342,6 +7347,7 @@ store_decomp_type (tree v, tree t)
>  decomp_type_table = hash_map::create_ggc (13);
>decomp_type_table->put (v, t);
>  }
> +
>  tree
>  lookup_decomp_type (tree v)
>  {
> @@ -7502,6 +7508,12 @@ cp_finish_decomp (tree decl, tree first,
>  }
>else if (tree tsize = get_tuple_size (type))
>  {
> +  if (tsize == error_mark_node)
> +   {
> + error_at (loc, "%::value%> is not an integral "
> +"constant expression", type);
> + goto error_out;
> +   }
>eltscnt = tree_to_uhwi (tsize);
>if (count != eltscnt)
> goto cnt_mismatch;
> --- gcc/testsuite/g++.dg/cpp1z/decomp10.C.jj   2016-12-08 23:17:57.477164261 +0100
> +++ gcc/testsuite/g++.dg/cpp1z/decomp10.C   2016-12-15 13:47:47.288240503 +0100
> @@ -7,7 +7,7 @@ namespace std {
>
>  struct A1 { int i,j; } a1;
>  template<> struct std::tuple_size {  };
> -void f1() { auto [ x ] = a1; } // { dg-error "decomposes into 2" }
> +void f1() { auto [ x ] = a1; } // { dg-error "is not an integral constant expression" }
>
>  struct A2 { int i,j; } a2;
>  template<> struct std::tuple_size { enum { value = 5 }; };
>
>
> Jakub


Re: [Patch, Fortran, cleanup] PR 78798: some int-valued functions should be bool

2016-12-15 Thread Janus Weil
2016-12-13 19:55 GMT+01:00 Janus Weil :
> 2016-12-13 19:19 GMT+01:00 Janne Blomqvist :
>> On Tue, Dec 13, 2016 at 8:13 PM, Janus Weil  wrote:
>>> Hi all,
>>>
>>> here is a straightforward cleanup patch that makes a few functions
>>> return a bool instead of an int. Regtests cleanly on x86_64-linux-gnu.
>>> Ok for trunk?
>>
>> Ok, thanks.
>
> Thanks, Janne. Committed as r243621.


I realized that many functions in interface.c could/should also be
converted from int to bool. Attached is a patch which does that.

It's not small, but mostly mechanical, and regtests cleanly on
x86_64-linux-gnu. Ok for trunk?

Cheers,
Janus


2016-12-15  Janus Weil  

PR fortran/78798
* gfortran.h (gfc_compare_derived_types,gfc_compare_types,
gfc_compare_interfaces,gfc_has_vector_subscript): Return bool instead
of int.
* interface.c (compare_components): Ditto.
(gfc_compare_union_types): Rename to compare_union_types, declare as
static, return bool.
(gfc_compare_derived_types): Return bool instead of int.
(gfc_compare_types): Ditto.
(compare_type): Ditto.
(compare_rank): Ditto.
(compare_type_rank): Ditto.
(compare_type_rank_if): Ditto.
(count_types_test): Ditto.
(generic_correspondence): Ditto.
(gfc_compare_interfaces): Ditto.
(check_interface0): Ditto.
(check_interface1): Ditto.
(compare_allocatable): Ditto.
(compare_parameter): Ditto.
(gfc_has_vector_subscript): Ditto.
(compare_actual_formal): Ditto.
Index: gcc/fortran/gfortran.h
===
--- gcc/fortran/gfortran.h  (revision 243695)
+++ gcc/fortran/gfortran.h  (working copy)
@@ -3225,14 +3225,14 @@ bool gfc_ref_dimen_size (gfc_array_ref *, int dime
 
 /* interface.c -- FIXME: some of these should be in symbol.c */
 void gfc_free_interface (gfc_interface *);
-int gfc_compare_derived_types (gfc_symbol *, gfc_symbol *);
-int gfc_compare_types (gfc_typespec *, gfc_typespec *);
+bool gfc_compare_derived_types (gfc_symbol *, gfc_symbol *);
+bool gfc_compare_types (gfc_typespec *, gfc_typespec *);
 bool gfc_check_dummy_characteristics (gfc_symbol *, gfc_symbol *,
  bool, char *, int);
 bool gfc_check_result_characteristics (gfc_symbol *, gfc_symbol *,
   char *, int);
-int gfc_compare_interfaces (gfc_symbol*, gfc_symbol*, const char *, int, int,
-   char *, int, const char *, const char *);
+bool gfc_compare_interfaces (gfc_symbol*, gfc_symbol*, const char *, int, int,
+char *, int, const char *, const char *);
 void gfc_check_interfaces (gfc_namespace *);
 bool gfc_procedure_use (gfc_symbol *, gfc_actual_arglist **, locus *);
 void gfc_ppc_use (gfc_component *, gfc_actual_arglist **, locus *);
@@ -3248,7 +3248,7 @@ void gfc_set_current_interface_head (gfc_interface
 gfc_symtree* gfc_find_sym_in_symtree (gfc_symbol*);
 bool gfc_arglist_matches_symbol (gfc_actual_arglist**, gfc_symbol*);
 bool gfc_check_operator_interface (gfc_symbol*, gfc_intrinsic_op, locus);
-int gfc_has_vector_subscript (gfc_expr*);
+bool gfc_has_vector_subscript (gfc_expr*);
 gfc_intrinsic_op gfc_equivalent_op (gfc_intrinsic_op);
 bool gfc_check_typebound_override (gfc_symtree*, gfc_symtree*);
 void gfc_check_dtio_interfaces (gfc_symbol*);
Index: gcc/fortran/interface.c
===
--- gcc/fortran/interface.c (revision 243695)
+++ gcc/fortran/interface.c (working copy)
@@ -471,7 +471,7 @@ is_anonymous_dt (gfc_symbol *derived)
 
 /* Compare components according to 4.4.2 of the Fortran standard.  */
 
-static int
+static bool
 compare_components (gfc_component *cmp1, gfc_component *cmp2,
 gfc_symbol *derived1, gfc_symbol *derived2)
 {
@@ -478,22 +478,22 @@ compare_components (gfc_component *cmp1, gfc_compo
   /* Compare names, but not for anonymous components such as UNION or MAP.  */
   if (!is_anonymous_component (cmp1) && !is_anonymous_component (cmp2)
   && strcmp (cmp1->name, cmp2->name) != 0)
-return 0;
+return false;
 
   if (cmp1->attr.access != cmp2->attr.access)
-return 0;
+return false;
 
   if (cmp1->attr.pointer != cmp2->attr.pointer)
-return 0;
+return false;
 
   if (cmp1->attr.dimension != cmp2->attr.dimension)
-return 0;
+return false;
 
   if (cmp1->attr.allocatable != cmp2->attr.allocatable)
-return 0;
+return false;
 
   if (cmp1->attr.dimension && gfc_compare_array_spec (cmp1->as, cmp2->as) == 0)
-return 0;
+return false;
 
   if (cmp1->ts.type == BT_CHARACTER && cmp2->ts.type == BT_CHARACTER)
 {
@@ -503,7 +503,7 @@ compare_components (gfc_component *cmp1, gfc_compo
   && l1->length->expr_type == EXPR_CONSTANT
   && l2->length->expr_type == EXPR_CONSTANT
   && gfc_dep_compare_expr (l1->length, l2->length) != 0)
-return 0;
+return false;
 }
 
   /* Make

Re: [PATCH] c++/pr77585 bogus this error with generic lambda

2016-12-15 Thread Jason Merrill
OK.

On Thu, Dec 15, 2016 at 7:38 AM, Nathan Sidwell  wrote:
> 77585 concerns the instantiation of a generic lambda that contains a call to
> a non-dependent non-static member function.
>
>   auto lam = [&](auto) { return Share (); };
>   r += Eat (lam);  // instantiation of lambda::operator() here
>
> During instantiation of the call to Share, maybe_resolve_dummy gets called
> and uses current_nonlambda_class_type, which peeks up the
> current_current_class stack.
>
> That peeking presupposes we're actually pushing and popping class scopes as
> we enter them all the way from the global scope.  But that doesn't always
> happen in instantiation.  push_nested_class pushes the immediately enclosing
> scopes, but stops at function scope.  So we don't get the class scope of
> that function pushed.  Thus stack peeking fails.
>
> This hasn't previously been an instantiation problem, because templates
> couldn't be defined at local scope.  But generic lambdas now have that
> property (wrt this capture at least).
>
> This patch amends instantiate_decl to first push the containing non-lambda
> class scope before start_preparsed_function does its stack pushing.
>
> ok?
>
> nathan
> --
> Nathan Sidwell
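The situation Nathan describes can be written out as a small, self-contained program. The class Eater and the bodies of Share and Eat are hypothetical; only the names Share and Eat and the shape of the lambda come from the message. Whether this exact reduction triggered the bogus error on unfixed compilers is an assumption; a compiler with the fix should accept it, and Run () should return 7.

```cpp
#include <cassert>  // assert, used by the quick check below

// Hypothetical callee: calling f instantiates the generic lambda's
// operator() with an int parameter.
template <typename F>
int Eat (F f) { return f (0); }

struct Eater                      // hypothetical enclosing class
{
  int Share () { return 7; }      // non-static, non-dependent member function

  int Run ()
  {
    int r = 0;
    // [&] captures `this` implicitly, so Share () means this->Share ().
    auto lam = [&](auto) { return Share (); };
    r += Eat (lam);               // the lambda's operator() is instantiated here
    return r;
  }
};

// Quick check: compiles with a fixed compiler and yields 7.
void check_eater () { assert (Eater ().Run () == 7); }
```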


gcc-patches@gcc.gnu.org

2016-12-15 Thread Ville Voutilainen
Tested on Linux-x64.

2016-12-15  Ville Voutilainen  

Implement LWG 2769, Redundant const in the return type of
any_cast(const any&).
* include/std/any (_AnyCast): New.
(any_cast(const any&)): Use it and add an explicit cast for return.
(any_cast(any&)): Likewise.
(any_cast(any&&)): Likewise.
* testsuite/20_util/any/misc/any_cast.cc: Add a test for a type
that has an explicit copy constructor.
* testsuite/20_util/any/misc/any_cast_neg.cc: Adjust.
diff --git a/libstdc++-v3/include/std/any b/libstdc++-v3/include/std/any
index ded2bb2..820427c 100644
--- a/libstdc++-v3/include/std/any
+++ b/libstdc++-v3/include/std/any
@@ -433,6 +433,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return any(in_place_type<_Tp>, __il, std::forward<_Args>(__args)...);
 }
 
+  template 
+using _AnyCast = remove_cv_t>;
   /**
* @brief Access the contained object.
*
@@ -448,9 +450,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   static_assert(any::__is_valid_cast<_ValueType>(),
  "Template argument must be a reference or CopyConstructible type");
-  auto __p = any_cast>>(&__any);
+  auto __p = any_cast<_AnyCast<_ValueType>>(&__any);
   if (__p)
-   return *__p;
+   return static_cast<_ValueType>(*__p);
   __throw_bad_any_cast();
 }
 
@@ -471,9 +473,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   static_assert(any::__is_valid_cast<_ValueType>(),
  "Template argument must be a reference or CopyConstructible type");
-  auto __p = any_cast>(&__any);
+  auto __p = any_cast<_AnyCast<_ValueType>>(&__any);
   if (__p)
-   return *__p;
+   return static_cast<_ValueType>(*__p);
   __throw_bad_any_cast();
 }
 
@@ -485,9 +487,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   static_assert(any::__is_valid_cast<_ValueType>(),
  "Template argument must be a reference or CopyConstructible type");
-  auto __p = any_cast>(&__any);
+  auto __p = any_cast<_AnyCast<_ValueType>>(&__any);
   if (__p)
-   return *__p;
+   return static_cast<_ValueType>(*__p);
   __throw_bad_any_cast();
 }
 
@@ -499,9 +501,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   static_assert(any::__is_valid_cast<_ValueType>(),
  "Template argument must be a reference or CopyConstructible type");
-  auto __p = any_cast>(&__any);
+  using _Up = remove_cv_t>;
+  auto __p = any_cast<_AnyCast<_ValueType>>(&__any);
   if (__p)
-   return std::move(*__p);
+   return static_cast<_ValueType>(std::move(*__p));
   __throw_bad_any_cast();
 }
   // @}
diff --git a/libstdc++-v3/testsuite/20_util/any/misc/any_cast.cc b/libstdc++-v3/testsuite/20_util/any/misc/any_cast.cc
index 96f9419..f3ae592 100644
--- a/libstdc++-v3/testsuite/20_util/any/misc/any_cast.cc
+++ b/libstdc++-v3/testsuite/20_util/any/misc/any_cast.cc
@@ -106,9 +106,22 @@ void test03()
   MoveDeleted&& md3 = any_cast(any(std::move(md)));
 }
 
+void test04()
+{
+  struct ExplicitCopy
+  {
+ExplicitCopy() = default;
+explicit ExplicitCopy(const ExplicitCopy&) = default;
+  };
+  any x = ExplicitCopy();
+  ExplicitCopy ec{any_cast(x)};
+  ExplicitCopy ec2{any_cast(std::move(x))};
+}
+
 int main()
 {
   test01();
   test02();
   test03();
+  test04();
 }
diff --git a/libstdc++-v3/testsuite/20_util/any/misc/any_cast_neg.cc b/libstdc++-v3/testsuite/20_util/any/misc/any_cast_neg.cc
index 4de400d..a8a1ca9 100644
--- a/libstdc++-v3/testsuite/20_util/any/misc/any_cast_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/any/misc/any_cast_neg.cc
@@ -26,5 +26,5 @@ void test01()
   using std::any_cast;
 
   const any y(1);
-  any_cast(y); // { dg-error "qualifiers" "" { target { *-*-* } } 453 }
+  any_cast(y); // { dg-error "invalid static_cast" "" { target { *-*-* } } 455 }
 }
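The patch funnels every overload through one _AnyCast alias and returns via static_cast<_ValueType>, which direct-initializes the result and so can use the explicit copy constructor exercised by test04. The alias's template arguments are missing from the text above; the sketch below assumes, as its use suggests, that it strips the reference and then the cv-qualifiers. The name any_cast_target is hypothetical.

```cpp
#include <type_traits>

// Hypothetical stand-in for the patch's _AnyCast: remove the reference first,
// then cv-qualifiers, so any_cast<T>, any_cast<const T&> and any_cast<T&&>
// all look up the same stored type T.
template <typename T>
using any_cast_target =
  typename std::remove_cv<typename std::remove_reference<T>::type>::type;

static_assert (std::is_same<any_cast_target<const int&>, int>::value, "");
static_assert (std::is_same<any_cast_target<volatile int&&>, int>::value, "");

// Order matters: a reference type carries no top-level cv-qualifier, so
// applying remove_cv first would leave the const behind.
static_assert (std::is_same<
                 std::remove_reference<std::remove_cv<const int&>::type>::type,
                 const int>::value, "");
```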


Re: [Patch, Fortran, cleanup] PR 78798: some int-valued functions should be bool

2016-12-15 Thread Steve Kargl
On Thu, Dec 15, 2016 at 08:38:47PM +0100, Janus Weil wrote:
> 2016-12-13 19:55 GMT+01:00 Janus Weil :
> > 2016-12-13 19:19 GMT+01:00 Janne Blomqvist :
> >> On Tue, Dec 13, 2016 at 8:13 PM, Janus Weil  wrote:
> >>> Hi all,
> >>>
> >>> here is a straightforward cleanup patch that makes a few functions
> >>> return a bool instead of an int. Regtests cleanly on x86_64-linux-gnu.
> >>> Ok for trunk?
> >>
> >> Ok, thanks.
> >
> > Thanks, Janne. Committed as r243621.
> 
> 
> I realized that also lots of functions in interface.c could/should be
> converted from int to bool. Attached is a patch which does that.
> 
> It's not small, but mostly mechanical, and regtests cleanly on
> x86_64-linux-gnu. Ok for trunk?
> 

A quick scan of the patch did not reveal anything
that jumped out as wrong.  OK to commit.

-- 
steve


Fix concept checks usage

2016-12-15 Thread François Dumont

Hi

Here is a fix for the usage of a concept check. There are also many
testsuite failures when using concept checks, but this one prevents the
library from being built. I know that concept checks are not really
maintained, so maybe the real fix is simply to remove those checks.
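Why the extra result_type argument is needed: an output iterator need not have a usable value_type (std::back_insert_iterator's is void), so a check that *it++ = value compiles has to be told the type being written. A simplified, hypothetical analogue of such a two-parameter check (not libstdc++'s _OutputIteratorConcept itself) might look like this:

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical two-parameter output-iterator check: It is the iterator,
// T is the type written through it.
template <typename It, typename T>
struct output_iterator_check
{
  static void constraints (It i, const T& v)
  {
    ++i;        // pre-increment
    i++;        // post-increment
    *i++ = v;   // the write only type-checks once T is known
  }
};

void check_output_iterator_usage ()
{
  int buf[4] = { 0, 0, 0, 0 };
  output_iterator_check<int*, int>::constraints (buf, 5);
  assert (buf[2] == 5);   // two increments, then the write lands on buf[2]

  std::vector<double> out;
  output_iterator_check<std::back_insert_iterator<std::vector<double>>,
                        double>::constraints (std::back_inserter (out), 1.5);
  assert (out.size () == 1 && out[0] == 1.5);   // increments are no-ops here
}
```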



* include/ext/random.tcc: Fix usage of _OutputIteratorConcept.

Tested under Linux x86 with and without concept checks.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/ext/random.tcc b/libstdc++-v3/include/ext/random.tcc
index e1fd88d..9b3a86eb 100644
--- a/libstdc++-v3/include/ext/random.tcc
+++ b/libstdc++-v3/include/ext/random.tcc
@@ -440,7 +440,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _UniformRandomNumberGenerator& __urng,
 		  const param_type& __param)
   {
-	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator>)
+	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
+	result_type>)
 
 	std::__detail::_Adaptor<_UniformRandomNumberGenerator, result_type>
 	  __aurng(__urng);
@@ -725,7 +726,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _UniformRandomNumberGenerator& __urng,
 		  const param_type& __p)
   {
-	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator>)
+	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
+	result_type>)
 
 	while (__f != __t)
 	  {
@@ -799,7 +801,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _UniformRandomNumberGenerator& __urng,
 		  const param_type& __p)
   {
-	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator>)
+	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
+	result_type>)
 
 	typename std::gamma_distribution::param_type
 	  __pg(__p.mu(), __p.omega() / __p.mu());
@@ -863,7 +866,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _UniformRandomNumberGenerator& __urng,
 		  const param_type& __p)
   {
-	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator>)
+	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
+	result_type>)
 
 	result_type __mu_val = __p.mu();
 	result_type __malphinv = -result_type(1) / __p.alpha();
@@ -953,7 +957,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _UniformRandomNumberGenerator& __urng,
 		  const param_type& __p)
   {
-	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator>)
+	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
+	result_type>)
 
 	typename std::gamma_distribution::param_type
 	  __p1(__p.lambda(), result_type(1) / __p.lambda()),
@@ -1024,7 +1029,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _UniformRandomNumberGenerator& __urng,
 		  const param_type& __p)
   {
-	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator>)
+	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
+	result_type>)
 
 	result_type __dif = __p.b() - __p.a();
 	result_type __sum = __p.a() + __p.b();
@@ -1121,7 +1127,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _UniformRandomNumberGenerator& __urng,
 		  const param_type& __p)
   {
-	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator>)
+	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
+	result_type>)
 
 	result_type __2q = result_type(2) * __p.q();
 	result_type __q2 = __p.q() * __p.q();
@@ -1196,7 +1203,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _UniformRandomNumberGenerator& __urng,
 		  const param_type& __param)
   {
-	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator>)
+	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
+	result_type>)
 
 	while (__f != __t)
 	  *__f++ = this->operator()(__urng, __param);
@@ -1297,7 +1305,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _UniformRandomNumberGenerator& __urng,
 		  const param_type& __param)
   {
-	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator>)
+	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
+	result_type>)
 
 	while (__f != __t)
 	  *__f++ = this->operator()(__urng, __param);
@@ -1403,7 +1412,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _UniformRandomNumberGenerator& __urng,
 		  const param_type& __param)
   {
-	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator>)
+	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
+	result_type>)
 
 	while (__f != __t)
 	  *__f++ = this->operator()(__urng);
@@ -1481,7 +1491,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _UniformRandomNumberGenerator& __urng,
 		  const param_type& __p)
   {
-	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator>)
+	__glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
+	result_type>)
+
 	std::__detail::_Adaptor<_UniformRandomNumberGenerator, result_type>
 	  __aurng(__urng);
 
@@ -1643,7 +1655,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _UniformRandomNumberGenerator& __urng,
 		  const par

Re: [PATCH] Formatting and spelling fixes for ipa-cp.c

2016-12-15 Thread Bernd Schmidt

On 12/15/2016 05:51 PM, Jakub Jelinek wrote:


This patch fixes what I've found quickly, bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?


Ok.


Bernd



Re: [PATCH][ARM] Improve Thumb allocation order

2016-12-15 Thread Richard Earnshaw (lists)
On 30/11/16 17:32, Wilco Dijkstra wrote:
> Thumb uses a special register allocation order to increase the use of low
> registers.  Oddly enough, LR appears before R12, which means that LR must
> be saved and restored even if R12 is available.  Swapping R12 and LR means
> this simple example now uses R12 as a temporary (just like ARM):
> 
> int f(long long a, long long b)
> {
>   if (a < b) return 1;
>   return a + b;
> }
> 
>   cmp r0, r2
>   sbcs    ip, r1, r3
>   ite ge
>   addge   r0, r0, r2
>   movlt   r0, #1
>   bx  lr
> 
> Bootstrap OK. CSiBE benchmarks unchanged.
> 
> ChangeLog:
> 2016-11-30  Wilco Dijkstra  
> 
> * gcc/config/arm/arm.c (thumb_core_reg_alloc_order): Swap R12 and R14.
> 

OK.

R.

> --
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 43c78f6148a5306fb0079ee2eba12f3763652bcc..29dcefd23762ba861b458b8860eb4b4856a9cb02 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -26455,7 +26455,7 @@ arm_mangle_type (const_tree type)
>  static const int thumb_core_reg_alloc_order[] =
>  {
> 3,  2,  1,  0,  4,  5,  6,  7,
> -  14, 12,  8,  9, 10, 11
> +  12, 14,  8,  9, 10, 11
>  };
>  
>  /* Adjust register allocation order when compiling for Thumb.  */
> 



Re: [Patch, Fortran, cleanup] PR 78798: some int-valued functions should be bool

2016-12-15 Thread Janus Weil
2016-12-15 21:41 GMT+01:00 Steve Kargl :
> On Thu, Dec 15, 2016 at 08:38:47PM +0100, Janus Weil wrote:
>> 2016-12-13 19:55 GMT+01:00 Janus Weil :
>> > 2016-12-13 19:19 GMT+01:00 Janne Blomqvist :
>> >> On Tue, Dec 13, 2016 at 8:13 PM, Janus Weil  wrote:
>> >>> Hi all,
>> >>>
>> >>> here is a straightforward cleanup patch that makes a few functions
>> >>> return a bool instead of an int. Regtests cleanly on x86_64-linux-gnu.
>> >>> Ok for trunk?
>> >>
>> >> Ok, thanks.
>> >
>> > Thanks, Janne. Committed as r243621.
>>
>>
>> I realized that also lots of functions in interface.c could/should be
>> converted from int to bool. Attached is a patch which does that.
>>
>> It's not small, but mostly mechanical, and regtests cleanly on
>> x86_64-linux-gnu. Ok for trunk?
>>
>
> A quick scan of the patch did not reveal anything
> that jumped out as wrong.  OK to commit.

Thanks, Steve. Committed as r243726.

Cheers,
Janus


Re: [PATCH][ARM] Merge negdi2 patterns

2016-12-15 Thread Richard Earnshaw
On 30/11/16 17:39, Wilco Dijkstra wrote:
> The negdi2 patterns for ARM and Thumb-2 are duplicated because Thumb-2
> doesn't support RSC with an immediate.  We can however emulate RSC with
> zero using a shifted SBC.  If we add this to subsi3_carryin, the negdi
> patterns can be merged, simplifying things a bit (e.g. if changing when to
> split for PR77308).  This should generate identical code in all cases.
> 
> ChangeLog:
> 2016-11-30  Wilco Dijkstra  
> 
> * gcc/config/arm/arm.md (subsi3_carryin): Add Thumb-2 RSC #0.
> (arm_negdi2): Rename to negdi2, allow on Thumb-2.
> * gcc/config/arm/thumb2.md (thumb2_negdi2): Remove pattern.
> 

negdi2 sounds rather like the expansion pattern.  A better name would be
negdi_insn, similar to uses elsewhere.

OK with that change.

R.

> --
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 2035fa5861d876771aef9fb391bcb01b877cf148..eb79d1376e1fb3df1eabddde22aa93ab6fec94ea 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -1128,19 +1128,20 @@
>  )
>  
>  (define_insn "*subsi3_carryin"
> -  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
> -(minus:SI (minus:SI (match_operand:SI 1 "reg_or_int_operand" "r,I")
> -(match_operand:SI 2 "s_register_operand" "r,r"))
> +  [(set (match_operand:SI 0 "s_register_operand" "=r,r,r")
> +(minus:SI (minus:SI (match_operand:SI 1 "reg_or_int_operand" "r,I,Pz")
> +(match_operand:SI 2 "s_register_operand" "r,r,r"))
>(ltu:SI (reg:CC_C CC_REGNUM) (const_int 0]
>"TARGET_32BIT"
>"@
> sbc%?\\t%0, %1, %2
> -   rsc%?\\t%0, %2, %1"
> +   rsc%?\\t%0, %2, %1
> +   sbc%?\\t%0, %2, %2, lsl #1"
>[(set_attr "conds" "use")
> -   (set_attr "arch" "*,a")
> +   (set_attr "arch" "*,a,t2")
> (set_attr "predicable" "yes")
> (set_attr "predicable_short_it" "no")
> -   (set_attr "type" "adc_reg,adc_imm")]
> +   (set_attr "type" "adc_reg,adc_imm,alu_shift_imm")]
>  )
>  
>  (define_insn "*subsi3_carryin_const"
> @@ -4731,12 +4732,13 @@
>  
>  ;; The constraints here are to prevent a *partial* overlap (where %Q0 == %R1).
>  ;; The first alternative allows the common case of a *full* overlap.
> -(define_insn_and_split "*arm_negdi2"
> +(define_insn_and_split "*negdi2"
>[(set (match_operand:DI 0 "s_register_operand" "=r,&r")
>   (neg:DI (match_operand:DI 1 "s_register_operand"  "0,r")))
> (clobber (reg:CC CC_REGNUM))]
> -  "TARGET_ARM"
> -  "#"   ; "rsbs\\t%Q0, %Q1, #0\;rsc\\t%R0, %R1, #0"
> +  "TARGET_32BIT"
> +  "#"   ; rsbs %Q0, %Q1, #0; rsc %R0, %R1, #0  (ARM)
> + ; negs %Q0, %Q1; sbc %R0, %R1, %R1, lsl #1 (Thumb-2)
>"&& reload_completed"
>[(parallel [(set (reg:CC CC_REGNUM)
>  (compare:CC (const_int 0) (match_dup 1)))
> diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
> index affcd832b72b7d358347e7370265be492866bb90..d9c530a48878923683485933c5640ffe80908401 100644
> --- a/gcc/config/arm/thumb2.md
> +++ b/gcc/config/arm/thumb2.md
> @@ -125,32 +125,6 @@
> (set_attr "type" "multiple")]
>  )
>  
> -;; Thumb-2 does not have rsc, so use a clever trick with shifter operands.
> -(define_insn_and_split "*thumb2_negdi2"
> -  [(set (match_operand:DI 0 "s_register_operand" "=&r,r")
> - (neg:DI (match_operand:DI 1 "s_register_operand"  "?r,0")))
> -   (clobber (reg:CC CC_REGNUM))]
> -  "TARGET_THUMB2"
> -  "#" ; negs\\t%Q0, %Q1\;sbc\\t%R0, %R1, %R1, lsl #1
> -  "&& reload_completed"
> -  [(parallel [(set (reg:CC CC_REGNUM)
> -(compare:CC (const_int 0) (match_dup 1)))
> -   (set (match_dup 0) (minus:SI (const_int 0) (match_dup 1)))])
> -   (set (match_dup 2) (minus:SI (minus:SI (match_dup 3)
> -  (ashift:SI (match_dup 3)
> - (const_int 1)))
> -(ltu:SI (reg:CC_C CC_REGNUM) (const_int 0]
> -  {
> -operands[2] = gen_highpart (SImode, operands[0]);
> -operands[0] = gen_lowpart (SImode, operands[0]);
> -operands[3] = gen_highpart (SImode, operands[1]);
> -operands[1] = gen_lowpart (SImode, operands[1]);
> -  }
> -  [(set_attr "conds" "clob")
> -   (set_attr "length" "8")
> -   (set_attr "type" "multiple")]
> -)
> 
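The shifted-SBC trick relies on hi - (hi << 1) being -hi modulo 2^32, with SBC also subtracting the borrow left by the low-word NEGS. A minimal constexpr model of the Thumb-2 sequence named in the negdi2 comment (negs %Q0, %Q1; sbc %R0, %R1, %R1, lsl #1) is sketched below; the function names are hypothetical, and it assumes the ARM convention that C is set when a subtraction does not borrow and that SBC subtracts NOT(C).

```cpp
#include <cstdint>

// Model of negs + sbc on the two 32-bit halves of a 64-bit value.
constexpr std::uint64_t negdi_model (std::uint32_t lo, std::uint32_t hi)
{
  // negs: low word becomes 0 - lo; it borrows (C clear) exactly when lo != 0.
  // sbc:  high word becomes hi - (hi << 1) - borrow, i.e. -hi - borrow.
  return (static_cast<std::uint64_t> (static_cast<std::uint32_t> (
            hi - (hi << 1) - (lo != 0 ? 1u : 0u))) << 32)
         | static_cast<std::uint32_t> (0u - lo);
}

// Compare the two-instruction model with ordinary 64-bit negation.
constexpr bool matches_plain_negation (std::uint64_t v)
{
  return negdi_model (static_cast<std::uint32_t> (v),
                      static_cast<std::uint32_t> (v >> 32)) == 0 - v;
}
```

For example, negdi_model (1, 0) yields 0xFFFFFFFFFFFFFFFF, which is -1 in two's complement.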



Re: [PATCH] PR59170 make pretty printers check for singular iterators

2016-12-15 Thread Jan Kratochvil
On Thu, 15 Dec 2016 15:18:17 +0100, Jonathan Wakely wrote:
> I'm going to add Xmethods for all our iterator types so that it will
> always be possible to do "print *iter", so if GDB supports Xmethods
> then we don't need to register the iterator printers.

But with the GDB 'compile' project (libcc1), which is planned to be used for
evaluating all GDB expressions, Xmethods will no longer work.


Jan


Re: [2/67] Make machine_mode a class

2016-12-15 Thread Trevor Saunders
On Fri, Dec 09, 2016 at 12:52:03PM +, Richard Sandiford wrote:
> This patch renames enum machine_mode to enum machine_mode_enum
> and adds a machine_mode wrapper class.
> 
> The previous patch mechanically replaced mode names in case
> statements; this one updates other places that should continue
> to use the enum directly.
> 
> The patch continues to use enums for static variables.  This isn't
> necessary, but it cuts down on the amount of load-time initialisation
> and shouldn't have any downsides.

We should probably add a GCC_CONSTEXPR macro so we can get that
advantage without having to play games like this, but this case doesn't
seem like too big of a deal.
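Below is a minimal sketch of the suggested macro. It assumes the macro expands to `constexpr` when the host compiler supports C++11 and to nothing otherwise. `GCC_CONSTEXPR` is only the proposed name, not an existing macro. `mode_enum` and `mode_wrapper` are hypothetical stand-ins for `machine_mode_enum` and the `machine_mode` class. With a constexpr constructor, a static variable of the wrapper type can be constant-initialized, so it needs no load-time initialization:

```cpp
#include <cassert>

// Hypothetical macro: expands to constexpr when the host compiler
// supports C++11, and to nothing for an older (C++98) host compiler.
#if __cplusplus >= 201103L
# define GCC_CONSTEXPR constexpr
#else
# define GCC_CONSTEXPR
#endif

// Toy stand-ins for machine_mode_enum and the machine_mode wrapper class.
enum mode_enum { E_QImode, E_HImode, E_SImode, E_DImode };

class mode_wrapper
{
public:
  GCC_CONSTEXPR mode_wrapper (mode_enum m) : m_mode (m) {}
  GCC_CONSTEXPR operator mode_enum () const { return m_mode; }

private:
  mode_enum m_mode;
};

// In C++11 mode the constexpr constructor makes this a constant
// initialization, so no start-up code has to run for it, and the static
// does not need to be declared with the raw enum type.
static mode_wrapper word_mode_example = E_SImode;
```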

Trev



[PATCH, i386]: Improve ffs for TARGET_BMI and macroize a couple of bitmanip patterns

2016-12-15 Thread Uros Bizjak
Hello!

The attached patch improves ffs expansion for TARGET_BMI targets.
Compared to bsf, tzcnt is noticeably faster on AMD processors.
However, since the generic target enables TARGET_AVOID_FALSE_DEP_FOR_BMI,
without this patch we always expand ffs with bsf, even when using -mbmi.

The attached patch also enables the TARGET_AVOID_FALSE_DEP_FOR_BMI fixup
for the ffs expander, so tzcnt (with the false-dependency fixup where
needed) is generated for generic -mbmi targets.
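The sketch below is a rough model of why the expander now uses a carry-flag (CCCmode) test with TZCNT. When its source is zero, TZCNT writes the operand width (32 for the 32-bit form) and sets CF. ZF reports a zero *result* rather than a zero source. BSF behaves differently: it sets ZF for a zero source and leaves the destination undefined. The sketch emulates TZCNT in software and builds ffs with a load-minus-one, conditional-move-on-carry, add-one sequence. `model_tzcnt32` and `ffs_via_tzcnt` are hypothetical helpers, not GCC functions. The false-dependency fixup only clears the destination register first and does not change the result, so it is not modelled:

```cpp
#include <cassert>
#include <cstdint>

// Result of the modelled instruction: destination value and carry flag.
struct tzcnt_result
{
  uint32_t value;
  bool carry;
};

// Software model of 32-bit TZCNT: count trailing zero bits; a zero
// source yields the operand width (32) and sets CF.
static tzcnt_result model_tzcnt32 (uint32_t src)
{
  if (src == 0)
    return { 32, true };
  uint32_t n = 0;
  while ((src & 1) == 0)
    {
      src >>= 1;
      ++n;
    }
  return { n, false };
}

// ffs built the way a cmov-based expansion would do it:
//   tmp = -1; dst = tzcnt (src); if (CF) dst = tmp; dst += 1;
// ZF after TZCNT means "result is zero", so the zero-source test must
// look at CF instead, which is what a CCCmode flags register expresses.
static int ffs_via_tzcnt (uint32_t src)
{
  const int32_t minus_one = -1;
  tzcnt_result r = model_tzcnt32 (src);
  int32_t dst = static_cast<int32_t> (r.value);
  if (r.carry)      // cmovc: a zero source selects -1
    dst = minus_one;
  assert (dst >= -1 && dst <= 31);
  return dst + 1;   // zero source gives 0, otherwise ctz + 1
}
```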

The patch also macroizes a couple of patterns in this area.

2016-12-15  Uros Bizjak  

* config/i386/i386.md (ffs2): Generate CCCmode flags register
for TARGET_BMI.
(ffssi2_no_cmove): Ditto.
(*tzcnt_1_falsedep_1): New insn_and_split pattern.
(*tzcnt_1_falsedep): New insn pattern.

(LT_ZCNT): New mode iterator.
(lt_zcnt): New mode attribute.
(lt_zcnt_type): New mode attribute.
(_): Macroize expander from bmi_tzcnt_ and
lzcnt_ using LT_ZCNT mode iterator.
(*__falsedep_1): Macroize insn from
*bmi_tzcnt__falsedep_1 and *lzcnt__falsedep_1
using LT_ZCNT mode iterator.
(*__falsedep): Macroize insn from
*bmi_tzcnt__falsedep and *lzcnt__falsedep
using LT_ZCNT mode iterator.
(*_): Macroize insn from *bmi_tzcnt_
and *lzcnt_ using LT_ZCNT mode iterator.
* config/i386/i386-builtin.def (__builtin_ia32_tzcnt_u16)
(__builtin_ia32_tzcnt_u32, __builtin_ia32_tzcnt_u64, __builtin_ctzs):
Update for rename.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386-builtin.def
===
--- config/i386/i386-builtin.def(revision 243716)
+++ config/i386/i386-builtin.def(working copy)
@@ -1197,11 +1197,11 @@ BDESC (OPTION_MASK_ISA_LZCNT | OPTION_MASK_ISA_64B
 BDESC (OPTION_MASK_ISA_BMI, CODE_FOR_bmi_bextr_si, "__builtin_ia32_bextr_u32", IX86_BUILTIN_BEXTR32, UNKNOWN, (int) UINT_FTYPE_UINT_UINT)
 BDESC (OPTION_MASK_ISA_BMI | OPTION_MASK_ISA_64BIT, CODE_FOR_bmi_bextr_di, "__builtin_ia32_bextr_u64", IX86_BUILTIN_BEXTR64, UNKNOWN, (int) UINT64_FTYPE_UINT64_UINT64)
 
-BDESC (OPTION_MASK_ISA_BMI, CODE_FOR_bmi_tzcnt_hi, "__builtin_ia32_tzcnt_u16", IX86_BUILTIN_TZCNT16, UNKNOWN, (int) UINT16_FTYPE_UINT16)
+BDESC (OPTION_MASK_ISA_BMI, CODE_FOR_tzcnt_hi, "__builtin_ia32_tzcnt_u16", IX86_BUILTIN_TZCNT16, UNKNOWN, (int) UINT16_FTYPE_UINT16)
 /* Same as above, for backward compatibility.  */
-BDESC (OPTION_MASK_ISA_BMI, CODE_FOR_bmi_tzcnt_hi, "__builtin_ctzs", IX86_BUILTIN_CTZS, UNKNOWN, (int) UINT16_FTYPE_UINT16)
-BDESC (OPTION_MASK_ISA_BMI, CODE_FOR_bmi_tzcnt_si, "__builtin_ia32_tzcnt_u32", IX86_BUILTIN_TZCNT32, UNKNOWN, (int) UINT_FTYPE_UINT)
-BDESC (OPTION_MASK_ISA_BMI | OPTION_MASK_ISA_64BIT, CODE_FOR_bmi_tzcnt_di, "__builtin_ia32_tzcnt_u64", IX86_BUILTIN_TZCNT64, UNKNOWN, (int) UINT64_FTYPE_UINT64)
+BDESC (OPTION_MASK_ISA_BMI, CODE_FOR_tzcnt_hi, "__builtin_ctzs", IX86_BUILTIN_CTZS, UNKNOWN, (int) UINT16_FTYPE_UINT16)
+BDESC (OPTION_MASK_ISA_BMI, CODE_FOR_tzcnt_si, "__builtin_ia32_tzcnt_u32", IX86_BUILTIN_TZCNT32, UNKNOWN, (int) UINT_FTYPE_UINT)
+BDESC (OPTION_MASK_ISA_BMI | OPTION_MASK_ISA_64BIT, CODE_FOR_tzcnt_di, "__builtin_ia32_tzcnt_u64", IX86_BUILTIN_TZCNT64, UNKNOWN, (int) UINT64_FTYPE_UINT64)
 
 /* TBM */
 BDESC (OPTION_MASK_ISA_TBM, CODE_FOR_tbm_bextri_si, "__builtin_ia32_bextri_u32", IX86_BUILTIN_BEXTRI32, UNKNOWN, (int) UINT_FTYPE_UINT_UINT)
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 243716)
+++ config/i386/i386.md (working copy)
@@ -12534,8 +12534,7 @@
   DONE;
 }
 
-  flags_mode
-= (TARGET_BMI && !TARGET_AVOID_FALSE_DEP_FOR_BMI) ? CCCmode : CCZmode;
+  flags_mode = TARGET_BMI ? CCCmode : CCZmode;
 
   operands[2] = gen_reg_rtx (mode);
   operands[3] = gen_rtx_REG (flags_mode, FLAGS_REG);
@@ -12561,8 +12560,7 @@
(parallel [(set (match_dup 0) (plus:SI (match_dup 0) (const_int 1)))
  (clobber (reg:CC FLAGS_REG))])]
 {
-  machine_mode flags_mode
-= (TARGET_BMI && !TARGET_AVOID_FALSE_DEP_FOR_BMI) ? CCCmode : CCZmode;
+  machine_mode flags_mode = TARGET_BMI ? CCCmode : CCZmode;
 
   operands[3] = gen_lowpart (QImode, operands[2]);
   operands[4] = gen_rtx_REG (flags_mode, FLAGS_REG);
@@ -12571,6 +12569,46 @@
   ix86_expand_clear (operands[2]);
 })
 
+; False dependency happens when destination is only updated by tzcnt,
+; lzcnt or popcnt.  There is no false dependency when destination is
+; also used in source.
+(define_insn_and_split "*tzcnt_1_falsedep_1"
+  [(set (reg:CCC FLAGS_REG)
+   (compare:CCC (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+(const_int 0)))
+   (set (match_operand:SWI48 0 "register_operand" "=r")
+   (ctz:SWI48 (match_dup 1)))]
+  "TARGET_BMI
+   && TARGET_AVOID_FALSE_DEP_FOR_BMI && optimize_function_for_speed_p (cfun)"
+  "#"
+  "&& reload_completed"
+  [(parallel
+[(set (reg:CCC FLAGS_REG)
+ 

Go patch committed: call determine_types even for constant expressions

2016-12-15 Thread Ian Lance Taylor
The Go frontend needs to call determine_types even for constant
expressions, which it was not doing.  The problem is that a constant
expression may include code like unsafe.Sizeof(0).  Something needs to
determine the type of the untyped 0, and that should be the
determine_types pass.

Implementing that triggered a compiler crash on test/const1.go because
it permitted some erroneous constants to make it all the way to the
backend.  Catch that case by checking whether we get a constant
overflow error, and marking the expression invalid if we do.  This is
a good change in any case, as previously we reported the same constant
overflow error multiple times, and now we only report it once.

This fixes GCC PR 78763.
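The sketch below models the new error-reporting contract outside the frontend. It is a toy, not gofrontend code: `eval_negate_int8`, `lower_negate`, `lowered` and the `diagnostics` vector are all hypothetical. An 8-bit range check stands in for the range check performed when the constant's type is set. What matters is the three-way outcome. The fold either succeeds, or it is not possible, or it overflows and the error has already been reported. In the last case the caller substitutes an error expression instead of leaving the bad constant in place:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical diagnostics sink standing in for the frontend's errors.
static std::vector<std::string> diagnostics;

// Toy folder for unary minus on a constant of an 8-bit signed type.
// Returns true and sets *RESULT on success.  Returns false with
// *ISSUED_ERROR == false if the operand is not a constant, and false
// with *ISSUED_ERROR == true if the result overflowed and an error has
// already been reported.
static bool eval_negate_int8 (bool operand_is_constant, int operand,
                              int8_t *result, bool *issued_error)
{
  *issued_error = false;
  if (!operand_is_constant)
    return false;
  int value = -operand;
  if (value < std::numeric_limits<int8_t>::min ()
      || value > std::numeric_limits<int8_t>::max ())
    {
      diagnostics.push_back ("constant overflow");
      *issued_error = true;
      return false;
    }
  *result = static_cast<int8_t> (value);
  return true;
}

enum class lowered { folded, unchanged, error_expr };

// Caller pattern: when an error was already reported, replace the
// expression with an error node so the invalid constant neither reaches
// the backend nor gets diagnosed a second time.
static lowered lower_negate (bool is_constant, int operand, int8_t *out)
{
  bool issued_error;
  if (eval_negate_int8 (is_constant, operand, out, &issued_error))
    return lowered::folded;
  else if (issued_error)
    return lowered::error_expr;
  return lowered::unchanged;
}
```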

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 243682)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-5eb55901861f360c2c2ff70f14a8315694934c97
+e807c1deec1e7114bc4757b6193510fdae13e75f
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 243682)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -3738,8 +3738,12 @@ Unary_expression::do_lower(Gogo*, Named_
   if (expr->numeric_constant_value(&nc))
{
  Numeric_constant result;
- if (Unary_expression::eval_constant(op, &nc, loc, &result))
+ bool issued_error;
+ if (Unary_expression::eval_constant(op, &nc, loc, &result,
+ &issued_error))
return result.expression(loc);
+ else if (issued_error)
+   return Expression::make_error(this->location());
}
 }
 
@@ -3900,12 +3904,15 @@ Unary_expression::do_is_static_initializ
 }
 
 // Apply unary opcode OP to UNC, setting NC.  Return true if this
-// could be done, false if not.  Issue errors for overflow.
+// could be done, false if not.  On overflow, issues an error and sets
+// *ISSUED_ERROR.
 
 bool
 Unary_expression::eval_constant(Operator op, const Numeric_constant* unc,
-   Location location, Numeric_constant* nc)
+   Location location, Numeric_constant* nc,
+   bool* issued_error)
 {
+  *issued_error = false;
   switch (op)
 {
 case OPERATOR_PLUS:
@@ -4050,7 +4057,12 @@ Unary_expression::eval_constant(Operator
   mpz_clear(uval);
   mpz_clear(val);
 
-  return nc->set_type(unc->type(), true, location);
+  if (!nc->set_type(unc->type(), true, location))
+{
+  *issued_error = true;
+  return false;
+}
+  return true;
 }
 
 // Return the integral constant value of a unary expression, if it has one.
@@ -4061,8 +4073,9 @@ Unary_expression::do_numeric_constant_va
   Numeric_constant unc;
   if (!this->expr_->numeric_constant_value(&unc))
 return false;
+  bool issued_error;
   return Unary_expression::eval_constant(this->op_, &unc, this->location(),
-nc);
+nc, &issued_error);
 }
 
 // Return the type of a unary expression.
@@ -4737,13 +4750,15 @@ Binary_expression::compare_complex(const
 
 // Apply binary opcode OP to LEFT_NC and RIGHT_NC, setting NC.  Return
 // true if this could be done, false if not.  Issue errors at LOCATION
-// as appropriate.
+// as appropriate, and sets *ISSUED_ERROR if it did.
 
 bool
 Binary_expression::eval_constant(Operator op, Numeric_constant* left_nc,
 Numeric_constant* right_nc,
-Location location, Numeric_constant* nc)
+Location location, Numeric_constant* nc,
+bool* issued_error)
 {
+  *issued_error = false;
   switch (op)
 {
 case OPERATOR_OROR:
@@ -4792,7 +4807,11 @@ Binary_expression::eval_constant(Operato
 r = Binary_expression::eval_integer(op, left_nc, right_nc, location, nc);
 
   if (r)
-r = nc->set_type(type, true, location);
+{
+  r = nc->set_type(type, true, location);
+  if (!r)
+   *issued_error = true;
+}
 
   return r;
 }
@@ -5115,9 +5134,15 @@ Binary_expression::do_lower(Gogo* gogo,
else
  {
Numeric_constant nc;
+   bool issued_error;
if (!Binary_expression::eval_constant(op, &left_nc, &right_nc,
- location, &nc))
+ location, &nc,
+ &issued_error))
+ {
+   if (issued_error)
+ return Expression::make_error(location);
 return t

Go patch committed: Fix off-by-one error in array length for GC symbol

2016-12-15 Thread Ian Lance Taylor
This patch by Than McIntosh fixes an off-by-one error in the array
length of the value we create for a GC symbol.  This was not serious,
it just wasted an element.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.
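The extra element was wasted because the terminator is pushed onto `vals` before the length is computed. `vals->size()` therefore already counts it. The toy model below shows the two length computations side by side. `kGcEnd` is a hypothetical stand-in for `GC_END`, and `gc_symbol_length` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical terminator value standing in for the frontend's GC_END.
const uintptr_t kGcEnd = 0;

// Appends the terminator and returns the array length to declare for
// the GC symbol.  BUGGY selects the old "+ 1" computation.
static size_t gc_symbol_length (std::vector<uintptr_t> &vals, bool buggy)
{
  vals.push_back (kGcEnd);
  // vals.size () already includes the terminator, so "+ 1" declares one
  // extra trailing element with no corresponding entry in VALS.
  return buggy ? vals.size () + 1 : vals.size ();
}
```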

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 243729)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-e807c1deec1e7114bc4757b6193510fdae13e75f
+ae57b28b3caf1f6670e0f663235f1bf7655db870
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 243445)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -2261,7 +2261,7 @@ Type::gc_symbol_constructor(Gogo* gogo)
 
   vals->push_back(Expression::make_integer_ul(GC_END, uintptr_t, bloc));
 
-  Expression* len = Expression::make_integer_ul(vals->size() + 1, NULL,
+  Expression* len = Expression::make_integer_ul(vals->size(), NULL,
bloc);
   Array_type* gc_symbol_type = Type::make_array_type(uintptr_t, len);
   return Expression::make_array_composite_literal(gc_symbol_type, vals, bloc);


GCC 6 RFA: Go patch: call determine_types even for constant expressions

2016-12-15 Thread Ian Lance Taylor
On Thu, Dec 15, 2016 at 2:47 PM, Ian Lance Taylor  wrote:
> The Go frontend needs to call determine_types even for constant
> expressions, which it was not doing.  The problem is that a constant
> expression may include code like unsafe.Sizeof(0).  Something needs to
> determine the type of the untyped 0, and that should be the
> determine_types pass.
>
> Implementing that triggered a compiler crash on test/const1.go because
> it permitted some erroneous constants to make it all the way to the
> backend.  Catch that case by checking whether we get a constant
> overflow error, and marking the expression invalid if we do.  This is
> a good change in any case, as previously we reported the same constant
> overflow error multiple times, and now we only report it once.
>
> This fixes GCC PR 78763.
>
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> to mainline.


In order to fix GCC PR 78763, I'd like to commit this patch to the GCC
6 branch.  I've bootstrapped the test on x86_64-pc-linux-gnu, and
verified that the Go tests continue to pass.

OK for GCC 6 branch?

Ian
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 243728)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -3639,8 +3639,12 @@
   if (expr->numeric_constant_value(&nc))
{
  Numeric_constant result;
- if (Unary_expression::eval_constant(op, &nc, loc, &result))
+ bool issued_error;
+ if (Unary_expression::eval_constant(op, &nc, loc, &result,
+ &issued_error))
return result.expression(loc);
+ else if (issued_error)
+   return Expression::make_error(this->location());
}
 }
 
@@ -3747,12 +3751,15 @@
 }
 
 // Apply unary opcode OP to UNC, setting NC.  Return true if this
-// could be done, false if not.  Issue errors for overflow.
+// could be done, false if not.  On overflow, issues an error and sets
+// *ISSUED_ERROR.
 
 bool
 Unary_expression::eval_constant(Operator op, const Numeric_constant* unc,
-   Location location, Numeric_constant* nc)
+   Location location, Numeric_constant* nc,
+   bool* issued_error)
 {
+  *issued_error = false;
   switch (op)
 {
 case OPERATOR_PLUS:
@@ -3897,7 +3904,12 @@
   mpz_clear(uval);
   mpz_clear(val);
 
-  return nc->set_type(unc->type(), true, location);
+  if (!nc->set_type(unc->type(), true, location))
+{
+  *issued_error = true;
+  return false;
+}
+  return true;
 }
 
 // Return the integral constant value of a unary expression, if it has one.
@@ -3908,8 +3920,9 @@
   Numeric_constant unc;
   if (!this->expr_->numeric_constant_value(&unc))
 return false;
+  bool issued_error;
   return Unary_expression::eval_constant(this->op_, &unc, this->location(),
-nc);
+nc, &issued_error);
 }
 
 // Return the type of a unary expression.
@@ -4539,13 +4552,15 @@
 
 // Apply binary opcode OP to LEFT_NC and RIGHT_NC, setting NC.  Return
 // true if this could be done, false if not.  Issue errors at LOCATION
-// as appropriate.
+// as appropriate, and sets *ISSUED_ERROR if it did.
 
 bool
 Binary_expression::eval_constant(Operator op, Numeric_constant* left_nc,
 Numeric_constant* right_nc,
-Location location, Numeric_constant* nc)
+Location location, Numeric_constant* nc,
+bool* issued_error)
 {
+  *issued_error = false;
   switch (op)
 {
 case OPERATOR_OROR:
@@ -4594,7 +4609,11 @@
 r = Binary_expression::eval_integer(op, left_nc, right_nc, location, nc);
 
   if (r)
-r = nc->set_type(type, true, location);
+{
+  r = nc->set_type(type, true, location);
+  if (!r)
+   *issued_error = true;
+}
 
   return r;
 }
@@ -4917,9 +4936,15 @@
else
  {
Numeric_constant nc;
+   bool issued_error;
if (!Binary_expression::eval_constant(op, &left_nc, &right_nc,
- location, &nc))
+ location, &nc,
+ &issued_error))
+ {
+   if (issued_error)
+ return Expression::make_error(location);
 return this;
+ }
return nc.expression(location);
  }
   }
@@ -5254,8 +5279,9 @@
   Numeric_constant right_nc;
   if (!this->right_->numeric_constant_value(&right_nc))
 return false;
+  bool issued_error;
   return Binary_expression::eval_constant(this->op_, &left_nc, &right_nc,
- this->location(), nc);
+   

Go patch committed: fix comments and field names to match current libgo sources

2016-12-15 Thread Ian Lance Taylor
This patch by Than McIntosh fixes some comments and field names in the
compiler to match the current libgo sources.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 243731)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-ae57b28b3caf1f6670e0f663235f1bf7655db870
+310862eb11ec0705f21a375c0dd16f46a8d901c1
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 243731)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -1471,8 +1471,8 @@ Type::convert_builtin_named_types(Gogo*
 }
 
 // Return the type of a type descriptor.  We should really tie this to
-// runtime.Type rather than copying it.  This must match commonType in
-// libgo/go/runtime/type.go.
+// runtime.Type rather than copying it.  This must match the struct "_type"
+// declared in libgo/go/runtime/type.go.
 
 Type*
 Type::make_type_descriptor_type()
@@ -1519,7 +1519,7 @@ Type::make_type_descriptor_type()
 
   // Forward declaration for the type descriptor type.
   Named_object* named_type_descriptor_type =
-   Named_object::make_type_declaration("commonType", NULL, bloc);
+   Named_object::make_type_declaration("_type", NULL, bloc);
   Type* ft = Type::make_forward_declaration(named_type_descriptor_type);
   Type* pointer_type_descriptor_type = Type::make_pointer_type(ft);
 
@@ -1565,7 +1565,7 @@ Type::make_type_descriptor_type()
   "ptrToThis",
   pointer_type_descriptor_type);
 
-  Named_type* named = Type::make_builtin_named_type("commonType",
+  Named_type* named = Type::make_builtin_named_type("_type",
type_descriptor_type);
 
   named_type_descriptor_type->set_type_value(named);
@@ -3882,7 +3882,7 @@ Function_type::do_type_descriptor(Gogo*
   vals->reserve(4);
 
   Struct_field_list::const_iterator p = fields->begin();
-  go_assert(p->is_field_name("commonType"));
+  go_assert(p->is_field_name("_type"));
   vals->push_back(this->type_descriptor_constructor(gogo,
RUNTIME_TYPE_KIND_FUNC,
name, NULL, true));
@@ -4395,7 +4395,7 @@ Pointer_type::do_type_descriptor(Gogo* g
   vals->reserve(2);
 
   Struct_field_list::const_iterator p = fields->begin();
-  go_assert(p->is_field_name("commonType"));
+  go_assert(p->is_field_name("_type"));
   vals->push_back(this->type_descriptor_constructor(gogo,
RUNTIME_TYPE_KIND_PTR,
name, methods, false));
@@ -5305,7 +5305,7 @@ Struct_type::do_type_descriptor(Gogo* go
   go_assert(methods == NULL || name == NULL);
 
   Struct_field_list::const_iterator ps = fields->begin();
-  go_assert(ps->is_field_name("commonType"));
+  go_assert(ps->is_field_name("_type"));
   vals->push_back(this->type_descriptor_constructor(gogo,
RUNTIME_TYPE_KIND_STRUCT,
name, methods, true));
@@ -6719,7 +6719,7 @@ Array_type::array_type_descriptor(Gogo*
   vals->reserve(3);
 
   Struct_field_list::const_iterator p = fields->begin();
-  go_assert(p->is_field_name("commonType"));
+  go_assert(p->is_field_name("_type"));
   vals->push_back(this->type_descriptor_constructor(gogo,
RUNTIME_TYPE_KIND_ARRAY,
name, NULL, true));
@@ -6758,7 +6758,7 @@ Array_type::slice_type_descriptor(Gogo*
   vals->reserve(2);
 
   Struct_field_list::const_iterator p = fields->begin();
-  go_assert(p->is_field_name("commonType"));
+  go_assert(p->is_field_name("_type"));
   vals->push_back(this->type_descriptor_constructor(gogo,
RUNTIME_TYPE_KIND_SLICE,
name, NULL, true));
@@ -7243,7 +7243,7 @@ Map_type::do_type_descriptor(Gogo* gogo,
   vals->reserve(12);
 
   Struct_field_list::const_iterator p = fields->begin();
-  go_assert(p->is_field_name("commonType"));
+  go_assert(p->is_field_name("_type"));
   vals->push_back(this->type_descriptor_constructor(gogo,
RUNTIME_TYPE_KIND_MAP,
name, NULL, true));
@@ -7681,7 +7681,7 @@ Channel_type::do_type_descriptor(Gogo* g
   vals->reserve(3);
 
   Struct_field_list::const_iterator p = fields->begin();
-  go_assert(p->is_field_name

Re: [PATCH] PR59170 make pretty printers check for singular iterators

2016-12-15 Thread Jonathan Wakely

On 15/12/16 22:19 +0100, Jan Kratochvil wrote:
>On Thu, 15 Dec 2016 15:18:17 +0100, Jonathan Wakely wrote:
>> I'm going to add Xmethods for all our iterator types so that it will
>> always be possible to do "print *iter", so if GDB supports Xmethods
>> then we don't need to register the iterator printers.
>
>But with the GDB 'compile' project (libcc1), which is planned to be used for
>all GDB expression evaluation, the Xmethods will no longer work.

Ah.

But then *it can just get compiled, so it will still work, right?

The only reason it doesn't work today is that the definition for
operator* might not be in the executable, but if you can compile a new
definition that doesn't matter.



[PATCH 0/4] Improve DSE implementation

2016-12-15 Thread Jeff Law
This is a 4 part patchkit to address various deficiencies in our DSE 
implementation.


BZ33562 was the inspiration for this work.  33562 is a low priority 
regression that's been around for a long time.  Patch #1 addresses 
33562, "aggregate DSE disabled" and also implements trimming of complex 
assignment when just one half of it is dead.


Discussions last year with Richi, a review of bugs in both the LLVM and
GCC databases, and code instrumentation resulted in patches 2-4.


Patch #2 implements trimming of CONSTRUCTOR initializations.  This is 
61912/77485.  This gets the most static hits of all the improvements.


Patch #3 implements trimming of mem* calls.  We trim from the front or 
back of the store.  This doesn't hit as much as #2, but it still happens
quite often.  There is no BZ for this deficiency.


Patch #4 adds the ability to look through loads which may read from the 
same memory as the potentially dead store, but which can be proven to
read only from currently dead bytes within the object.  This hits just once
in the compiler & runtime libraries.  But it does hit often in the 
libstdc++ testsuite.  There is no BZ for this deficiency.
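A byte mask can illustrate the condition patch #4 relies on. This sketch is an assumption about the idea, not the patch's code. It uses `std::bitset` instead of GCC's bitmap type, and `range_mask` and `load_reads_only_dead_bytes` are hypothetical helpers. Bit I of `live` stays set while byte I of the candidate store has not yet been overwritten by a later store. A load whose byte range touches no live bit cannot observe the candidate store's value, so it does not keep the store alive:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Same 256-byte bound as the proposed dse-max-object-size default.
constexpr size_t kMaxTracked = 256;
using byte_mask = std::bitset<kMaxTracked>;

// Mask of bytes [OFFSET, OFFSET + SIZE), clamped to the tracked range.
// Offsets are relative to the start of the candidate store.
static byte_mask range_mask (size_t offset, size_t size)
{
  byte_mask m;
  for (size_t i = offset; i < offset + size && i < kMaxTracked; ++i)
    m.set (i);
  return m;
}

// LIVE: bit I is set while byte I of the candidate store has not been
// overwritten by a later store.  A load of [OFFSET, OFFSET + SIZE) that
// touches no live byte cannot see the candidate store's value.
static bool load_reads_only_dead_bytes (const byte_mask &live,
                                        size_t offset, size_t size)
{
  return (live & range_mask (offset, size)).none ();
}
```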



There are dependencies as we walk forward through the patch kit.  Each patch
has been bootstrapped & tested with its previous patch(es).


There is much more that could be done beyond the series of 4 patches in 
this patchkit.  Richi has pointed out that SRA and DSE could probably 
share a lot of analysis and transformation code.  There may even be 
advantages to having the two optimizations integrated into a single 
pass.  I haven't investigated any of that yet (though we are using a bit 
of code from SRA in this kit).


We also need to look at store sinking again.  I saw a patch from Richi 
back in July that looked reasonable at a high level and would likely allow
resolution of multiple BZs.


Jeff


[RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-15 Thread Jeff Law


This is the first of the 4 part patchkit to address deficiencies in our 
DSE implementation.



This patch addresses the P2 regression 33562, which has been a low
priority regression since gcc-4.3.  To summarize, DSE no longer has the 
ability to detect an aggregate store as dead if subsequent stores are 
done in a piecemeal fashion.


I originally tackled this by changing how we lower complex objects. 
That was sufficient to address 33562, but was reasonably rejected.


This version attacks the problem by improving DSE to track stores to 
memory at a byte level.  That allows us to determine if a series of 
stores completely covers an earlier store (thus making the earlier store 
dead).
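The byte-level kill check can be sketched as follows. This is a simplified model, not the code in tree-ssa-dse.c. It uses `std::bitset` where the patch uses GCC bitmaps and treats offsets as plain byte offsets into one base object. `store`, `range_mask` and `store_fully_killed` are hypothetical names. Tracking starts from the bytes written by the earlier store, and each later store clears the bytes it overwrites. If nothing is left, the earlier store is dead even though no single later store covers it. One example is `_Complex int t = 0;` followed by separate `__real__` and `__imag__` assignments, assuming a 4-byte `int`:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Same 256-byte bound as the proposed dse-max-object-size default.
constexpr size_t kMaxTracked = 256;
using byte_mask = std::bitset<kMaxTracked>;

// A store of SIZE bytes at byte OFFSET within one base object.
struct store
{
  size_t offset;
  size_t size;
};

// Mask of bytes [OFFSET, OFFSET + SIZE), clamped to the tracked range.
static byte_mask range_mask (size_t offset, size_t size)
{
  byte_mask m;
  for (size_t i = offset; i < offset + size && i < kMaxTracked; ++i)
    m.set (i);
  return m;
}

// True if the later stores in KILLERS[0..N) together overwrite every
// byte written by EARLIER.
static bool store_fully_killed (const store &earlier, const store *killers,
                                size_t n)
{
  if (earlier.offset + earlier.size > kMaxTracked)
    return false;  // too large to track: conservatively keep the store
  byte_mask live = range_mask (earlier.offset, earlier.size);
  for (size_t i = 0; i < n && live.any (); ++i)
    live &= ~range_mask (killers[i].offset, killers[i].size);
  return live.none ();
}
```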


A useful side effect of this is that we can detect when parts of a store are
dead and potentially rewrite the store.  This patch implements that for 
complex object initializations.  While not strictly part of 33562, it's 
so closely related that I felt it belongs as part of this patch.
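For complex objects, the rewrite decision reduces to checking which half of the original store still has live bytes. The sketch below makes the same simplifications as a plain byte-mask model: `std::bitset` stands in for GCC bitmaps, and `complex_trim` and `classify_complex_store` are hypothetical names. As in C, the real part is the first element of the complex object:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

using byte_mask = std::bitset<256>;

enum class complex_trim { keep_whole, real_only, imag_only, dead };

// LIVE marks the still-needed bytes of a complex store whose parts are
// ELT_SIZE bytes each; bytes [0, ELT_SIZE) hold the real part.
static complex_trim classify_complex_store (const byte_mask &live,
                                            size_t elt_size)
{
  assert (2 * elt_size <= live.size ());
  bool real_live = false;
  bool imag_live = false;
  for (size_t i = 0; i < elt_size; ++i)
    {
      real_live = real_live || live.test (i);
      imag_live = imag_live || live.test (elt_size + i);
    }
  if (real_live && imag_live)
    return complex_trim::keep_whole;  // no trimming possible
  if (real_live)
    return complex_trim::real_only;   // store only the real part
  if (imag_live)
    return complex_trim::imag_only;   // store only the imaginary part
  return complex_trim::dead;          // the whole store is dead
}
```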


This originally limited the size of the tracked memory space to 64 
bytes.  I bumped the limit after working through the CONSTRUCTOR and 
mem* trimming patches.  The 256 byte limit is still fairly arbitrary and 
I wouldn't lose sleep if we throttled back to 64 or 128 bytes.


Later patches in the kit will build upon this patch.  So if pieces look 
like skeleton code, that's because it is.



Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?
PR tree-optimization/33562
* params.def (PARM_DSE_MAX_OBJECT_SIZE): New PARAM.
* tree-ssa-dse.c: Include params.h.
(initialize_ao_ref_for_dse): New, partially extracted from
dse_optimize_stmt.
(valid_io_ref_for_dse): New.
(clear_bytes_written_by, trim_complex_store): Likewise.
(trim_partially_dead_store): Likewise.
(dse_partially_dead_store_p): Track what bytes were originally stored
into memory by the statement as well as the subset of bytes that
are still live.   If we "fail", but have identified the store as
partially dead, try to rewrite it to store fewer bytes of data.
Exit the main loop if we find a full kill as a single statement
or via group of statements.
(dse_optimize_stmt): Use initialize_ao_ref_for_dse.


* gcc.dg/tree-ssa/complex-4.c: No longer xfailed.
* gcc.dg/tree-ssa/complex-5.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-9.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-18.c: New test.
* gcc.dg/tree-ssa/ssa-dse-19.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-20.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-21.c: Likewise.

diff --git a/gcc/params.def b/gcc/params.def
index 50f75a7..ddc3d65 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -532,6 +532,11 @@ DEFPARAM(PARAM_AVG_LOOP_NITER,
 "Average number of iterations of a loop.",
 10, 1, 0)
 
+DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
+"dse-max-object-size",
+"Maximum size (in bytes) of objects tracked by dead store elimination.",
+256, 0, 0)
+
 DEFPARAM(PARAM_SCEV_MAX_EXPR_SIZE,
 "scev-max-expr-size",
 "Bound on size of expressions used in the scalar evolutions analyzer.",
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
index 87a2638..3155741 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
@@ -10,4 +10,4 @@ int f(void)
   return g(&t);
 }
 
-/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
index e2cd403..e6d027f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
@@ -8,4 +8,4 @@ int f(void)
  __imag__ t = 2;
 }
 
-/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
new file mode 100644
index 000..92b2df8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+int g(_Complex int*);
+int f(void)
+{
+  _Complex int t = 0;
+  int i, j;
+ __imag__ t += 2;
+  return g(&t);
+}
+
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
new file mode 100644
index 000..718b746
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+

[RFA][PATCH 3/4] Trim mem* calls in DSE

2016-12-15 Thread Jeff Law

This is the 3rd patch in the kit to improve our DSE implementation.

This patch supports trimming of the head or tail of a memset, memcpy or 
memmove call.  It's conceptually similar to trimming CONSTRUCTORS (and 
was in fact developed first).
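The argument adjustment for head and tail trimming can be modelled with plain integers. In this sketch `mem_call` and `trim_mem_call` are hypothetical, and pointers are represented as byte offsets. The length is assumed to be a constant, as `decrement_count` asserts. Tail trimming only reduces the count. Head trimming advances the destination (and, for memcpy/memmove, the source) and reduces the count by the same amount:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a mem* call with a constant length; pointers are
// represented as byte offsets.  SRC is ignored for memset.
struct mem_call
{
  bool is_memset;
  size_t dst;
  size_t src;
  size_t len;
};

// Drop HEAD_TRIM dead bytes from the start and TAIL_TRIM from the end.
// Returns false (leaving CALL unchanged) if no bytes would remain.
static bool trim_mem_call (mem_call &call, size_t head_trim, size_t tail_trim)
{
  if (head_trim + tail_trim >= call.len)
    return false;                 // the whole call is dead; not a trim
  call.len -= tail_trim;          // tail: just shrink the count
  if (head_trim != 0)
    {
      call.dst += head_trim;      // head: advance the destination...
      if (!call.is_memset)
        call.src += head_trim;    // ...and the source for memcpy/memmove
      call.len -= head_trim;
    }
  return true;
}
```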


Try as I might, I couldn't find a BZ in our database that would be 
resolved by this patch.  There are BZs in the LLVM database in this space,
but I didn't actually test those.  With that in mind, I don't think I
can strictly call this a bugfix.  It does represent closing a deficiency
when compared to LLVM.  So while I'd like to see it go onto the trunk,
I won't lose sleep if we defer to gcc-8.


Note this patch relies on the alignment tweak I mentioned in the 
discussion of patch #2 to avoid creating code that the strlen folding 
optimization can't optimize.  The code is still correct/valid, it's just 
in a form that the strlen folders don't grok.


This includes a trivial test that I used for development purposes.  It 
hits fairly often building GCC itself.  If we wanted more coverage in the
testsuite, I could extract some tests from GCC and reduce them.


This patch has (of course) been bootstrapped and regression tested on 
x86_64-linux-gnu.  OK for the trunk or defer to gcc-8?


* tree-ssa-dse.c (need_ssa_update): New file scoped boolean.
(decrement_count): New function.
(increment_start_addr, trim_memstar_call): Likewise.
(trim_partially_dead_store): Call trim_memstar_call.
(pass_dse::execute): Initialize need_ssa_update.  If set, then
return TODO_ssa_update.

* gcc.dg/tree-ssa/ssa-dse-25.c: New test.
 
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 1482c7f..b21b9b5 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -79,6 +80,10 @@ static bitmap need_eh_cleanup;
It is always safe to return FALSE.  But typically better optimziation
can be achieved by analyzing more statements.  */
 
+/* If trimming stores requires insertion of new statements, then we
+   will need an SSA update.  */
+static bool need_ssa_update;
+
 static bool
 initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
 {
@@ -309,6 +314,113 @@ trim_constructor_store (bitmap orig, bitmap live, gimple *stmt)
 }
 }
 
+/* STMT is a memcpy, memmove or memset.  Decrement the number of bytes
+   copied/set by DECREMENT.  */
+static void
+decrement_count (gimple *stmt, int decrement)
+{
+  tree *countp = gimple_call_arg_ptr (stmt, 2);
+  gcc_assert (TREE_CODE (*countp) == INTEGER_CST);
+  tree x = fold_build2 (MINUS_EXPR, TREE_TYPE (*countp), *countp,
+   build_int_cst (TREE_TYPE (*countp), decrement));
+  *countp = x;
+}
+
+static void
+increment_start_addr (gimple *stmt ATTRIBUTE_UNUSED, tree *where, int increment)
+{
+  /* If the address wasn't initially a MEM_REF, make it a MEM_REF.  */
+  if (TREE_CODE (*where) == ADDR_EXPR
+  && TREE_CODE (TREE_OPERAND (*where, 0)) != MEM_REF)
+{
+  tree t = TREE_OPERAND (*where, 0);
+  t = build_ref_for_offset (EXPR_LOCATION (t), t,
+   increment * BITS_PER_UNIT, false,
+   ptr_type_node, NULL, false);
+  *where = build_fold_addr_expr (t);
+  return;
+}
+  else if (TREE_CODE (*where) == SSA_NAME)
+{
+  tree tem = make_ssa_name (TREE_TYPE (*where));
+  gassign *newop
+= gimple_build_assign (tem, POINTER_PLUS_EXPR, *where,
+  build_int_cst (sizetype, increment));
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  gsi_insert_before (&gsi, newop, GSI_SAME_STMT);
+  need_ssa_update = true;
+  *where = tem;
+  update_stmt (gsi_stmt (gsi));
+  return;
+}
+
+  /* We can just adjust the offset in the MEM_REF expression.  */
+  tree x1 = TREE_OPERAND (TREE_OPERAND (*where, 0), 1);
+  tree x = fold_build2 (PLUS_EXPR, TREE_TYPE (x1), x1,
+   build_int_cst (TREE_TYPE (x1), increment));
+  TREE_OPERAND (TREE_OPERAND (*where, 0), 1) = x;
+
+}
+
+/* STMT is builtin call that writes bytes in bitmap ORIG, some bytes are dead
+   (ORIG & ~NEW) and need not be stored.  Try to rewrite STMT to reduce
+   the amount of data it actually writes.
+
+   Right now we only support trimming from the head or the tail of the
+   memory region.  In theory we could split the mem* call, but it's
+   likely of marginal value.  */
+
+static void
+trim_memstar_call (bitmap orig, bitmap live, gimple *stmt)
+{
+  switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
+{
+case BUILT_IN_MEMCPY:
+case BUILT_IN_MEMMOVE:
+  {
+   int head_trim, tail_trim;
+   compute_trims (orig, live, &head_trim, &tail_trim);
+
+   /* Tail trimming is easy, we can just reduce the count.  */
+if (tail_trim)
+ decrement_count (stmt, tail_trim);
+
+   /* Head trimming requires adjusting all the arguments.  */
+if (head_trim)
+  {
+   tree *dst = gimple_c

[RFA][PR tree-optimization/61912] [PATCH 2/4] Trimming CONSTRUCTOR stores in DSE

2016-12-15 Thread Jeff Law

This is the second patch in the kit to improve our DSE implementation.

This patch recognizes when a CONSTRUCTOR assignment could be trimmed at 
the head or tail because those bytes are dead.


The first implementation of this turned the CONSTRUCTOR into a memset. 
This version actually rewrites the RHS and LHS of the CONSTRUCTOR 
assignment.


You'll note that the implementation computes head and tail trim counts, 
then masks off the low bit of each so that both are even byte counts.  We 
might even consider masking off the two low bits of the counts.  This 
masking keeps higher alignment on the CONSTRUCTOR remnant, which helps 
keep things efficient when the CONSTRUCTOR results in a memset call.
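
A minimal sketch of the head/tail computation and the masking described above. It uses a plain bool array as the live-byte map instead of GCC's bitmap, and count_trims and its align_bits parameter are hypothetical; this is not the patch's compute_trims.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch, not the patch's compute_trims.  LIVE[i] is true
   when byte I of an N-byte store is still needed.  Count the dead bytes
   at each end of the store, then clear the low ALIGN_BITS bits of each
   count: 1 bit rounds down to an even count, 2 bits to a multiple of 4.
   Rounding down means a few dead bytes may still be stored, in exchange
   for a better-aligned remnant.  */
static void
count_trims (const bool *live, size_t n, unsigned align_bits,
             size_t *head, size_t *tail)
{
  assert (align_bits < sizeof (size_t) * CHAR_BIT);

  size_t h = 0, t = 0;
  while (h < n && !live[h])
    h++;
  /* Scan for the tail only within the bytes not already counted as head.  */
  while (t < n - h && !live[n - 1 - t])
    t++;

  size_t mask = ~(((size_t) 1 << align_bits) - 1);
  *head = h & mask;
  *tail = t & mask;
}
```

For example, with bytes 0-2 and 13-15 of a 16-byte store dead, the raw counts are 3 and 3; masking one bit gives 2 and 2, and masking two bits gives 0 and 0, so no trimming happens at all.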


This patch triggers often in both GCC itself and the testsuite (counted 
statically): there were hundreds of hits in each.


There may be some room for tuning.  Trimming should never result in 
poorer performance, but it may not produce any measurable gain either 
(that depends on how much gets trimmed relative to the size of the 
CONSTRUCTOR node, how the CONSTRUCTOR node gets expanded, the 
processor's capabilities for merging stores internally, etc.).  I 
suspect the main benefit comes when the CONSTRUCTOR collapses down to 
something small that gets expanded inline, thus exposing the internals 
to the rest of the optimization pipeline.


We could, in theory, split the CONSTRUCTOR to pick up dead bytes in the 
middle of the CONSTRUCTOR.  I haven't looked to see how applicable that 
is in real code and what the cost/benefit analysis might look like.
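
The split mentioned above is not implemented by the patch. As a hypothetical sketch of what its first step might look like, the function below finds the longest run of dead bytes that has live bytes on both sides. Splitting the store there would leave two pieces, [0, start) and [start + len, n). The name and the bool-array representation are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: return the length of the longest run of dead bytes
   with live bytes on both sides, storing its first byte in *START.  Dead
   bytes at either end are ignored, since head/tail trimming already covers
   them.  Returns 0, leaving *START unchanged, if there is no such run.  */
static size_t
longest_interior_dead_run (const bool *live, size_t n, size_t *start)
{
  assert (start != NULL);
  size_t best = 0, i = 0;

  /* Skip the dead head.  */
  while (i < n && !live[i])
    i++;

  while (i < n)
    {
      if (live[i])
        {
          i++;
          continue;
        }
      size_t run_start = i;
      while (i < n && !live[i])
        i++;
      /* A run that reaches the end is a dead tail, not an interior gap.  */
      if (i < n && i - run_start > best)
        {
          best = i - run_start;
          *start = run_start;
        }
    }
  return best;
}
```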



Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?

PR tree-optimization/61912
PR tree-optimization/77485
* tree-sra.h: New file.
* ipa-cp.c: Include tree-sra.h.
* ipa-prop.h (build_ref_for_offset): Remove prototype.
* tree-ssa-dse.c: Include expr.h and tree-sra.h.
(compute_trims, trim_constructor_store): New functions.
(trim_partially_dead_store): Call trim_constructor_store.


* g++.dg/tree-ssa/ssa-dse-1.C: New test.
* gcc.dg/tree-ssa/pr30375: Adjust expected output.
* gcc.dg/tree-ssa/ssa-dse-24.c: New test.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 4ec7cc5..772dd68 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -122,6 +122,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-inline.h"
 #include "ipa-utils.h"
 #include "tree-ssa-ccp.h"
+#include "tree-sra.h"
 
 template <typename valtype> class ipcp_value;
 
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 0e75cf4..6d7b480 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -820,10 +820,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
 void ipa_release_body_info (struct ipa_func_body_info *);
 tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
 
-/* From tree-sra.c:  */
-tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
-  gimple_stmt_iterator *, bool);
-
 /* In ipa-cp.c  */
 void ipa_cp_c_finalize (void);
 
diff --git a/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C
new file mode 100644
index 000..f928947
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C
@@ -0,0 +1,101 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c++14 -O -fdump-tree-dse1-details" } */
+
+using uint = unsigned int;
+
+template<typename C, uint S>
+struct FixBuf
+{
+    C buf[S] = {};
+};
+
+template<typename C>
+struct OutBuf
+{
+    C*  cur;
+    C*  end;
+    C*  beg;
+
+    template<uint S>
+    constexpr
+    OutBuf(FixBuf<C, S>& b) : cur{b.buf}, end{b.buf + S}, beg{b.buf} { }
+
+    OutBuf(C* b, C* e) : cur{b}, end{e} { }
+    OutBuf(C* b, uint s) : cur{b}, end{b + s} { }
+
+    constexpr
+    OutBuf& operator<<(C v)
+    {
+        if (cur < end) {
+            *cur = v;
+        }
+        ++cur;
+        return *this;
+    }
+
+    constexpr
+    OutBuf& operator<<(uint v)
+    {
+        uint q = v / 10U;
+        uint r = v % 10U;
+        if (q) {
+            *this << q;
+        }
+        *this << static_cast<C>(r + '0');
+        return *this;
+    }
+};
+
+template<bool BOS>
+struct BufOrSize
+{
+    template<typename C, uint S>
+    static constexpr auto Select(FixBuf<C, S>& fb, OutBuf<C>&)
+    {
+        return fb;
+    }
+};
+
+template<>
+struct BufOrSize<true>
+{
+    template<typename C, uint S>
+    static constexpr auto Select(FixBuf<C, S>&, OutBuf<C>& ob)
+    {
+        return ob.cur - ob.beg;
+    }
+};
+
+// if BOS=1, it will return the size of the generated data, else the data itself
+template<uint N, uint S, bool BOS = false>
+constexpr
+auto fixbuf()
+{
+    FixBuf<char, S> fb;
+    OutBuf<char> ob{fb};
+    for (uint i = 0; i <= N; ++i) {
+        ob << i << static_cast<char>(i == N ? 0 : ' ');
+    }
+    return BufOrSize<BOS>::Select(fb, ob);
+}
+
+auto foo()
+{
+    constexpr auto x = fixbuf<13, 200>();
+    return x;
+}
+
+auto foo_sized()
+{
+   
