Re: [PATCH 1/2, i386] cmpstrnsi needs string length

2016-11-03 Thread Uros Bizjak
Hello!

> This patch adds a test to the cmpstrnsi pattern in i386.md so that it
> will bail out (FAIL) if neither of the strings is a constant string. It
> can only work as a proper strncmp if the length is not longer than both
> of the strings. This change is required if expand_builtin_strncmp is
> going to try expansion of strncmp when neither string argument is
> constant.

The patch needs a ChangeLog entry, but otherwise looks good.

Uros.


[patch, avr] Make pmem-wrap-around option as default

2016-11-03 Thread Pitchumani Sivanupandi
Most of the AVR's 8k memorydevices have only rjmp instruction, not jmp. 
So, it
is important to wrap around jump destination to check if it can reach 
backwards.


Currently link specs passes --pmem-wrap-around=xxK when mrelax and
mpmem-wrap-around options are enabled. Attached patch changes the specs 
so that

option --pmem-wrap-around=8K is passed for 8k memory devices if
-mno-pmem-wrap-around is not enabled.

If OK, could someone commit please?

Note: Currently 8k devices are identified based on name prefix. We are 
working

on alternative method to incorporate flash memory size.

Regards,
Pitchumani

gcc/ChangeLog

2016-11-03  Pitchumani Sivanupandi  

* config/avr/gen-avr-mmcu-specs.c (print_mcu): Update 
link_pmem_wrap spec

string to add linker option --pmem-wrap-around=8k.
* config/avr/specs.h: Add link_pmem_wrap to linker specs (LINK_SPEC).
diff --git a/gcc/config/avr/gen-avr-mmcu-specs.c b/gcc/config/avr/gen-avr-mmcu-specs.c
index 7fca756..41dbd80 100644
--- a/gcc/config/avr/gen-avr-mmcu-specs.c
+++ b/gcc/config/avr/gen-avr-mmcu-specs.c
@@ -220,7 +220,9 @@ print_mcu (const avr_mcu_t *mcu)
 : 0;
 
   fprintf (f, "*link_pmem_wrap:\n");
-  if (wrap_k)
+  if (wrap_k == 8)
+fprintf (f, "\t%%{!mno-pmem-wrap-around: --pmem-wrap-around=8k}");
+  else if (wrap_k > 8)
 fprintf (f, "\t%%{mpmem-wrap-around: --pmem-wrap-around=%dk}", wrap_k);
   fprintf (f, "\n\n");
 
diff --git a/gcc/config/avr/specs.h b/gcc/config/avr/specs.h
index 52763cc..fbf0ce6 100644
--- a/gcc/config/avr/specs.h
+++ b/gcc/config/avr/specs.h
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
   "%(link_data_start) " \
   "%(link_text_start) " \
   "%(link_relax) "  \
+  "%(link_pmem_wrap) "  \
   "%{shared:%eshared is not supported} "
 
 #undef  LIB_SPEC


Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)

2016-11-03 Thread Richard Biener
On Wed, Nov 2, 2016 at 4:27 PM, Segher Boessenkool
 wrote:
> On Wed, Nov 02, 2016 at 02:51:41PM +0100, Richard Biener wrote:
>> >> I don't believe it needs a cleanup on every iteration. One cleanup at
>> >> the end should work fine.
>> >
>> > But as the comment there says:
>> >
>> >   /* Merge the duplicated blocks into predecessors, when possible.  */
>> >   cleanup_cfg (0);
>> >
>> > (this is not a new comment), and without merging blocks this whole
>> > patch does zilch.
>> >
>> > There is no need to create new basic blocks at all (can replace the
>> > final branch in a block with a copy of the whole block it jumps to,
>> > instead); and then it is painfully obvious that switching to layout
>> > mode here is pointless, too.
>> >
>> > Do you want me to do that?
>> >
>> > Btw, this isn't quadratic anyway; it is a constant number (the maximal
>> > length allowed of the block to copy) linear.  Unless there are unboundedly
>> > many forwarder blocks, which there shouldn't be, but I cannot prove that.
>>
>> Well, you iterate calling functions (find candidates, merge, cleanup-cfg) 
>> that
>> walk the whole function.  So unless the number of iterations is bound
>> with a constant I call this quadratic (in function size).
>
> It is bounded (with that caveat above): new candidates will be bigger than
> the block merged into it, so there won't be more than
> PARAM_MAX_GOTO_DUPLICATION_INSNS passes.
>
> But I can make it all work simpler, in non-layout mode, if you prefer.

Iteration is fine but we should restrict where we look for new
candidates.  Likewise
block merging opportunities should be easier to exploit than by
running cfg-cleanup...

I'm just thinking of those gigantic machine-generated state machines where we
ATM do quite well (at least at -O1 ...) with respect to compile-time.

Richard.

>
> Segher


[Patch, testsuite] Fix bogus PR 78170 failure for avr

2016-11-03 Thread Senthil Kumar Selvaraj
Hi,

  The below patch requires int32plus for
  gcc.c-torture/execute/pr78170.c, as it has int bitfields that are more
  than 16 bits wide.

  Committed to trunk as obvious.

Regards
Senthil

gcc/testsuite/ChangeLog

2016-11-03  Senthil Kumar Selvaraj  

* gcc.c-torture/execute/pr78170.c: Require int32plus.



Index: gcc.c-torture/execute/pr78170.c
===
--- gcc.c-torture/execute/pr78170.c (revision 241808)
+++ gcc.c-torture/execute/pr78170.c (working copy)
@@ -1,3 +1,5 @@
+/* { dg-require-effective-target int32plus } */
+
 /* PR tree-optimization/78170.
Check that sign-extended store to a bitfield
doesn't overwrite other fields.  */


[Patch, testsuite] Add new effective-target_store_merge

2016-11-03 Thread Senthil Kumar Selvaraj
Hi,

  The below patch adds a new effective target keyword (store_merge) for
  use in the store_merging_xxx.c tests.

  The tests currently require non_strict_align, but they still fail for the avr.
  Eyeballing the dump, I found that the pass doesn't attempt merging as it is
  unprofitable for a target like the avr which has only single byte
  stores.

  I figured store merging is unlikely to be profitable for targets with
  smallish word sizes, and added a check_effective_target_store_merge
  that combines non_strict_align and int32plus.

  Is this ok for trunk?

Regards
Senthil

gcc/testsuite/ChangeLog

2016-11-03  Senthil Kumar Selvaraj  

* gcc.dg/store_merging_1.c: Require store_merge.
* gcc.dg/store_merging_2.c: Likewise.
* gcc.dg/store_merging_4.c: Likewise.
* gcc.dg/store_merging_5.c: Likewise. 
* gcc.dg/store_merging_6.c: Likewise.
* gcc.dg/store_merging_7.c: Likewise.
* gcc.dg/store_merging_8.c: Likewise.
* lib/target-supports.exp (check_effective_target_store_merge): New.


Index: gcc/testsuite/gcc.dg/store_merging_1.c
===
--- gcc/testsuite/gcc.dg/store_merging_1.c  (revision 241808)
+++ gcc/testsuite/gcc.dg/store_merging_1.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target non_strict_align } */
+/* { dg-require-effective-target store_merge } */
 /* { dg-options "-O2 -fdump-tree-store-merging" } */
 
 struct bar {
Index: gcc/testsuite/gcc.dg/store_merging_2.c
===
--- gcc/testsuite/gcc.dg/store_merging_2.c  (revision 241808)
+++ gcc/testsuite/gcc.dg/store_merging_2.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target non_strict_align } */
+/* { dg-require-effective-target store_merge } */
 /* { dg-options "-O2 -fdump-tree-store-merging" } */
 
 struct bar
Index: gcc/testsuite/gcc.dg/store_merging_4.c
===
--- gcc/testsuite/gcc.dg/store_merging_4.c  (revision 241808)
+++ gcc/testsuite/gcc.dg/store_merging_4.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target non_strict_align } */
+/* { dg-require-effective-target store_merge } */
 /* { dg-options "-O2 -fdump-tree-store-merging" } */
 
 /* Check that we can merge interleaving stores that are guaranteed
Index: gcc/testsuite/gcc.dg/store_merging_5.c
===
--- gcc/testsuite/gcc.dg/store_merging_5.c  (revision 241808)
+++ gcc/testsuite/gcc.dg/store_merging_5.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target non_strict_align } */
+/* { dg-require-effective-target store_merge } */
 /* { dg-options "-O2 -fdump-tree-store-merging" } */
 
 /* Make sure that non-aliasing non-constant interspersed stores do not
Index: gcc/testsuite/gcc.dg/store_merging_6.c
===
--- gcc/testsuite/gcc.dg/store_merging_6.c  (revision 241808)
+++ gcc/testsuite/gcc.dg/store_merging_6.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target non_strict_align } */
+/* { dg-require-effective-target store_merge } */
 /* { dg-options "-O2 -fdump-tree-store-merging" } */
 
 /* Check that we can widen accesses to bitfields.  */
Index: gcc/testsuite/gcc.dg/store_merging_7.c
===
--- gcc/testsuite/gcc.dg/store_merging_7.c  (revision 241808)
+++ gcc/testsuite/gcc.dg/store_merging_7.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target non_strict_align } */
+/* { dg-require-effective-target store_merge } */
 /* { dg-options "-O2 -fdump-tree-store-merging" } */
 
 /* Check that we can merge consecutive array members through the pointer.
Index: gcc/testsuite/gcc.dg/store_merging_8.c
===
--- gcc/testsuite/gcc.dg/store_merging_8.c  (revision 241808)
+++ gcc/testsuite/gcc.dg/store_merging_8.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target non_strict_align } */
+/* { dg-require-effective-target store_merge } */
 /* { dg-options "-O2 -fdump-tree-store-merging" } */
 
 struct baz {
Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 241808)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -8107,3 +8107,16 @@
 
 return [check_effective_target_divmod]
 }
+
+# Return 1 if store merging optimization is applicable for target.
+# Store merging is not profitable for targets like the avr which
+# can load/store only one byte at a time. Use int si

Re: [PATCH] Fix PR78185

2016-11-03 Thread Richard Biener
On Wed, 2 Nov 2016, Richard Biener wrote:

> 
> The following fixes PR78185 by properly honoring possibly infinite child
> loops when computing what blocks are always executed during loop invariant
> motion.  Such loops behave as if the loop would exit at this point.
> 
> Both GIMPLE and RTL level passes have that very same issue and the 
> following fixes the GIMPLE one and simply disables hoisting of possibly
> trapping or faulting instructions in the RTL pass ... Eric seems to
> remember this might regress gzip so I'm going to put this on one of
> our SPEC testers for tonights run as well.  Maybe somebody else

Nothing popped out, but ...

> wants to check the performance impact (I'm interested in both,
> GIMPLE and RTL change fallout for performance).
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> If all goes well I'll followup with the obvious removal of no longer
> needed stuff in loop-invariant.c.
> 
> Another variant for RTL would be to simply treat all edges entering
> a child loop (not only those entering possibly infinitely looping ones)
> as an exit.  I think finite_loop_p has no RTL level variant (yet).

... that was easy enough to implement and should be slightly less
conservative.

Re-bootstrap / regtest running on x86_64-unknown-linux-gnu.  I'll
commit that if nobody objects until tomorrow.

Richard.

2016-11-03  Richard Biener  

PR middle-end/78185
* loop-invariant.c (find_exits): Record entering inner
loops as possibly exiting to handle infinite sub-loops.
* tree-ssa-loop-im.c: Include tree-ssa-loop-niter.h.
(fill_always_executed_in_1): Honor infinite child loops.

* gcc.dg/pr78185.c: New testcase.

Index: gcc/tree-ssa-loop-im.c
===
--- gcc/tree-ssa-loop-im.c  (revision 241795)
+++ gcc/tree-ssa-loop-im.c  (working copy)
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
 #include "trans-mem.h"
 #include "gimple-fold.h"
 #include "tree-scalar-evolution.h"
+#include "tree-ssa-loop-niter.h"
 
 /* TODO:  Support for predicated code motion.  I.e.
 
@@ -2369,8 +2370,16 @@ fill_always_executed_in_1 (struct loop *
break;
 
  FOR_EACH_EDGE (e, ei, bb->succs)
-   if (!flow_bb_inside_loop_p (loop, e->dest))
- break;
+   {
+ /* If there is an exit from this BB.  */
+ if (!flow_bb_inside_loop_p (loop, e->dest))
+   break;
+ /* Or we enter a possibly non-finite loop.  */
+ if (flow_loop_nested_p (bb->loop_father,
+ e->dest->loop_father)
+ && ! finite_loop_p (e->dest->loop_father))
+   break;
+   }
  if (e)
break;
 
Index: gcc/loop-invariant.c
===
--- gcc/loop-invariant.c(revision 241795)
+++ gcc/loop-invariant.c(working copy)
@@ -598,13 +598,17 @@ find_exits (struct loop *loop, basic_blo
 
  FOR_EACH_EDGE (e, ei, body[i]->succs)
{
- if (flow_bb_inside_loop_p (loop, e->dest))
-   continue;
-
- bitmap_set_bit (may_exit, i);
- bitmap_set_bit (has_exit, i);
- outermost_exit = find_common_loop (outermost_exit,
-e->dest->loop_father);
+ if (! flow_bb_inside_loop_p (loop, e->dest))
+   {
+ bitmap_set_bit (may_exit, i);
+ bitmap_set_bit (has_exit, i);
+ outermost_exit = find_common_loop (outermost_exit,
+e->dest->loop_father);
+   }
+ /* If we enter a subloop that might never terminate treat
+it like a possible exit.  */
+ if (flow_loop_nested_p (loop, e->dest->loop_father))
+   bitmap_set_bit (may_exit, i);
}
  continue;
}
Index: gcc/testsuite/gcc.dg/pr78185.c
===
--- gcc/testsuite/gcc.dg/pr78185.c  (revision 0)
+++ gcc/testsuite/gcc.dg/pr78185.c  (revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do run { target *-*-linux* *-*-gnu* } } */
+/* { dg-options "-O" } */
+
+#include 
+#include 
+#include 
+
+static char var1 = 0L;
+static char *var2 = &var1;
+
+void do_exit (int i)
+{
+  exit (0);
+}
+
+int main(void)
+{
+  struct sigaction s;
+  sigemptyset (&s.sa_mask);
+  s.sa_handler = do_exit;
+  s.sa_flags = 0;
+  sigaction (SIGALRM, &s, NULL);
+  alarm (1);
+  /* The following loop is infinite, the division by zero should not
+ be hoisted out of it.  */
+  for (; (var1 == 0 ? 0 : (100 / var1)) == *var2; );
+  return 0;
+}


[PATCH] Add mcpu flag for Qualcomm falkor core

2016-11-03 Thread Siddhesh Poyarekar
This adds an mcpu option for the upcoming Qualcomm Falkor core.  This
is identical to the qdf24xx part that was added earlier and hence
retains the same tuning structure and continues to have the a57
pipeline for now.  The part number has also been changed and this
patch fixes this for both qdf24xx and falkor options.

Tested aarch64-linux-gnu and arm-linux-gnuabihf and did not find any
regressions resulting from this patch.

Siddhesh

* gcc/config/aarch64/aarch64-cores.def (qdf24xx): Update part
number.
(falkor): New core.
* gcc/config/aarch64/aarch64-tune.md: Regenerated.
* gcc/config/arm/arm-cores.def (falkor): New core.
* gcc/config/arm/arm-tables.opt: Regenerated.
* gcc/config/arm/arm-tune.md: Regenerated.
* gcc/config/arm/bpabi.h (BE8_LINK_SPEC): Add falkor support.
* gcc/config/arm/t-aprofile (MULTILIB_MATCHES): Add falkor
support.
* gcc/doc/invoke.texi (AArch64 Options/-mtune): Add falkor.
(ARM Options/-mtune): Add falkor.

---
 gcc/config/aarch64/aarch64-cores.def | 3 ++-
 gcc/config/aarch64/aarch64-tune.md   | 2 +-
 gcc/config/arm/arm-cores.def | 1 +
 gcc/config/arm/arm-tables.opt| 3 +++
 gcc/config/arm/arm-tune.md   | 2 +-
 gcc/config/arm/bpabi.h   | 2 ++
 gcc/config/arm/t-aprofile| 1 +
 gcc/doc/invoke.texi  | 9 +
 8 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index d9da257..0aad9a9 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -46,7 +46,8 @@ AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  
AARCH64_FL_FOR_ARCH8 | AA
 AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
 AARCH64_CORE("cortex-a73",  cortexa73, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa73, "0x41", "0xd09")
 AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  "0x53", "0x001")
-AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   "0x51", "0x800")
+AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   "0x51", "0xC00")
+AARCH64_CORE("falkor",  falkor,cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   "0x51", "0xC00")
 AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
 AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
xgene1, "0x50", "0x000")
 
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 022b131..29afcdf 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
+   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,falkor,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 2072e1e..1a382e1 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -174,6 +174,7 @@ ARM_CORE("cortex-a72",  cortexa72, cortexa57,   8A, 
ARM_FSET_MAKE_CPU1 (FL_LDSCHED
 ARM_CORE("cortex-a73", cortexa73, cortexa57,   8A, ARM_FSET_MAKE_CPU1 
(FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a73)
 ARM_CORE("exynos-m1",  exynosm1,  exynosm1,8A, ARM_FSET_MAKE_CPU1 
(FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), exynosm1)
 ARM_CORE("qdf24xx",qdf24xx,   cortexa57,   8A, ARM_FSET_MAKE_CPU1 
(FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx)
+ARM_CORE("falkor", falkor,cortexa57,   8A, ARM_FSET_MAKE_CPU1 
(FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx)
 ARM_CORE("xgene1",  xgene1,xgene1,  8A,ARM_FSET_MAKE_CPU1 
(FL_LDSCHED | FL_FOR_ARCH8A),xgene1)
 
 /* V8 big.LITTLE implementations */
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index b92cb17..52ca9c3 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -328,6 +328,9 @@ EnumValue
 Enum(processor_type) String(exynos-m1) Value(exynosm1)
 
 EnumValue
+Enum(processor_type) String(falkor) Value(falkor)
+
+EnumValue
 Enum(processor_type) String(qdf24xx) Value(qdf24xx)
 
 EnumValue
diff --git a/gcc/config/arm/arm-tu

[match.pd] Fix for PR35691

2016-11-03 Thread Prathamesh Kulkarni
Hi Richard,
The attached patch tries to fix PR35691, by adding the following two
transforms to match.pd:
(x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
(x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.

For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE,
it gets transformed to bit_and_expr
so to match for both GENERIC and GIMPLE, I had to guard the for-stmt:

#if GENERIC
(for op (truth_andif truth_orif)
#elif GIMPLE
(for op (bit_and bit_ior)
#endif

Is that OK ?
Bootstrap+test running on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
2016-11-03  Prathamesh Kulkarni  

PR middle-end/35691
* match.pd: Add following two patterns:
(x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
(x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.

testsuite/
* gcc.dg/pr35691-1.c: New test-case.
* gcc.dg/pr35691-2.c: Likewise.
* gcc.dg/pr35691-3.c: Likewise.
* gcc.dg/pr35691-4.c: Likewise.

diff --git a/gcc/match.pd b/gcc/match.pd
index 48f7351..65930bb 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -519,6 +519,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (TYPE_UNSIGNED (type))
 (bit_and @0 (bit_not (lshift { build_all_ones_cst (type); } @1)
 
+/* PR35691: Transform
+   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
+   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
+
+#if GENERIC
+(for op (truth_andif truth_orif)
+#elif GIMPLE
+(for op (bit_and bit_ior)
+#endif
+ cmp (eq ne)
+ (simplify
+  (op (cmp @0 integer_zerop) (cmp @1 integer_zerop))
+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+   && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)))
+(cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); }
+
 /* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
 (simplify
  (minus (bit_and:cs @0 (bit_not @1)) (bit_and:cs @0 @1))
diff --git a/gcc/testsuite/gcc.dg/pr35691-1.c b/gcc/testsuite/gcc.dg/pr35691-1.c
new file mode 100644
index 000..25a7ace
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-gimple" } */
+
+int foo1(int z0, unsigned z1) 
+{
+  return (z0 == 0) && (z1 == 0);
+}
+
+/* { dg-final { scan-tree-dump-not "z1.\[0-9\]*_\[0-9\]* = (int) z1" "gimple" 
} } */
diff --git a/gcc/testsuite/gcc.dg/pr35691-2.c b/gcc/testsuite/gcc.dg/pr35691-2.c
new file mode 100644
index 000..5211f815
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
+
+int foo(int z0, unsigned z1)
+{
+  int t0 = (z0 == 0);
+  int t1 = (z1 == 0);
+  int t2 = (t0 && t1);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) 
z1_\[0-9\]*\\(D\\);" "forwprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/pr35691-3.c b/gcc/testsuite/gcc.dg/pr35691-3.c
new file mode 100644
index 000..134bbdf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-gimple" } */
+
+int foo1(int z0, unsigned z1) 
+{
+  return (z0 != 0) || (z1 != 0);
+}
+
+/* { dg-final { scan-tree-dump-not "z1.\[0-9\]*_\[0-9\]* = (int) z1" "gimple" 
} } */
diff --git a/gcc/testsuite/gcc.dg/pr35691-4.c b/gcc/testsuite/gcc.dg/pr35691-4.c
new file mode 100644
index 000..90cbf6d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-4.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
+
+int foo(int z0, unsigned z1)
+{
+  int t0 = (z0 != 0);
+  int t1 = (z1 != 0);
+  int t2 = (t0 || t1);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) 
z1_\[0-9\]*\\(D\\);" "forwprop1" } } */


Re: RFC: Warnings silenced when macros from system headers are used (PR c/78000, c/71613)

2016-11-03 Thread Marek Polacek
On Wed, Nov 02, 2016 at 01:39:22PM -0400, Jason Merrill wrote:
> On Wed, Nov 2, 2016 at 12:37 PM, Joseph Myers  wrote:
> > On Wed, 2 Nov 2016, Jason Merrill wrote:
> >
> >> It seems to me that the general principle is that we should consider
> >> the location where the thing we're warning about is happening.  In
> >>
> >>float_var = LLONG_MIN;
> >>
> >> The relevant location is that of the assignment, not the constant on
> >> the RHS.  In your ?: example, a simple answer would be to warn based
> >
> > I'm not sure we track locations well enough to handle that yet.
> >
> > Say you have an implicit conversion of a function argument to the type
> > from the prototype and something about that conversion should be warned
> > about.  Then you should obviously warn if the argument is a macro from a
> > system header but the call is outside a system header.  But say the
> > function call itself comes from a macro defined in a system header - you
> > should still warn if the user passed an argument of the wrong type, even
> > if that argument is a macro from a system header.
> >
> > That is:
> >
> > /* System header.  */
> > int __foo (int);
> > /* This sort of macro to avoid accidental interposition issues has been
> >discussed lately on libc-alpha, so it's a realistic example.  */
> > #define foo(x) (0 + __foo (x))
> > /* User code.  */
> > v = foo (NULL);
> >
> > should warn because the call to __foo logically results from user code
> > even though both NULL and foo are macros defined in system headers.  I'm
> > not sure what the locations look like here.
> 
> Sure, the problem here comes from the user combining the two macros.
> I suppose in this case you could notice that the macro expansions of
> 'foo' and 'NULL' are not nested.

That testcase is handled ok even without the patch:
x.c: In function ‘f’:
x.c:7:7: warning: passing argument 1 of ‘__foo’ makes integer from pointer 
without a cast [-Wint-conversion]
   v = foo (NULL);
   ^~~
In file included from x.c:1:0:
x.h:2:5: note: expected ‘int’ but argument is of type ‘void *’
 int __foo (int);
 ^

because convert_for_assignment already has the call to
expansion_point_location_if_in_system_header.

I will add that testcase to my patch.

Marek


RE: [PATCH,gcc/MIPS] Make loongson3a use fused madd.d

2016-11-03 Thread Matthew Fortune
Paul Hua  writes:
> Loongson3a has 4 operand fused madd instrcution. This patch set
> loongson3a use fused madd.d.

Hi Paul,

Thanks for the fix. I was vaguely aware that this was wrong for
loongson-3a but never confirmed it.

I suspect this change is mechanical enough that it can bypass
copyright assignment but I'd need a global maintainer to comment.

I've sent you copyright assignment paperwork separately.

Two comments on the patch:

> ChangeLog :
> 
> *** gcc/ChangeLog ***
> 
> 2016-11-03 Chenghua Xu 
> 
> config/mips/
> * mips.h: Set loongson3a use fused madd.d.

The changelog needs to reference what was changed rather than the
effect of the change:

* config/mips/mips.h (ISA_HAS_FUSED_MADD4): Enable for
TARGET_LOONGSON_3A.
(ISA_HAS_UNFUSED_MADD4): Exclude TARGET_LOONGSON_3A.


>diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
>index 81862a9..5076a2b 100644
>--- a/gcc/config/mips/mips.h
>+++ b/gcc/config/mips/mips.h
>@@ -1056,11 +1056,11 @@ struct mips_cpu_info {
> 
> /* ISA has 4 operand fused madd instructions of the form
>'d = [+-] (a * b [+-] c)'.  */
>-#define ISA_HAS_FUSED_MADD4   TARGET_MIPS8000
>+#define ISA_HAS_FUSED_MADD4   (TARGET_MIPS8000 || TARGET_LOONGSON_3A)
> 
> /* ISA has 4 operand unfused madd instructions of the form
>'d = [+-] (a * b [+-] c)'.  */
>-#define ISA_HAS_UNFUSED_MADD4 (ISA_HAS_FP4 && !TARGET_MIPS8000)
>+#define ISA_HAS_UNFUSED_MADD4 (ISA_HAS_FP4 && !TARGET_MIPS8000 && 
>!TARGET_LOONGSON_3A)

Please split this line and move && !TARGET_LOONGSON_3A to the next line
under ISA_HAS_FP4.

> 
> /* ISA has 3 operand r6 fused madd instructions of the form
>'c = c [+-] (a * b)'.  */

Thanks,
Matthew



[PATCH] PR77359: Properly align local variables in functions calling alloca.

2016-11-03 Thread Dominik Vogt
The attached patch fixes the stack layout problems on AIX and
Power as described here:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77359

The patch has been bootstrapped on AIX (32 Bit) and bootstrappend
and regression tested on Power (biarch).  It needs more testing
that I cannot do with the hardware available to me.

If the patch is good, this one can be re-applied:

  https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01730.html
  https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01616.html

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/rs6000/rs6000.c (rs6000_stack_info): Properly align local
variables in functions calling alloca.
* config/rs6000/rs6000.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET):
Likewise.
* config/rs6000/aix.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET):
Copy AIX specific versions of the rs6000.h macros to aix.h.
>From bd36042fd82e29204d2f10c180b9e7c27281eef2 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 28 Oct 2016 12:59:55 +0100
Subject: [PATCH] PR77359: Properly align local variables in functions
 calling alloca.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77359 for a discussion of the
problem and the fix.
---
 gcc/config/rs6000/aix.h| 27 +++
 gcc/config/rs6000/rs6000.c |  9 +++--
 gcc/config/rs6000/rs6000.h | 14 --
 3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/gcc/config/rs6000/aix.h b/gcc/config/rs6000/aix.h
index b254236..7773517 100644
--- a/gcc/config/rs6000/aix.h
+++ b/gcc/config/rs6000/aix.h
@@ -40,6 +40,33 @@
 #undef  STACK_BOUNDARY
 #define STACK_BOUNDARY 128
 
+/* Offset within stack frame to start allocating local variables at.
+   If FRAME_GROWS_DOWNWARD, this is the offset to the END of the
+   first local allocated.  Otherwise, it is the offset to the BEGINNING
+   of the first local allocated.
+
+   On the RS/6000, the frame pointer is the same as the stack pointer,
+   except for dynamic allocations.  So we start after the fixed area and
+   outgoing parameter area.  */
+
+#undef STARTING_FRAME_OFFSET
+#define STARTING_FRAME_OFFSET  \
+  (FRAME_GROWS_DOWNWARD
\
+   ? 0 \
+   : (cfun->calls_alloca   \
+  ? RS6000_ALIGN (crtl->outgoing_args_size + RS6000_SAVE_AREA, 16) \
+  : (RS6000_ALIGN (crtl->outgoing_args_size, 16) + RS6000_SAVE_AREA)))
+
+/* Offset from the stack pointer register to an item dynamically
+   allocated on the stack, e.g., by `alloca'.
+
+   The default value for this macro is `STACK_POINTER_OFFSET' plus the
+   length of the outgoing arguments.  The default is correct for most
+   machines.  See `function.c' for details.  */
+#undef STACK_DYNAMIC_OFFSET
+#define STACK_DYNAMIC_OFFSET(FUNDECL)  \
+   RS6000_ALIGN (crtl->outgoing_args_size + (STACK_POINTER_OFFSET), 16)
+
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f9e4739..02ed9c1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -26004,8 +26004,13 @@ rs6000_stack_info (void)
   info->reg_size = reg_size;
   info->fixed_size   = RS6000_SAVE_AREA;
   info->vars_size= RS6000_ALIGN (get_frame_size (), 8);
-  info->parm_size= RS6000_ALIGN (crtl->outgoing_args_size,
-TARGET_ALTIVEC ? 16 : 8);
+  if (cfun->calls_alloca)
+info->parm_size  =
+  RS6000_ALIGN (crtl->outgoing_args_size + info->fixed_size,
+   STACK_BOUNDARY / BITS_PER_UNIT) - info->fixed_size;
+  else
+info->parm_size  = RS6000_ALIGN (crtl->outgoing_args_size,
+TARGET_ALTIVEC ? 16 : 8);
   if (FRAME_GROWS_DOWNWARD)
 info->vars_size
   += RS6000_ALIGN (info->fixed_size + info->vars_size + info->parm_size,
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 4b83abd..c11dc1b 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1728,9 +1728,12 @@ extern enum reg_class 
rs6000_constraints[RS6000_CONSTRAINT_MAX];
 #define STARTING_FRAME_OFFSET  \
   (FRAME_GROWS_DOWNWARD
\
? 0 \
-   : (RS6000_ALIGN (crtl->outgoing_args_size,  \
-   (TARGET_ALTIVEC || TARGET_VSX) ? 16 : 8)\
-  + RS6000_SAVE_AREA))
+   : (cfun->calls_alloca   \
+  ? (RS6000_ALIGN (crtl->outgoing_args_size + RS6000_SAVE_AREA,\
+  (TARGET_ALTIVEC || TARGET_VSX) ? 16 : 8 ))   \
+  : (RS6000_ALIGN (crtl->outgoing_args_size

Re: [match.pd] Fix for PR35691

2016-11-03 Thread Richard Biener
On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:

> Hi Richard,
> The attached patch tries to fix PR35691, by adding the following two
> transforms to match.pd:
> (x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
> (x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.
> 
> For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE,
> it gets transformed to bit_and_expr
> so to match for both GENERIC and GIMPLE, I had to guard the for-stmt:
> 
> #if GENERIC
> (for op (truth_andif truth_orif)
> #elif GIMPLE
> (for op (bit_and bit_ior)
> #endif
> 
> Is that OK ?

As you are not removing the fold-const.c variant I'd say you should
simply not look for truth_* and only handle GIMPLE.  Note that we
have tree-ssa-ifcombine.c which should handle the variant with
control-flow (but I guess it does not and your patch wouldn't help
it either).

The transform would also work for vectors (element_precision for
the test but also a value-matching zero which should ensure the
same number of elements).

Richard.


RE: [PATCH] [ARC] define SIZE_TYPE and PTRDIFF_TYPE correctly

2016-11-03 Thread Claudiu Zissulescu
Hi Vineet,

Thank you for your contribution.

> gcc/
> 2016-10-28  Vineet Gupta 
> 
> * config/arc/arc.h (SIZE_TYPE): define as unsigned int.
>   * (PTRDIFF_TYPE): define as int.
> 

Approved and committed. However,  the entry changelog line is not as expected 
and the patch didn't apply on the mainline sources seamlessly. Fix them for you 
:)

For ur reference: Committed r241812.

Best,
Claudiu


[PATCH,testsuite] MIPS: Downgrade R6 to R5 if tests need branch-likely instructions.

2016-11-03 Thread Toma Tabacu
Hi,

The gcc.target/mips/wrap-delay.c test was failing on mips-img-* toolchains
because it was using -mbranch-likely with an R6 target, and branch-likely
instructions were removed in R6.

This patch makes the testsuite downgrade to R5 if the -mbranch-likely option
is present and we're targeting R6.

Tested with mips-img-elf and mips-img-linux-gnu.

Regards,
Toma Tabacu

gcc/testsuite/
* gcc.target/mips/mips.exp: Add check for -mbranch-likely in
condition for R5 downgrade.

diff --git a/gcc/testsuite/gcc.target/mips/mips.exp 
b/gcc/testsuite/gcc.target/mips/mips.exp
index 7c24140..382d69c 100644
--- a/gcc/testsuite/gcc.target/mips/mips.exp
+++ b/gcc/testsuite/gcc.target/mips/mips.exp
@@ -1176,7 +1176,8 @@ proc mips-dg-options { args } {
   || [mips_have_test_option_p options "-mpaired-single"]
   || [mips_have_test_option_p options "-mnan=legacy"]
   || [mips_have_test_option_p options "-mabs=legacy"]
-  || [mips_have_test_option_p options "!HAS_LSA"]) } {
+  || [mips_have_test_option_p options "!HAS_LSA"]
+  || [mips_have_test_option_p options "-mbranch-likely"]) 
} {
if { $gp_size == 32 } {
mips_make_test_option options "-mips32r5"
} else {


Re: Implement -Wduplicated-branches (PR c/64279) (v3)

2016-11-03 Thread Marek Polacek
On Tue, Nov 01, 2016 at 02:53:58PM +0100, Jakub Jelinek wrote:
> On Tue, Nov 01, 2016 at 09:41:20AM -0400, Jason Merrill wrote:
> > On Tue, Oct 25, 2016 at 9:59 AM, Marek Polacek  wrote:
> > > On Mon, Oct 24, 2016 at 04:10:21PM +0200, Marek Polacek wrote:
> > >> On Thu, Oct 20, 2016 at 12:28:36PM +0200, Marek Polacek wrote:
> > >> > I found a problem with this patch--we can't call 
> > >> > do_warn_duplicated_branches in
> > >> > build_conditional_expr, because that way the C++-specific codes might 
> > >> > leak into
> > >> > the hasher.  Instead, I should use operand_equal_p, I think.  Let me 
> > >> > rework
> > >> > that part of the patch.
> > 
> > Hmm, is there a reason not to use operand_equal_p for
> > do_warn_duplicated_branches as well?  I'm concerned about hash
> > collisions leading to false positives.
> 
> If the hashing function is iterative_hash_expr / inchash::add_expr, then
> that is supposed to pair together with operand_equal_p, we even have
> checking verification of that.

Yes, I use inchash::add_expr.

So any opinions as to what to do with this patch?

Marek


Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Eric Botcazou
> The same approach can also be used on other targets without hardware
> divmod insn.

This should be made generic instead: can't we automatically create a divmod 
libfunc for double-word mode & define a default TARGET_EXPAND_DIVMOD_LIBFUNC?
This would save the code duplication in the ~50 back-ends...

-- 
Eric Botcazou


RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.

2016-11-03 Thread Andrew Bennett
Ping.


Regards,



Andrew

> -Original Message-
> From: Andrew Bennett
> Sent: 28 August 2015 16:50
> To: Matthew Fortune; Moore, Catherine; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard
> library support check the sysroot supports the required test options.
> 
> > I had some comments on this that I hadn't got round to posting. The fix in
> > this patch is not general enough as the missing header problem comes in
> > two (related) forms:
> >
> > 1) Using the new MTI and IMG sysroot layout we can end up with GCC looking
> >for headers in a sysroot that simply does not exist. The current patch
> >handles this.
> > 2) Using any sysroot layout (i.e. a simple mips-linux-gnu) it is possible
> >for the stdlib.h header to be found but the ABI dependent gnu-stubs
> >header may not be installed depending on soft/hard nan1985/nan2008.
> >
> > The test for stdlib.h needs to therefore verify that preprocessing succeeds
> > rather than just testing for an error relating to stdlib.h. This could be
> > done by adding a further option to mips_preprocess to indicate the processor
> > output should go to a file and that the caller wants the messages emitted
> > by the compiler instead.
> >
> > A second issue is that you have added (REQUIRES_STDLIB) to too many tests.
> > You only need to add it to tests that request a compiler option (via
> > dg-options) that could potentially lead to forcing soft/hard nan1985/nan2008
> > directly or indirectly. So -mips32r6 implies nan2008 so you need it -
> mips32r5
> > implies nan1985 so you need it. There are at least two tests which don't
> > need the option but you need to check them all so we don't run the check
> > needlessly.
> 
> The updated patch and ChangeLog that addresses Matthew's comments is below.
> 
> Ok to commit?
> 
> Regards,
> 
> 
> Andrew
> 
> 
> testsuite/
> 
>   * gcc.target/mips/loongson-simd.c (dg-options): Add
>   (REQUIRES_STDLIB).
>   * gcc.target/mips/loongson-shift-count-truncated-1.c: Ditto
>   * gcc.target/mips/mips-3d-1.c: Ditto
>   * gcc.target/mips/mips-3d-2.c: Ditto
>   * gcc.target/mips/mips-3d-3.c: Ditto
>   * gcc.target/mips/mips-3d-4.c: Ditto
>   * gcc.target/mips/mips-3d-5.c: Ditto
>   * gcc.target/mips/mips-3d-6.c: Ditto
>   * gcc.target/mips/mips-3d-7.c: Ditto
>   * gcc.target/mips/mips-3d-8.c: Ditto
>   * gcc.target/mips/mips-3d-9.c: Ditto
>   * gcc.target/mips/mips-ps-1.c: Ditto
>   * gcc.target/mips/mips-ps-2.c: Ditto
>   * gcc.target/mips/mips-ps-3.c: Ditto
>   * gcc.target/mips/mips-ps-4.c: Ditto
>   * gcc.target/mips/mips-ps-6.c: Ditto
>   * gcc.target/mips/mips16-attributes.c: Ditto
>   * gcc.target/mips/mips32-dsp-run.c: Ditto
>   * gcc.target/mips/mips32-dsp.c: Ditto
>   * gcc.target/mips/save-restore-1.c: Ditto
>   * gcc.target/mips/mips.exp (mips_option_groups): Add stdlib.
>   (mips_preprocess): Add ignore_output argument that when set
>   will not return the pre-processed output.
>   (mips_arch_info): Update arguments for the call to
>   mips_preprocess.
>   (mips-dg-init): Ditto.
>   (mips-dg-options): Check if a test having test option
>   (REQUIRES_STDLIB) has the required sysroot support for
>   the current test options.
> 
> 
> 
> diff --git a/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> b/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> index f57a18c..baed48c 100644
> --- a/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> +++ b/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> @@ -4,7 +4,7 @@
>  /* loongson.h does not handle or check for MIPS16ness.  There doesn't
> seem any good reason for it to, given that the Loongson processors
> do not support MIPS16.  */
> -/* { dg-options "isa=loongson -mhard-float -mno-mips16" } */
> +/* { dg-options "isa=loongson -mhard-float -mno-mips16 (REQUIRES_STDLIB)" }
> */
>  /* See PR 52155.  */
>  /* { dg-options "isa=loongson -mhard-float -mno-mips16 -mlong64" { mips*-*-
> elf* && ilp32 } } */
> 
> diff --git a/gcc/testsuite/gcc.target/mips/loongson-simd.c
> b/gcc/testsuite/gcc.target/mips/loongson-simd.c
> index 6d2ceb6..f263b43 100644
> --- a/gcc/testsuite/gcc.target/mips/loongson-simd.c
> +++ b/gcc/testsuite/gcc.target/mips/loongson-simd.c
> @@ -26,7 +26,7 @@ along with GCC; see the file COPYING3.  If not see
> because inclusion of some system headers e.g. stdint.h will fail due to
> not
> finding stubs-o32_hard.h.  */
>  /* { dg-require-effective-target mips_nanlegacy } */
> -/* { dg-options "isa=loongson -mhard-float -mno-micromips -mno-mips16 -flax-
> vector-conversions" } */
> +/* { dg-options "isa=loongson -mhard-float -mno-micromips -mno-mips16 -flax-
> vector-conversions (REQUIRES_STDLIB)" } */
> 
>  #include "loongson.h"
>  #include 
> diff --git a/gcc/testsuite/gcc.target/mips/mips-3d-1.c
> b/g

Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Bernd Schmidt
I'm concerned about the number of false positives for this warning, and 
judging by previous discussions, I'm not alone in this. This patch 
limits it to level 1 (any comment before the case label disables the 
warning) for cases where the user specified no explicit level. It'll 
still generate enough noise that people will be aware of it and can 
choose whether to use a higher level or not.


Bootstrapped and tested on x86_64-linux. Ok?


Bernd
c-family/
	* c.opt (Wimplicit-fallthrough=): Default to 1.

gcc/
	* common.opt (Wimplicit-fallthrough=): Default to 1.
	* doc/invoke.texi (-Wextra, -Wimplicit-fallthrough): Adjust
	documentation.

testsuite/
	* c-c++-comon/Wimplicit-fallthrough-1.c: Specify level for
	-Wimplicit-fallthrough
	* c-c++-comon/Wimplicit-fallthrough-10.c: Likewise.
	* c-c++-comon/Wimplicit-fallthrough-12.c: Likewise.
	* c-c++-comon/Wimplicit-fallthrough-16.c: Likewise.
	* c-c++-comon/Wimplicit-fallthrough-19.c: Likewise.
	* c-c++-comon/Wimplicit-fallthrough-2.c: Likewise.
	* c-c++-comon/Wimplicit-fallthrough-22.c: Likewise.
	* c-c++-comon/Wimplicit-fallthrough-25.c: Likewise.
	* c-c++-comon/Wimplicit-fallthrough-26.c: Likewise.
	* c-c++-comon/Wimplicit-fallthrough-3.c: Likewise.
	* c-c++-comon/Wimplicit-fallthrough-4.c: Likewise.
	* c-c++-comon/Wimplicit-fallthrough-5.c: Likewise.
	* c-c++-comon/Wimplicit-fallthrough-7.c: Likewise.
	* c-c++-comon/Wimplicit-fallthrough-8.c: Likewise.
	* c-c++-comon/attr-fallthrough-1.c: Likewise.
	* g++.dg/cpp0x/fallthrough1.C: Likewise.
	* objc.dg/Wimplicit-fallthrough-1.m: Likewise.

Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt	(revision 241233)
+++ gcc/c-family/c.opt	(working copy)
@@ -541,7 +541,7 @@ C ObjC Var(warn_implicit) Warning LangEn
 Warn about implicit declarations.
 
 Wimplicit-fallthrough=
-LangEnabledBy(C ObjC C++ ObjC++,Wextra,3,0)
+LangEnabledBy(C ObjC C++ ObjC++,Wextra,1,0)
 ; in common.opt
 
 Wdouble-promotion
Index: gcc/common.opt
===
--- gcc/common.opt	(revision 241233)
+++ gcc/common.opt	(working copy)
@@ -602,7 +602,7 @@ Common Var(warn_hsa) Init(1) Warning
 Warn when a function cannot be expanded to HSAIL.
 
 Wimplicit-fallthrough
-Common Alias(Wimplicit-fallthrough=,3,0) Warning
+Common Alias(Wimplicit-fallthrough=,1,0) Warning
 
 Wimplicit-fallthrough=
 Common Var(warn_implicit_fallthrough) RejectNegative Joined UInteger Warning
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 241233)
+++ gcc/doc/invoke.texi	(working copy)
@@ -3735,7 +3735,7 @@ name is still supported, but the newer n
 @gccoptlist{-Wclobbered  @gol
 -Wempty-body  @gol
 -Wignored-qualifiers @gol
--Wimplicit-fallthrough=3 @gol
+-Wimplicit-fallthrough=1 @gol
 -Wmissing-field-initializers  @gol
 -Wmissing-parameter-type @r{(C only)}  @gol
 -Wold-style-declaration @r{(C only)}  @gol
@@ -4107,7 +4107,7 @@ This warning is enabled by @option{-Wall
 @item -Wimplicit-fallthrough
 @opindex Wimplicit-fallthrough
 @opindex Wno-implicit-fallthrough
-@option{-Wimplicit-fallthrough} is the same as @option{-Wimplicit-fallthrough=3}
+@option{-Wimplicit-fallthrough} is the same as @option{-Wimplicit-fallthrough=1}
 and @option{-Wno-implicit-fallthrough} is the same as
 @option{-Wimplicit-fallthrough=0}.
 
Index: gcc/testsuite/c-c++-common/Wimplicit-fallthrough-1.c
===
--- gcc/testsuite/c-c++-common/Wimplicit-fallthrough-1.c	(revision 241233)
+++ gcc/testsuite/c-c++-common/Wimplicit-fallthrough-1.c	(working copy)
@@ -1,6 +1,6 @@
 /* PR c/7652 */
 /* { dg-do compile } */
-/* { dg-options "-Wimplicit-fallthrough" } */
+/* { dg-options "-Wimplicit-fallthrough=3" } */
 /* Test taken from
.  */
 
Index: gcc/testsuite/c-c++-common/Wimplicit-fallthrough-10.c
===
--- gcc/testsuite/c-c++-common/Wimplicit-fallthrough-10.c	(revision 241233)
+++ gcc/testsuite/c-c++-common/Wimplicit-fallthrough-10.c	(working copy)
@@ -1,6 +1,6 @@
 /* PR c/7652 */
 /* { dg-do compile } */
-/* { dg-options "-Wimplicit-fallthrough" } */
+/* { dg-options "-Wimplicit-fallthrough=3" } */
 
 extern void bar (int);
 
Index: gcc/testsuite/c-c++-common/Wimplicit-fallthrough-12.c
===
--- gcc/testsuite/c-c++-common/Wimplicit-fallthrough-12.c	(revision 241233)
+++ gcc/testsuite/c-c++-common/Wimplicit-fallthrough-12.c	(working copy)
@@ -1,6 +1,6 @@
 /* PR c/7652 */
 /* { dg-do compile } */
-/* { dg-options "-Wimplicit-fallthrough -O2" } */
+/* { dg-options "-Wimplicit-fallthrough=3 -O2" } */
 
 /* Don't let optimizations preclude the warning.  */
 
Index: gcc/testsuite/c-c++-common/Wimplicit-fallthrough-16.c
=

Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Uros Bizjak
On Thu, Nov 3, 2016 at 12:29 PM, Eric Botcazou  wrote:
>> The same approach can also be used on other targets without hardware
>> divmod insn.
>
> This should be made generic instead: can't we automatically create a divmod
> libfunc for double-word mode & define a default TARGET_EXPAND_DIVMOD_LIBFUNC?
> This would save the code duplication in the ~50 back-ends...

libfunc is already implemented for all targets to use, there is also:

OPTAB_NC(sdivmod_optab, "divmod$a4", UNKNOWN)
OPTAB_NC(udivmod_optab, "udivmod$a4", UNKNOWN)

in optabs.def that can probably be changed  for generic expansion.

Uros.


Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Jakub Jelinek
On Thu, Nov 03, 2016 at 12:51:15PM +0100, Bernd Schmidt wrote:
> I'm concerned about the number of false positives for this warning, and
> judging by previous discussions, I'm not alone in this. This patch limits it
> to level 1 (any comment before the case label disables the warning) for
> cases where the user specified no explicit level. It'll still generate
> enough noise that people will be aware of it and can choose whether to use a
> higher level or not.
> 
> Bootstrapped and tested on x86_64-linux. Ok?

I disagree, I'm ok with changing it to 2, but 1 is too much.

Jakub


Re: [PATCH,gcc/MIPS] Make loongson3a use fused madd.d

2016-11-03 Thread Paul Hua
Hi Matthew,

Thanks for your comments, update the patch.

*** gcc/ChangeLog ***

2016-11-03 Chenghua Xu 

* config/mips/mips.h (ISA_HAS_FUSED_MADD4): Enable for
TARGET_LOONGSON_3A.
(ISA_HAS_UNFUSED_MADD4): Exclude TARGET_LOONGSON_3A.

Thanks,
Paul

On Thu, Nov 3, 2016 at 6:31 PM, Matthew Fortune
 wrote:
> Paul Hua  writes:
>> Loongson3a has 4 operand fused madd instrcution. This patch set
>> loongson3a use fused madd.d.
>
> Hi Paul,
>
> Thanks for the fix. I was vaguely aware that this was wrong for
> loongson-3a but never confirmed it.
>
> I suspect this change is mechanical enough that it can bypass
> copyright assignment but I'd need a global maintainer to comment.
>
> I've sent you copyright assignment paperwork separately.
>
> Two comments on the patch:
>
>> ChangeLog :
>>
>> *** gcc/ChangeLog ***
>>
>> 2016-11-03 Chenghua Xu 
>>
>> config/mips/
>> * mips.h: Set loongson3a use fused madd.d.
>
> The changelog needs to reference what was changed rather than the
> effect of the change:
>
> * config/mips/mips.h (ISA_HAS_FUSED_MADD4): Enable for
> TARGET_LOONGSON_3A.
> (ISA_HAS_UNFUSED_MADD4): Exclude TARGET_LOONGSON_3A.
>
>
>>diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
>>index 81862a9..5076a2b 100644
>>--- a/gcc/config/mips/mips.h
>>+++ b/gcc/config/mips/mips.h
>>@@ -1056,11 +1056,11 @@ struct mips_cpu_info {
>>
>> /* ISA has 4 operand fused madd instructions of the form
>>'d = [+-] (a * b [+-] c)'.  */
>>-#define ISA_HAS_FUSED_MADD4   TARGET_MIPS8000
>>+#define ISA_HAS_FUSED_MADD4   (TARGET_MIPS8000 || TARGET_LOONGSON_3A)
>>
>> /* ISA has 4 operand unfused madd instructions of the form
>>'d = [+-] (a * b [+-] c)'.  */
>>-#define ISA_HAS_UNFUSED_MADD4 (ISA_HAS_FP4 && !TARGET_MIPS8000)
>>+#define ISA_HAS_UNFUSED_MADD4 (ISA_HAS_FP4 && !TARGET_MIPS8000 && 
>>!TARGET_LOONGSON_3A)
>
> Please split this line and move && !TARGET_LOONGSON_3A to the next line
> under ISA_HAS_FP4.
>
>>
>> /* ISA has 3 operand r6 fused madd instructions of the form
>>'c = c [+-] (a * b)'.  */
>
> Thanks,
> Matthew
>
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 81862a9..2a7a3f2 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1056,11 +1056,13 @@ struct mips_cpu_info {
 
 /* ISA has 4 operand fused madd instructions of the form
'd = [+-] (a * b [+-] c)'.  */
-#define ISA_HAS_FUSED_MADD4	TARGET_MIPS8000
+#define ISA_HAS_FUSED_MADD4	(TARGET_MIPS8000 || TARGET_LOONGSON_3A)
 
 /* ISA has 4 operand unfused madd instructions of the form
'd = [+-] (a * b [+-] c)'.  */
-#define ISA_HAS_UNFUSED_MADD4	(ISA_HAS_FP4 && !TARGET_MIPS8000)
+#define ISA_HAS_UNFUSED_MADD4	(ISA_HAS_FP4\
+ && !TARGET_MIPS8000			\
+ && !TARGET_LOONGSON_3A)
 
 /* ISA has 3 operand r6 fused madd instructions of the form
'c = c [+-] (a * b)'.  */


Re: Ping: Re: [PATCH 1/2] gcc: Remove unneeded global flag.

2016-11-03 Thread Bernd Schmidt

On 09/14/2016 03:00 PM, Andrew Burgess wrote:

In an attempt to get this patch merged (as I still think that its
correct) I've investigated, and documented a little more about how I
think things currently work.  I'm sure most people reading this will
already know this, but hopefully, if my understanding is wrong someone
can point it out.
gcc/ChangeLog:

* gcc/bb-reorder.c: Remove 'toplev.h' include.
(pass_partition_blocks::gate): No longer check
user_defined_section_attribute, instead check the function decl
for a section attribute.
* gcc/c-family/c-common.c (handle_section_attribute): No longer
set user_defined_section_attribute.
* gcc/final.c (rest_of_handle_final): Likewise.
* gcc/toplev.c: Remove definition of user_defined_section_attribute.
* gcc/toplev.h: Remove declaration of
user_defined_section_attribute.

gcc/testsuiteChangeLog:

* gcc.dg/tree-prof/section-attr-1.c: New file.
* gcc.dg/tree-prof/section-attr-2.c: New file.
* gcc.dg/tree-prof/section-attr-3.c: New file.


I think the explanation is perfectly reasonable and the patch looks 
good, except:



+__attribute__((noinline))


Add noclone to all of these as well.


Bernd


Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Marek Polacek
On Thu, Nov 03, 2016 at 12:58:55PM +0100, Jakub Jelinek wrote:
> On Thu, Nov 03, 2016 at 12:51:15PM +0100, Bernd Schmidt wrote:
> > I'm concerned about the number of false positives for this warning, and
> > judging by previous discussions, I'm not alone in this. This patch limits it
> > to level 1 (any comment before the case label disables the warning) for
> > cases where the user specified no explicit level. It'll still generate
> > enough noise that people will be aware of it and can choose whether to use a
> > higher level or not.
> > 
> > Bootstrapped and tested on x86_64-linux. Ok?
> 
> I disagree, I'm ok with changing it to 2, but 1 is too much.

Same here.  I'd support the level 2.

Marek


Re: [PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS

2016-11-03 Thread Kyrill Tkachov


On 02/11/16 11:36, Eric Botcazou wrote:

I think you're right. I suppose the new condition should be:

#ifdef LOAD_EXTEND_OP
  /* If this is a typical RISC machine, we only have to worry
 about the way loads are extended.  */
  if (!WORD_REGISTER_OPERATIONS

  || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND

 ? val_signbit_known_set_p (inner_mode, nonzero)

 : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
   ||
   || !MEM_P (SUBREG_REG (x

#endif

Agreed.


Would you prefer me to make this change or just revert the patch?

Go ahead and make the change, but please do a bit of comment massaging in the
process, for example:

#ifdef LOAD_EXTEND_OP
   /* On many CISC machines, accessing an object in a wider mode
 causes the high-order bits to become undefined.  So they are
 not known to be zero.  */
   if (!WORD_REGISTER_OPERATIONS
 /* If this is a typical RISC machine, we only have to worry
about the way loads are extended.  */
   || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
  ? val_signbit_known_set_p (inner_mode, nonzero)
  : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
|| !MEM_P (SUBREG_REG (x
#endif
{
  if (GET_MODE_PRECISION (GET_MODE (x))
  > GET_MODE_PRECISION (inner_mode))
nonzero |= (GET_MODE_MASK (GET_MODE (x))
& ~GET_MODE_MASK (inner_mode));
}



Thanks, here is the patch doing this.
Committing to trunk after bootstrap and testing on x86_64.

Sorry for the trouble,
Kyrill
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 9107af0fda76fe2233bc5cf1e439b9971d6691f0..b655315f380f221abcf9176635441688d1756b1e 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -4568,18 +4568,18 @@ nonzero_bits1 (const_rtx x, machine_mode mode, const_rtx known_x,
 	  known_x, known_mode, known_ret);
 
 #ifdef LOAD_EXTEND_OP
-	  /* If this is a typical RISC machine, we only have to worry
-	 about the way loads are extended.  */
-	  if (WORD_REGISTER_OPERATIONS
-	  && ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
+  /* On many CISC machines, accessing an object in a wider mode
+	 causes the high-order bits to become undefined.  So they are
+	 not known to be zero.  */
+	  if (!WORD_REGISTER_OPERATIONS
+	  /* If this is a typical RISC machine, we only have to worry
+		 about the way loads are extended.  */
+	  || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
 		 ? val_signbit_known_set_p (inner_mode, nonzero)
 		 : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
 		   || !MEM_P (SUBREG_REG (x
 #endif
 	{
-	  /* On many CISC machines, accessing an object in a wider mode
-		 causes the high-order bits to become undefined.  So they are
-		 not known to be zero.  */
 	  if (GET_MODE_PRECISION (GET_MODE (x))
 		  > GET_MODE_PRECISION (inner_mode))
 		nonzero |= (GET_MODE_MASK (GET_MODE (x))


Re: [PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS

2016-11-03 Thread Kyrill Tkachov


On 03/11/16 12:05, Kyrill Tkachov wrote:


On 02/11/16 11:36, Eric Botcazou wrote:

I think you're right. I suppose the new condition should be:

#ifdef LOAD_EXTEND_OP
 /* If this is a typical RISC machine, we only have to worry
about the way loads are extended.  */
  if (!WORD_REGISTER_OPERATIONS

  || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND

 ? val_signbit_known_set_p (inner_mode, nonzero)

 : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
   ||
   || !MEM_P (SUBREG_REG (x

#endif

Agreed.


Would you prefer me to make this change or just revert the patch?

Go ahead and make the change, but please do a bit of comment massaging in the
process, for example:

#ifdef LOAD_EXTEND_OP
   /* On many CISC machines, accessing an object in a wider mode
 causes the high-order bits to become undefined.  So they are
 not known to be zero.  */
   if (!WORD_REGISTER_OPERATIONS
 /* If this is a typical RISC machine, we only have to worry
about the way loads are extended.  */
   || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
  ? val_signbit_known_set_p (inner_mode, nonzero)
  : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
|| !MEM_P (SUBREG_REG (x
#endif
{
  if (GET_MODE_PRECISION (GET_MODE (x))
  > GET_MODE_PRECISION (inner_mode))
nonzero |= (GET_MODE_MASK (GET_MODE (x))
& ~GET_MODE_MASK (inner_mode));
}



Thanks, here is the patch doing this.
Committing to trunk after bootstrap and testing on x86_64.

Sorry for the trouble,
Kyrill


With the following ChangeLog entry:

2016-11-03  Kyrylo Tkachov  

* rtlanal.c (nonzero_bits1): Fix WORD_REGISTER_OPERATIONS condition.
Move comments into more natural position.



Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-11-03 Thread Eric Botcazou
>What's your decision on this?

I think that we ought to standardize on a single order for note copying in the 
RTL middle-end and the best way to enforce it is to have a single primitive in 
rtlanal.c, with an optional filtering.  Bernd's patch is a step in the right 
direction, but doesn't enforce the single order.  Maybe something based on a 
macro calling duplicate_reg_note, but not clear whether it's really better.

-- 
Eric Botcazou


[PATCH] Convert character arrays to string csts

2016-11-03 Thread Martin Liška
Hello.

This is small follow-up of the patches I sent to string folding.
The patch transforms character array defined in an initializer to string
constant:

+const char global[] = {'a', 'b', 'c', 'd', '\0'};

Apart from that, it also enables string folding of local symbols like:
+  const char local[] = "abcd";

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Martin
>From ecd2b614072f55e71896f7f5bf290072b3a4b04c Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 17 Oct 2016 15:24:46 +0200
Subject: [PATCH] Convert character arrays to string csts

gcc/testsuite/ChangeLog:

2016-10-24  Martin Liska  

	* gcc.dg/tree-ssa/builtins-folding-gimple.c (main): Add new
	tests.

gcc/ChangeLog:

2016-10-24  Martin Liska  

	* cgraphunit.c (varpool_node::finalize_decl): Convert ctors
	to string constants if possible.
	* gimplify.c (gimplify_decl_expr): Emit INIT_EXPR just if it
	cannot be converted to a string constant.
	(gimplify_init_constructor): Create string constant for local
	variables (if possible).
	* tree.c (build_string_cst_from_ctor): New function.
	(build_string_literal): Add new argument.
	(can_convert_ctor_to_string_cst): New function.
	* tree.h: Declare new functions.
	* varpool.c (ctor_for_folding): Support automatic variables.
---
 gcc/cgraphunit.c   |  6 ++
 gcc/gimplify.c | 14 -
 .../gcc.dg/tree-ssa/builtins-folding-gimple.c  |  7 +++
 gcc/tree.c | 68 --
 gcc/tree.h |  6 +-
 gcc/varpool.c  |  8 ++-
 6 files changed, 100 insertions(+), 9 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index e315a77..bc041d6 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -791,6 +791,12 @@ process_function_and_variable_attributes (cgraph_node *first,
 void
 varpool_node::finalize_decl (tree decl)
 {
+  tree init = DECL_INITIAL (decl);
+  if (init
+  && TREE_READONLY (decl)
+  && can_convert_ctor_to_string_cst (init))
+DECL_INITIAL (decl) = build_string_cst_from_ctor (init);
+
   varpool_node *node = varpool_node::get_create (decl);
 
   gcc_assert (TREE_STATIC (decl) || DECL_EXTERNAL (decl));
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 1531582..4520843 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1495,7 +1495,8 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
 	{
 	  if (!TREE_STATIC (decl))
 	{
-	  DECL_INITIAL (decl) = NULL_TREE;
+	  if (!TREE_READONLY (decl) || TREE_CODE (init) != STRING_CST)
+		DECL_INITIAL (decl) = NULL_TREE;
 	  init = build2 (INIT_EXPR, void_type_node, decl, init);
 	  gimplify_and_add (init, seq_p);
 	  ggc_free (init);
@@ -4438,6 +4439,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	break;
 	  }
 
+	/* Replace a ctor with a string constant with possible.  */
+	if (TREE_READONLY (object)
+	&& VAR_P (object)
+	&& can_convert_ctor_to_string_cst (ctor))
+	  {
+	ctor = build_string_cst_from_ctor (ctor);
+	TREE_OPERAND (*expr_p, 1) = ctor;
+	DECL_INITIAL (object) = ctor;
+	break;
+	  }
+
 	/* Fetch information about the constructor to direct later processing.
 	   We might want to make static versions of it in various cases, and
 	   can only do so if it known to be a valid constant initializer.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
index 283bd1c..b2d1fd5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
@@ -4,12 +4,15 @@
 char *buffer1;
 char *buffer2;
 
+const char global[] = {'a', 'b', 'c', 'd', '\0'};
+
 #define SIZE 1000
 
 int
 main (void)
 {
   const char* const foo1 = "hello world";
+  const char local[] = "abcd";
 
   buffer1 = __builtin_malloc (SIZE);
   __builtin_strcpy (buffer1, foo1);
@@ -45,6 +48,10 @@ main (void)
 __builtin_abort ();
   if (__builtin_memchr (foo1, null, 12) != foo1 + 11)
 __builtin_abort ();
+  if (__builtin_memchr (global, null, 5) == 0)
+__builtin_abort ();
+  if (__builtin_memchr (local, null, 5) == 0)
+__builtin_abort ();
 
   __builtin_memchr (foo1, x, 11);
   __builtin_memchr (buffer1, x, zero);
diff --git a/gcc/tree.c b/gcc/tree.c
index 56cc653..dcf767f 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -1784,6 +1784,26 @@ build_vector_from_val (tree vectype, tree sc)
 }
 }
 
+/* Build string constant from a constructor CTOR.  */
+
+tree
+build_string_cst_from_ctor (tree ctor)
+{
+  unsigned HOST_WIDE_INT idx;
+  tree value;
+  vec *elts = CONSTRUCTOR_ELTS (ctor);
+  char *str = XNEWVEC (char, elts->length ());
+
+  FOR_EACH_CONSTRUCTOR_VALUE (elts, idx, value)
+str[idx] = (char)tree_to_uhwi (value);
+
+  tree init = build_string_literal (elts->length (),
+ggc_alloc_string (str, elts->l

Re: [PATCH][AArch64] Improve SHA1 scheduling

2016-11-03 Thread Wilco Dijkstra
Andrew Pinski wrote:
> On Tue, Oct 25, 2016 at 10:08 AM, Wilco Dijkstra  
> wrote:
> > SHA1H instructions may be scheduled after a SHA1C instruction
> > that uses the same input register.  However SHA1C updates its input,
> > so if SHA1H is scheduled after it, it requires an extra move.
> > Increase the priority of SHA1H to ensure it gets scheduled
> > earlier, avoiding the move.
> >
> > Is this something the generic scheduler could do automatically for
> > instructions with RMW operands?
>
> I was thinking that but there are many of those on x86 so it might not
> make sense at all.
> Also the SIMD instruction FMLA has the similar problem most likely
> though I don't know if it make sense to do a similar thing for that
> one.

Indeed it doesn't always make sense. I've never seen the issue with
FMLA for example, so maybe a more generic mechanism isn't required.

> Maybe it makes sense to mark the instructions in the .md file with
> some attribute and then just check for that attribute here instead of
> special casing SHA1C.

That should be possible if adding attributes is not expensive in terms of
memory/compile time.

Wilco


Re: [PATCH] Create x.gcov file for binary w/o x.gcda file (PR, gcov-profile/65831)

2016-11-03 Thread Martin Liška
On 08/04/2016 02:52 PM, Nathan Sidwell wrote:
> On 08/04/16 08:27, Martin Liška wrote:
>> Hi.
>>
>> Following patch is grabbed from the PR, where I just applied the patch
>> and wrote a test-case which removes x.gcda file before running gcov tool.
>>
>> Ready to be installed?
> 
> 2016-08-04  Martin Liska  
> 
> * g++.dg/gcov/gcov-16.C: New test.
> * lib/gcov.exp: Support new argument for run-gcov function.
> 
> gcc/ChangeLog:
> 
> 2016-08-04  Martin Liska  
> Adam Fineman  
> 
> * gcov.c (process_file): Create .gcov file when .gcda
> file is missing.
> 
> ok thanks

Hi.

As mentioned by the reporter, it's desired patch for backport to GCC 5 and 6 
branches.

May I install the same patch after it survives regression tests?
Thanks,
Martin


Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)

2016-11-03 Thread Bernd Schmidt

On 10/29/2016 07:47 AM, Bernd Edlinger wrote:


Find attached a new patch with test case.


Boot-strapped on x86_64-pc-linux-gnu.
Is it OK for trunk?


Ok.


Bernd



Re: [PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS

2016-11-03 Thread Eric Botcazou
> Thanks, here is the patch doing this.
> Committing to trunk after bootstrap and testing on x86_64.

Thanks for the quick turn around!

-- 
Eric Botcazou


Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Eric Botcazou
> libfunc is already implemented for all targets to use, there is also:
> 
> OPTAB_NC(sdivmod_optab, "divmod$a4", UNKNOWN)
> OPTAB_NC(udivmod_optab, "udivmod$a4", UNKNOWN)
> 
> in optabs.def that can probably be changed  for generic expansion.

So what's the purpose of ix86_init_libfuncs if the libfuncs are already there?

-- 
Eric Botcazou


[PATCH][ARM] Fix ldrd offsets

2016-11-03 Thread Wilco Dijkstra
Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020,
without -255..4091.  This reduces the number of addressing instructions
when using DI mode operations (such as in PR77308).

Bootstrap & regress OK.

ChangeLog:
2015-11-03  Wilco Dijkstra  

gcc/
* config/arm/arm.c (arm_legitimate_index_p): Add comment.
(thumb2_legitimate_index_p): Use correct range for DI/DF mode.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
3c4c7042d9c2101619722b5822b3d1ca37d637b9..5d12cf9c46c27d60a278d90584bde36ec86bb3fe
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7486,6 +7486,8 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
RTX_CODE outer,
{
  HOST_WIDE_INT val = INTVAL (index);
 
+ /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+If vldr is selected it uses arm_coproc_mem_operand.  */
  if (TARGET_LDRD)
return val > -256 && val < 256;
  else
@@ -7613,11 +7615,13 @@ thumb2_legitimate_index_p (machine_mode mode, rtx 
index, int strict_p)
   if (code == CONST_INT)
{
  HOST_WIDE_INT val = INTVAL (index);
- /* ??? Can we assume ldrd for thumb2?  */
- /* Thumb-2 ldrd only has reg+const addressing modes.  */
- /* ldrd supports offsets of +-1020.
-However the ldr fallback does not.  */
- return val > -256 && val < 256 && (val & 3) == 0;
+ /* Thumb-2 ldrd only has reg+const addressing modes.
+Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+If vldr is selected it uses arm_coproc_mem_operand.  */
+ if (TARGET_LDRD)
+   return IN_RANGE (val, -1020, 1020) && (val & 3) == 0;
+ else
+   return IN_RANGE (val, -255, 4095 - 4);
}
   else
return 0;

Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-11-03 Thread Jiong Wang

On 03/11/16 12:06, Eric Botcazou wrote:

What's your decision on this?

I think that we ought to standardize on a single order for note copying in the
RTL middle-end and the best way to enforce it is to have a single primitive in
rtlanal.c, with an optional filtering.  Bernd's patch is a step in the right
direction, but doesn't enforce the single order.  Maybe something based on a
macro calling duplicate_reg_note, but not clear whether it's really better.


Thanks for the feedback,  I'll try to work through this.

Regards,
Jiong


Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Bernd Schmidt

On 11/03/2016 12:58 PM, Jakub Jelinek wrote:

On Thu, Nov 03, 2016 at 12:51:15PM +0100, Bernd Schmidt wrote:

I'm concerned about the number of false positives for this warning, and
judging by previous discussions, I'm not alone in this. This patch limits it
to level 1 (any comment before the case label disables the warning) for
cases where the user specified no explicit level. It'll still generate
enough noise that people will be aware of it and can choose whether to use a
higher level or not.

Bootstrapped and tested on x86_64-linux. Ok?


I disagree, I'm ok with changing it to 2, but 1 is too much.


Well, we have data from our own sources where we had to "fix" lots of 
perfectly good code, and also from the Linux kernel. From an earlier 
discussion:


Markus Trippelsdorf wrote:

I randomly looked at the differences and almost all additional
-Wimplicit-fallthrough=2 warnings are bogus (~5% are genuine).
And _all_ additional -Wimplicit-fallthrough=3 warnings appear
to be bogus.

[...]

Actually looking more closely it appears that all of the 136 additional
warnings for level 2 are bogus, too.


Also, levels above 1 enforce English as a language, which isn't 
something we should be doing, even if we could detect fallthrough 
comments reliably (and we can't).



Bernd


[PATCH] AIX xcoff declare object name

2016-11-03 Thread David Edelsohn
During Richi's debugging of the section type conflict, he found a
problem with rs6000_xcoff_declare_object_name trying to access a decl
that did not exist.  Fixed thusly.

* config/rs6000/rs6000.c (rs6000_xcoff_declare_object_name): Use
symtab_node::get_create.

Index: rs6000.c
===
--- rs6000.c(revision 241814)
+++ rs6000.c(working copy)
@@ -35414,8 +35414,8 @@ rs6000_xcoff_declare_object_name (FILE *file, cons
   struct declare_alias_data data = {file, false};
   RS6000_OUTPUT_BASENAME (file, name);
   fputs (":\n", file);
-  symtab_node::get (decl)->call_for_symbol_and_aliases (rs6000_declare_alias,
-   &data, true);
+  symtab_node::get_create (decl)->call_for_symbol_and_aliases
(rs6000_declare_alias,
+  &data, true);
 }

 /* Overide the default 'SYMBOL-.' syntax with AIX compatible 'SYMBOL-$'. */


Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Jakub Jelinek
On Thu, Nov 03, 2016 at 01:22:11PM +0100, Bernd Schmidt wrote:
> On 11/03/2016 12:58 PM, Jakub Jelinek wrote:
> >On Thu, Nov 03, 2016 at 12:51:15PM +0100, Bernd Schmidt wrote:
> >>I'm concerned about the number of false positives for this warning, and
> >>judging by previous discussions, I'm not alone in this. This patch limits it
> >>to level 1 (any comment before the case label disables the warning) for
> >>cases where the user specified no explicit level. It'll still generate
> >>enough noise that people will be aware of it and can choose whether to use a
> >>higher level or not.
> >>
> >>Bootstrapped and tested on x86_64-linux. Ok?
> >
> >I disagree, I'm ok with changing it to 2, but 1 is too much.
> 
> Well, we have data from our own sources where we had to "fix" lots of
> perfectly good code, and also from the Linux kernel. From an earlier
> discussion:

That data wasn't really convincing on this.  All it proved is that most of
the cases are (likely) deliberate fall-throughs without any comment
whatsoever, the rest is in the noise.  As one needs to deal with those
where comments are missing altogether, dealing with the noise is acceptable.

> Markus Trippelsdorf wrote:
> >>>I randomly looked at the differences and almost all additional
> >>>-Wimplicit-fallthrough=2 warnings are bogus (~5% are genuine).
> >>>And _all_ additional -Wimplicit-fallthrough=3 warnings appear
> >>>to be bogus.
> [...]
> >Actually looking more closely it appears that all of the 136 additional
> >warnings for level 2 are bogus, too.
> 
> Also, levels above 1 enforce English as a language, which isn't something we
> should be doing, even if we could detect fallthrough comments reliably (and
> we can't).

If you use non-english comment, then you need to add lint-like comments for
this, or attributes.  IMHO it is really not very high cost (compared to
all those without any comment at all) to adjust code, what takes time is to
analyze if something is intentional or invalid fallthrough.  The clearer the
comments are on that, the better.

Jakub


Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-11-03 Thread Eric Botcazou
> Thanks for the feedback,  I'll try to work through this.

Thanks, but since there doesn't seem to be a consensus on the goal, feel free 
to disregard it and just implement what you need for your initial work.

-- 
Eric Botcazou


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-03 Thread Yuri Rumyantsev
Hi Richard,

I did not understand your last remark:

> That is, here (and avoid the FOR_EACH_LOOP change):
>
> @@ -580,12 +586,21 @@ vectorize_loops (void)
>   && dump_enabled_p ())
>   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>"loop vectorized\n");
> -   vect_transform_loop (loop_vinfo);
> +   new_loop = vect_transform_loop (loop_vinfo);
> num_vectorized_loops++;
>/* Now that the loop has been vectorized, allow it to be unrolled
>   etc.  */
>  loop->force_vectorize = false;
>
> +   /* Add new loop to a processing queue.  To make it easier
> +  to match loop and its epilogue vectorization in dumps
> +  put new loop as the next loop to process.  */
> +   if (new_loop)
> + {
> +   loops.safe_insert (i + 1, new_loop->num);
> +   vect_loops_num = number_of_loops (cfun);
> + }
>
> simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
f> unction which will set up stuff properly (and also perform
> the if-conversion of the epilogue there).
>
> That said, if we can get in non-masked epilogue vectorization
> separately that would be great.

Could you please clarify your proposal.

Thanks.
Yuri.

2016-11-02 15:27 GMT+03:00 Richard Biener :
> On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>
>> Hi All,
>>
>> I re-send all patches sent by Ilya earlier for review which support
>> vectorization of loop epilogues and loops with low trip count. We
>> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>> approved by Jeff.
>>
>> I did re-base of all patches and performed bootstrapping and
>> regression testing that did not show any new failures. Also all
>> changes related to new vect_do_peeling algorithm have been changed
>> accordingly.
>>
>> Is it OK for trunk?
>
> I would have prefered that the series up to -03-nomask-tails would
> _only_ contain epilogue loop vectorization changes but unfortunately
> the patchset is oddly separated.
>
> I have a comment on that part nevertheless:
>
> @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> loop_vinfo)
>/* Check if we can possibly peel the loop.  */
>if (!vect_can_advance_ivs_p (loop_vinfo)
>|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> -  || loop->inner)
> +  || loop->inner
> +  /* Required peeling was performed in prologue and
> +is not required for epilogue.  */
> +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>  do_peeling = false;
>
>if (do_peeling
> @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> loop_vinfo)
>
>do_versioning =
> optimize_loop_nest_for_speed_p (loop)
> -   && (!loop->inner); /* FORNOW */
> +   && (!loop->inner) /* FORNOW */
> +/* Required versioning was performed for the
> +  original loop and is not required for epilogue.  */
> +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>
>if (do_versioning)
>  {
>
> please do that check in the single caller of this function.
>
> Otherwise I still dislike the new ->aux use and I believe that simply
> passing down info from the processed parent would be _much_ cleaner.
> That is, here (and avoid the FOR_EACH_LOOP change):
>
> @@ -580,12 +586,21 @@ vectorize_loops (void)
> && dump_enabled_p ())
>dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> "loop vectorized\n");
> -   vect_transform_loop (loop_vinfo);
> +   new_loop = vect_transform_loop (loop_vinfo);
> num_vectorized_loops++;
> /* Now that the loop has been vectorized, allow it to be unrolled
>etc.  */
> loop->force_vectorize = false;
>
> +   /* Add new loop to a processing queue.  To make it easier
> +  to match loop and its epilogue vectorization in dumps
> +  put new loop as the next loop to process.  */
> +   if (new_loop)
> + {
> +   loops.safe_insert (i + 1, new_loop->num);
> +   vect_loops_num = number_of_loops (cfun);
> + }
>
> simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> function which will set up stuff properly (and also perform
> the if-conversion of the epilogue there).
>
> That said, if we can get in non-masked epilogue vectorization
> separately that would be great.
>
> I'm still torn about all the rest of the stuff and question its
> usability (esp. merging the epilogue with the main vector loop).
> But it has already been approved ... oh well.
>
> Thanks,
> Richard.


Re: [PATCH] Convert character arrays to string csts

2016-11-03 Thread Richard Biener
On Thu, Nov 3, 2016 at 1:12 PM, Martin Liška  wrote:
> Hello.
>
> This is small follow-up of the patches I sent to string folding.
> The patch transforms character array defined in an initializer to string
> constant:
>
> +const char global[] = {'a', 'b', 'c', 'd', '\0'};
>
> Apart from that, it also enables string folding of local symbols like:
> +  const char local[] = "abcd";
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Hmm, does this handle

const char global[] = {'a', [4] = 'd', 'b', [3] = '\0'};

correctly?

I think you need to check that 'key' is increasing and there are no gaps
(ISTR constructor elements are sorted but gaps can still appear).

+ || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))

please instead check the element type of the array constructor.  I'd also
use a stricter check like TYPE_MAIN_VARIANT (type) == char_type_node
to avoid changing non-strings like unsigned / signed char.

Finally I'm a bit worried about doing this for all 'char'
constructors.  Maybe we
should restrict this to those with ISPRINT () entries?

Richard.


> Martin


Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Uros Bizjak
On Thu, Nov 3, 2016 at 1:18 PM, Eric Botcazou  wrote:
>> libfunc is already implemented for all targets to use, there is also:
>>
>> OPTAB_NC(sdivmod_optab, "divmod$a4", UNKNOWN)
>> OPTAB_NC(udivmod_optab, "udivmod$a4", UNKNOWN)
>>
>> in optabs.def that can probably be changed  for generic expansion.
>
> So what's the purpose of ix86_init_libfuncs if the libfuncs are already there?

libfunc, as in "__{,u}divmod{di,ti}4 library function" is already
implemented in libgcc. But the enablement of this function inside the
compiler has to be performed by each target.

Uros.


Re: [PATCH] Convert character arrays to string csts

2016-11-03 Thread Bernd Schmidt

On 11/03/2016 01:12 PM, Martin Liška wrote:

+  tree init = DECL_INITIAL (decl);
+  if (init
+  && TREE_READONLY (decl)
+  && can_convert_ctor_to_string_cst (init))
+DECL_INITIAL (decl) = build_string_cst_from_ctor (init);


I'd merge these two new functions since they're only ever called 
together. We'd then have something like


if (init && TREE_READONLY (decl))
  init = convert_ctor_to_string_cst (init);
if (init)
  DECL_INITIAL (decl) = init;

I'll defer to Jan on whether finalize_decl seems like a good place to do 
this.



diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c 
b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
index 283bd1c..b2d1fd5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
@@ -4,12 +4,15 @@
 char *buffer1;
 char *buffer2;

+const char global[] = {'a', 'b', 'c', 'd', '\0'};
+
 #define SIZE 1000

 int
 main (void)
 {
   const char* const foo1 = "hello world";
+  const char local[] = "abcd";

   buffer1 = __builtin_malloc (SIZE);
   __builtin_strcpy (buffer1, foo1);
@@ -45,6 +48,10 @@ main (void)
 __builtin_abort ();
   if (__builtin_memchr (foo1, null, 12) != foo1 + 11)
 __builtin_abort ();
+  if (__builtin_memchr (global, null, 5) == 0)
+__builtin_abort ();
+  if (__builtin_memchr (local, null, 5) == 0)
+__builtin_abort ();


How is that a meaningful test? This seems to work even with an unpatched 
gcc. I'd have expected something that shows a benefit for doing this 
conversion, and maybe also a test that shows it isn't done in cases 
where it's not allowed.



 tree
-build_string_literal (int len, const char *str)
+build_string_literal (int len, const char *str, bool build_addr_expr)


New arguments should be documented in the function comment.


+/* Return TRUE when CTOR can be converted to a string constant.  */


"if", not "when".


+  unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor);
+  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value)
+{
+  if (key == NULL_TREE
+ || TREE_CODE (key) != INTEGER_CST
+ || !tree_fits_uhwi_p (value)
+ || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))
+   return false;


Shouldn't all elements have the same type, or do you really have to call 
useless_type_conversion in a loop?



+  /* Allow zero character just at the end of a string.  */
+  if (integer_zerop (value))
+   return idx == elements - 1;


Don't you also have to explicitly check it's there?


Bernd


Re: [patch, avr] Make pmem-wrap-around option as default

2016-11-03 Thread Georg-Johann Lay

On 03.11.2016 08:58, Pitchumani Sivanupandi wrote:

Most of the AVR's 8k memorydevices have only rjmp instruction, not jmp. So, it
is important to wrap around jump destination to check if it can reach backwards.

Currently link specs passes --pmem-wrap-around=xxK when mrelax and
mpmem-wrap-around options are enabled. Attached patch changes the specs so that
option --pmem-wrap-around=8K is passed for 8k memory devices if
-mno-pmem-wrap-around is not enabled.

If OK, could someone commit please?

Note: Currently 8k devices are identified based on name prefix. We are working
on alternative method to incorporate flash memory size.


Currently, "at90usb8" this the only prefix that results in wrap_k = 8, so 
without adding knowledge about flash sizes to the compiler, the change makes 
not much sense... (but does not do harm either).


The right place for flash size info would be avr-mcus.def together with a new 
command option to specify the flash size and draw conclusions from it.  When I 
asked Atmel about a mapping of Device to flash size, the support told me to 
check the data sheets -- not really practical for ~300 devices in avr-mcus.def 
:-)  Atmel *must* have information about this...


The new option would render -mn-flash superfluous, but we should keep it for 
backward compatibility.


Johann



Regards,
Pitchumani

gcc/ChangeLog

2016-11-03  Pitchumani Sivanupandi  

* config/avr/gen-avr-mmcu-specs.c (print_mcu): Update link_pmem_wrap spec
string to add linker option --pmem-wrap-around=8k.
* config/avr/specs.h: Add link_pmem_wrap to linker specs (LINK_SPEC).




diff --git a/gcc/config/avr/specs.h b/gcc/config/avr/specs.h
index 52763cc..fbf0ce6 100644
--- a/gcc/config/avr/specs.h
+++ b/gcc/config/avr/specs.h
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
   "%(link_data_start) " \
   "%(link_text_start) " \
   "%(link_relax) "  \
+  "%(link_pmem_wrap) "  \
   "%{shared:%eshared is not supported} "

 #undef  LIB_SPEC


Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from 
LINK_RELAX_SPEC?  And what happens if relaxation is off?


Johann



RE: [PATCH,testsuite] MIPS: Downgrade R6 to R5 if tests need branch-likely instructions.

2016-11-03 Thread Moore, Catherine


> -Original Message-
> From: Toma Tabacu [mailto:toma.tab...@imgtec.com]
> Sent: Thursday, November 3, 2016 6:58 AM
> Subject: [PATCH,testsuite] MIPS: Downgrade R6 to R5 if tests need
> branch-likely instructions.
> 
> 
> gcc/testsuite/
> * gcc.target/mips/mips.exp: Add check for -mbranch-likely in
> condition for R5 downgrade.
> 

This patch is OK.
Catherine



Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Markus Trippelsdorf
On 2016.11.03 at 13:32 +0100, Jakub Jelinek wrote:
> On Thu, Nov 03, 2016 at 01:22:11PM +0100, Bernd Schmidt wrote:
> > On 11/03/2016 12:58 PM, Jakub Jelinek wrote:
> > >On Thu, Nov 03, 2016 at 12:51:15PM +0100, Bernd Schmidt wrote:
> > >>I'm concerned about the number of false positives for this warning, and
> > >>judging by previous discussions, I'm not alone in this. This patch limits 
> > >>it
> > >>to level 1 (any comment before the case label disables the warning) for
> > >>cases where the user specified no explicit level. It'll still generate
> > >>enough noise that people will be aware of it and can choose whether to 
> > >>use a
> > >>higher level or not.
> > >>
> > >>Bootstrapped and tested on x86_64-linux. Ok?
> > >
> > >I disagree, I'm ok with changing it to 2, but 1 is too much.
> > 
> > Well, we have data from our own sources where we had to "fix" lots of
> > perfectly good code, and also from the Linux kernel. From an earlier
> > discussion:
> 
> That data wasn't really convincing on this.  All it proved is that most of
> the cases are (likely) deliberate fall-throughs without any comment
> whatsoever, the rest is in the noise.  As one needs to deal with those
> where comments are missing altogether, dealing with the noise is acceptable.

Without Bernd's patch to set the default to 1 you will drown in false
positives once you start using gcc-7 to build a whole distro. On my
Gentoo test box anything but level 1 is simply unacceptable, because you
will miss important other warnings in the -Wimplicit-fallthrough noise
otherwise.

-- 
Markus


Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Eric Botcazou
> libfunc, as in "__{,u}divmod{di,ti}4 library function" is already
> implemented in libgcc. But the enablement of this function inside the
> compiler has to be performed by each target.

So can we do it generically instead of duplicating it ~50 times?

-- 
Eric Botcazou


Re: [PATCH] Convert character arrays to string csts

2016-11-03 Thread Jan Hubicka
> On 11/03/2016 01:12 PM, Martin Liška wrote:
> >+  tree init = DECL_INITIAL (decl);
> >+  if (init
> >+  && TREE_READONLY (decl)
> >+  && can_convert_ctor_to_string_cst (init))
> >+DECL_INITIAL (decl) = build_string_cst_from_ctor (init);
> 
> I'd merge these two new functions since they're only ever called
> together. We'd then have something like
> 
> if (init && TREE_READONLY (decl))
>   init = convert_ctor_to_string_cst (init);
> if (init)
>   DECL_INITIAL (decl) = init;
> 
> I'll defer to Jan on whether finalize_decl seems like a good place
> to do this.

I think finalize_decl may be bit too early because frontends may expects the
ctors to be in a way they produced them.  We only want to convert those arrays
seen by middle-end so I would move the logic to varpool_node::analyze

Otherwise the patch seems fine to me.

Honza
> 
> >diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c 
> >b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
> >index 283bd1c..b2d1fd5 100644
> >--- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
> >+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
> >@@ -4,12 +4,15 @@
> > char *buffer1;
> > char *buffer2;
> >
> >+const char global[] = {'a', 'b', 'c', 'd', '\0'};
> >+
> > #define SIZE 1000
> >
> > int
> > main (void)
> > {
> >   const char* const foo1 = "hello world";
> >+  const char local[] = "abcd";
> >
> >   buffer1 = __builtin_malloc (SIZE);
> >   __builtin_strcpy (buffer1, foo1);
> >@@ -45,6 +48,10 @@ main (void)
> > __builtin_abort ();
> >   if (__builtin_memchr (foo1, null, 12) != foo1 + 11)
> > __builtin_abort ();
> >+  if (__builtin_memchr (global, null, 5) == 0)
> >+__builtin_abort ();
> >+  if (__builtin_memchr (local, null, 5) == 0)
> >+__builtin_abort ();
> 
> How is that a meaningful test? This seems to work even with an
> unpatched gcc. I'd have expected something that shows a benefit for
> doing this conversion, and maybe also a test that shows it isn't
> done in cases where it's not allowed.
> 
> > tree
> >-build_string_literal (int len, const char *str)
> >+build_string_literal (int len, const char *str, bool build_addr_expr)
> 
> New arguments should be documented in the function comment.
> 
> >+/* Return TRUE when CTOR can be converted to a string constant.  */
> 
> "if", not "when".
> 
> >+  unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor);
> >+  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value)
> >+{
> >+  if (key == NULL_TREE
> >+  || TREE_CODE (key) != INTEGER_CST
> >+  || !tree_fits_uhwi_p (value)
> >+  || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))
> >+return false;
> 
> Shouldn't all elements have the same type, or do you really have to
> call useless_type_conversion in a loop?
> 
> >+  /* Allow zero character just at the end of a string.  */
> >+  if (integer_zerop (value))
> >+return idx == elements - 1;
> 
> Don't you also have to explicitly check it's there?
> 
> 
> Bernd


Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-11-03 Thread Bernd Schmidt

On 11/03/2016 01:33 PM, Eric Botcazou wrote:

Thanks for the feedback,  I'll try to work through this.


Thanks, but since there doesn't seem to be a consensus on the goal, feel free
to disregard it and just implement what you need for your initial work.


FWIW here's a more complete version of my patch which I'm currently 
testing. Let me know if you think it's at least a good enough 
intermediate step to be installed. I think, the sel-sched example 
notwithstanding, this is something that normally should not be needed 
outside of emit_copy_of_insn_after, so having that do the right thing 
ought to be good enough.



Bernd

	* emit-rtl.c (emit_copy_of_insn_after): Duplicate notes in order.
	* sel-sched-ir.c (create_copy_of_insn_rtx): Likewise.
	* rtl.h (duplicate_reg_notes): Declare.
	* rtlanal.c (duplicate_reg_note): New function.

Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c	(revision 241233)
+++ gcc/emit-rtl.c	(working copy)
@@ -6169,17 +6169,18 @@ emit_copy_of_insn_after (rtx_insn *insn,
  which may be duplicated by the basic block reordering code.  */
   RTX_FRAME_RELATED_P (new_rtx) = RTX_FRAME_RELATED_P (insn);
 
+  /* Locate the end of existing REG_NOTES in NEW_RTX.  */
+  rtx *ptail = ®_NOTES (new_rtx);
+  gcc_assert (*ptail == NULL_RTX);
+
   /* Copy all REG_NOTES except REG_LABEL_OPERAND since mark_jump_label
  will make them.  REG_LABEL_TARGETs are created there too, but are
  supposed to be sticky, so we copy them.  */
   for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
 if (REG_NOTE_KIND (link) != REG_LABEL_OPERAND)
   {
-	if (GET_CODE (link) == EXPR_LIST)
-	  add_reg_note (new_rtx, REG_NOTE_KIND (link),
-			copy_insn_1 (XEXP (link, 0)));
-	else
-	  add_shallow_copy_of_reg_note (new_rtx, link);
+	*ptail = duplicate_reg_note (link);
+	ptail = &XEXP (*ptail, 1);
   }
 
   INSN_CODE (new_rtx) = INSN_CODE (insn);
Index: gcc/rtl.h
===
--- gcc/rtl.h	(revision 241233)
+++ gcc/rtl.h	(working copy)
@@ -3008,6 +3008,7 @@ extern rtx alloc_reg_note (enum reg_note
 extern void add_reg_note (rtx, enum reg_note, rtx);
 extern void add_int_reg_note (rtx, enum reg_note, int);
 extern void add_shallow_copy_of_reg_note (rtx_insn *, rtx);
+extern rtx duplicate_reg_note (rtx);
 extern void remove_note (rtx, const_rtx);
 extern void remove_reg_equal_equiv_notes (rtx_insn *);
 extern void remove_reg_equal_equiv_notes_for_regno (unsigned int);
Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c	(revision 241233)
+++ gcc/rtlanal.c	(working copy)
@@ -2304,6 +2304,20 @@ add_shallow_copy_of_reg_note (rtx_insn *
 add_reg_note (insn, REG_NOTE_KIND (note), XEXP (note, 0));
 }
 
+/* Duplicate NOTE and return the copy.  */
+rtx
+duplicate_reg_note (rtx note)
+{
+  reg_note kind = REG_NOTE_KIND (note);
+
+  if (GET_CODE (note) == INT_LIST)
+return gen_rtx_INT_LIST ((machine_mode) kind, XINT (note, 0), NULL_RTX);
+  else if (GET_CODE (note) == EXPR_LIST)
+return alloc_reg_note (kind, copy_insn_1 (XEXP (note, 0)), NULL_RTX);
+  else
+return alloc_reg_note (kind, XEXP (note, 0), NULL_RTX);
+}
+
 /* Remove register note NOTE from the REG_NOTES of INSN.  */
 
 void
Index: gcc/sel-sched-ir.c
===
--- gcc/sel-sched-ir.c	(revision 241233)
+++ gcc/sel-sched-ir.c	(working copy)
@@ -5762,6 +5762,10 @@ create_copy_of_insn_rtx (rtx insn_rtx)
   res = create_insn_rtx_from_pattern (copy_rtx (PATTERN (insn_rtx)),
   NULL_RTX);
 
+  /* Locate the end of existing REG_NOTES in NEW_RTX.  */
+  rtx *ptail = ®_NOTES (res);
+  gcc_assert (*ptail == NULL_RTX);
+
   /* Copy all REG_NOTES except REG_EQUAL/REG_EQUIV and REG_LABEL_OPERAND
  since mark_jump_label will make them.  REG_LABEL_TARGETs are created
  there too, but are supposed to be sticky, so we copy them.  */
@@ -5770,11 +5774,8 @@ create_copy_of_insn_rtx (rtx insn_rtx)
 	&& REG_NOTE_KIND (link) != REG_EQUAL
 	&& REG_NOTE_KIND (link) != REG_EQUIV)
   {
-	if (GET_CODE (link) == EXPR_LIST)
-	  add_reg_note (res, REG_NOTE_KIND (link),
-			copy_insn_1 (XEXP (link, 0)));
-	else
-	  add_reg_note (res, REG_NOTE_KIND (link), XEXP (link, 0));
+	*ptail = duplicate_reg_note (link);
+	ptail = &XEXP (*ptail, 1);
   }
 
   return res;


RE: [PATCH,testsuite] MIPS: Downgrade R6 to R5 if tests need branch-likely instructions.

2016-11-03 Thread Matthew Fortune
Toma Tabacu  writes:
> The gcc.target/mips/wrap-delay.c test was failing on mips-img-*
> toolchains
> because it was using -mbranch-likely with an R6 target, and branch-
> likely
> instructions were removed in R6.
> 
> This patch makes the testsuite downgrade to R5 if the -mbranch-likely
> option
> is present and we're targeting R6.
> 
> Tested with mips-img-elf and mips-img-linux-gnu.

Hi Toma,

Welcome to GCC development, thanks for your first patch. As you can see
from Catherine's reply the change looks good. I'll just cover some
housekeeping issues...

> gcc/testsuite/
> * gcc.target/mips/mips.exp: Add check for -mbranch-likely in
> condition for R5 downgrade.

Changelogs are an art form which will take some getting used to. This
is almost there but needs to reference the function affected.

* gcc.target/mips/mips.exp (mips-dg-options): Downgrade to R5
for -mbranch-likely and infer -mno-branch-likely for R6.

I have extended the comment as well as there is an additional change
needed for this patch ideally.

> diff --git a/gcc/testsuite/gcc.target/mips/mips.exp
> b/gcc/testsuite/gcc.target/mips/mips.exp
> index 7c24140..382d69c 100644
> --- a/gcc/testsuite/gcc.target/mips/mips.exp
> +++ b/gcc/testsuite/gcc.target/mips/mips.exp
> @@ -1176,7 +1176,8 @@ proc mips-dg-options { args } {
>|| [mips_have_test_option_p options "-mpaired-
> single"]
>|| [mips_have_test_option_p options "-
> mnan=legacy"]
>|| [mips_have_test_option_p options "-
> mabs=legacy"]
> -  || [mips_have_test_option_p options "!HAS_LSA"])
> } {
> +  || [mips_have_test_option_p options "!HAS_LSA"]
> +  || [mips_have_test_option_p options "-mbranch-
> likely"]) } {
> if { $gp_size == 32 } {
> mips_make_test_option options "-mips32r5"
> } else {

Please can you make sure to retain the original patch formatting
when posting. I suspect you have copied this out of a putty session
or similar and have therefore lost the tabs.

The extra change is that in the post-arch option processing we will
need to infer -mno-branch-likely for the $isa_rev > 5 case much like
we infer -mnan=2008 and -mabs=2008. This is so that when running the
testsuite using -mips32r5 or earlier, with -mbranch-likely as part of
the user-supplied test flags, then a test which is upgraded to
mips32r6 does not attempt to use -mbranch-likely.

Hope that wasn't too cryptic!

Thanks,
Matthew


Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Uros Bizjak
On Thu, Nov 3, 2016 at 1:58 PM, Eric Botcazou  wrote:
>> libfunc, as in "__{,u}divmod{di,ti}4 library function" is already
>> implemented in libgcc. But the enablement of this function inside the
>> compiler has to be performed by each target.
>
> So can we do it generically instead of duplicating it ~50 times?

I guess it can be done. Currently the expander goes:

--cut here--
  /* Check if optab_handler exists for divmod_optab for given mode.  */
  if (optab_handler (tab, mode) != CODE_FOR_nothing)
{
  quotient = gen_reg_rtx (mode);
  remainder = gen_reg_rtx (mode);
  expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
}

  /* Generate call to divmod libfunc if it exists.  */
  else if ((libfunc = optab_libfunc (tab, mode)) != NULL_RTX)
targetm.expand_divmod_libfunc (libfunc, mode, op0, op1,
   "ient, &remainder);

  else
gcc_unreachable ();
--cut here--

so, by declaring divmod libfunc, the target also has to provide target hook.

Let's ask authors of the original divmod patch for the details.

Uros.


[PATCH, Fortran, v1] Restructure initialization of allocatable components

2016-11-03 Thread Andre Vehreschild
Hi all,

the attached patch restructures gfortran's way of initializing components of
derived types in ALLOCATE. The old way was to generate a new gfc_code-node and
add it after the ALLOCATE node to initialize the the derived type on certain
conditions (like initializer or allocatable components exist). This patch
proposes to do the initialization as part of the ALLOCATE. This way it makes the
ALLOCATE-statement more atomic in that the ALLOCATE does everything it is
responsible for itself and does rely on other nodes adding to its
responsibilities. The patch furthermore enables to use the knowledge we have in
the allocate, i.e., a freshly allocated object can never have allocated
allocatable components, so no need to check before resetting them.

At the same time I remove some dead code from the resolve_alloc_expr and moved
a loop invariant piece out of the loop iterating over all objects to allocate.
This of course is only cosmetic.

Of course did I not do this out of fun. I have a patch upcoming for allocatable
components in coarrayed derived types. For this I needed to identify the
initialization of the structure and to parameterize it further. This was hard
when for the default initialization an additional code-node was created, but
now that everything necessary for ALLOCATE is done in ALLOCATE parameterizing
the initialization is way easier. The coarray patch is not yet perfect, but I
thought to publish this part already to get your opinions.

Bootstraps and regtests fine on x86_64-linux/F23. Ok for trunk?

@Dominique: Would you give it a go on your open patch collection? Maybe it
fixes one PR, but I am not very hopeful, because the patch is merely removing
complexity instead of doing new things.

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


rework_derived_alloc.clog
Description: Binary data
diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index bb183d4..0e94ae8 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -4131,6 +4131,26 @@ gfc_apply_init (gfc_typespec *ts, symbol_attribute *attr, gfc_expr *init)
 }
 
 
+/* Check whether an expression is a structure constructor and whether it has
+   other values than NULL.  */
+
+bool
+is_non_empty_structure_constructor (gfc_expr * e)
+{
+  if (e->expr_type != EXPR_STRUCTURE)
+return false;
+
+  gfc_constructor *cons = gfc_constructor_first (e->value.constructor);
+  while (cons)
+{
+  if (!cons->expr || cons->expr->expr_type != EXPR_NULL)
+	return true;
+  cons = gfc_constructor_next (cons);
+}
+  return false;
+}
+
+
 /* Check for default initializer; sym->value is not enough
as it is also set for EXPR_NULL of allocatables.  */
 
@@ -4145,7 +4165,9 @@ gfc_has_default_initializer (gfc_symbol *der)
   {
 if (!c->attr.pointer && !c->attr.proc_pointer
 	 && !(c->attr.allocatable && der == c->ts.u.derived)
-	 && gfc_has_default_initializer (c->ts.u.derived))
+	 && ((c->initializer
+		  && is_non_empty_structure_constructor (c->initializer))
+		 || gfc_has_default_initializer (c->ts.u.derived)))
 	  return true;
 	if (c->attr.pointer && c->initializer)
 	  return true;
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 14685d2..c341bbc 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -7046,35 +7046,6 @@ conformable_arrays (gfc_expr *e1, gfc_expr *e2)
   return true;
 }
 
-static void
-cond_init (gfc_code *code, gfc_expr *e, int pointer, gfc_expr *init_e)
-{
-  gfc_code *block;
-  gfc_expr *cond;
-  gfc_code *init_st;
-  gfc_expr *e_to_init = gfc_expr_to_initialize (e);
-
-  cond = pointer
-? gfc_build_intrinsic_call (gfc_current_ns, GFC_ISYM_ASSOCIATED,
-	"associated", code->loc, 2, gfc_copy_expr (e_to_init), NULL)
-: gfc_build_intrinsic_call (gfc_current_ns, GFC_ISYM_ALLOCATED,
-	"allocated", code->loc, 1, gfc_copy_expr (e_to_init));
-
-  init_st = gfc_get_code (EXEC_INIT_ASSIGN);
-  init_st->loc = code->loc;
-  init_st->expr1 = e_to_init;
-  init_st->expr2 = init_e;
-
-  block = gfc_get_code (EXEC_IF);
-  block->loc = code->loc;
-  block->block = gfc_get_code (EXEC_IF);
-  block->block->loc = code->loc;
-  block->block->expr1 = cond;
-  block->block->next = init_st;
-  block->next = code->next;
-
-  code->next = block;
-}
 
 /* Resolve the expression in an ALLOCATE statement, doing the additional
checks to see whether the expression is OK or not.  The expression must
@@ -7325,34 +7296,6 @@ resolve_allocate_expr (gfc_expr *e, gfc_code *code, bool *array_alloc_wo_spec)
   /* We have to zero initialize the integer variable.  */
   code->expr3 = gfc_get_int_expr (gfc_default_integer_kind, &e->where, 0);
 }
-  else if (!code->expr3)
-{
-  /* Set up default initializer if needed.  */
-  gfc_typespec ts;
-  gfc_expr *init_e;
-
-  if (gfc_bt_struct (code->ext.alloc.ts.type))
-	ts = code->ext.alloc.ts;
-  else
-	ts = e->ts;
-
-  if (ts.type == BT_CLASS)
-	ts = ts.u.derived->components->ts;
-
-

Re: [PATCH] Create x.gcov file for binary w/o x.gcda file (PR, gcov-profile/65831)

2016-11-03 Thread Jan Hubicka
> On 08/04/2016 02:52 PM, Nathan Sidwell wrote:
> > On 08/04/16 08:27, Martin Liška wrote:
> >> Hi.
> >>
> >> Following patch is grabbed from the PR, where I just applied the patch
> >> and wrote a test-case which removes x.gcda file before running gcov tool.
> >>
> >> Ready to be installed?
> > 
> > 2016-08-04  Martin Liska  
> > 
> > * g++.dg/gcov/gcov-16.C: New test.
> > * lib/gcov.exp: Support new argument for run-gcov function.
> > 
> > gcc/ChangeLog:
> > 
> > 2016-08-04  Martin Liska  
> > Adam Fineman  
> > 
> > * gcov.c (process_file): Create .gcov file when .gcda
> > file is missing.
> > 
> > ok thanks
> 
> Hi.
> 
> As mentioned by the reporter, it's desired patch for backport to GCC 5 and 6 
> branches.
> 
> May I install the same patch after it survives regression tests?
OK,
thanks
Honza
> Thanks,
> Martin


RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.

2016-11-03 Thread Toma Tabacu
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Andrew Bennett
> Sent: 03 November 2016 11:33
> To: Matthew Fortune; 'Moore, Catherine'; 'gcc-patches@gcc.gnu.org'
> Subject: RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard
> library support check the sysroot supports the required test options.
> 
> Ping.
> 
> 
> Regards,
> 
> 
> 
> Andrew
> 

Hi Andrew,

I believe the inline-memcpy-{1,2,3,4,5}.c tests also qualify for the
(REQUIRES_STDLIB) option.

Regards,
Toma Tabacu

> > -Original Message-
> > From: Andrew Bennett
> > Sent: 28 August 2015 16:50
> > To: Matthew Fortune; Moore, Catherine; gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH] MIPS: If a test in the MIPS testsuite requires
> > standard library support check the sysroot supports the required test
> options.
> >
> > > I had some comments on this that I hadn't got round to posting. The
> > > fix in this patch is not general enough as the missing header
> > > problem comes in two (related) forms:
> > >
> > > 1) Using the new MTI and IMG sysroot layout we can end up with GCC
> looking
> > >for headers in a sysroot that simply does not exist. The current patch
> > >handles this.
> > > 2) Using any sysroot layout (i.e. a simple mips-linux-gnu) it is possible
> > >for the stdlib.h header to be found but the ABI dependent gnu-stubs
> > >header may not be installed depending on soft/hard nan1985/nan2008.
> > >
> > > The test for stdlib.h needs to therefore verify that preprocessing
> > > succeeds rather than just testing for an error relating to stdlib.h.
> > > This could be done by adding a further option to mips_preprocess to
> > > indicate the processor output should go to a file and that the
> > > caller wants the messages emitted by the compiler instead.
> > >
> > > A second issue is that you have added (REQUIRES_STDLIB) to too many
> tests.
> > > You only need to add it to tests that request a compiler option (via
> > > dg-options) that could potentially lead to forcing soft/hard
> > > nan1985/nan2008 directly or indirectly. So -mips32r6 implies nan2008
> > > so you need it -
> > mips32r5
> > > implies nan1985 so you need it. There are at least two tests which
> > > don't need the option but you need to check them all so we don't run
> > > the check needlessly.
> >
> > The updated patch and ChangeLog that addresses Matthew's comments is
> below.
> >
> > Ok to commit?
> >
> > Regards,
> >
> >
> > Andrew
> >
> >
> > testsuite/
> >
> > * gcc.target/mips/loongson-simd.c (dg-options): Add
> > (REQUIRES_STDLIB).
> > * gcc.target/mips/loongson-shift-count-truncated-1.c: Ditto
> > * gcc.target/mips/mips-3d-1.c: Ditto
> > * gcc.target/mips/mips-3d-2.c: Ditto
> > * gcc.target/mips/mips-3d-3.c: Ditto
> > * gcc.target/mips/mips-3d-4.c: Ditto
> > * gcc.target/mips/mips-3d-5.c: Ditto
> > * gcc.target/mips/mips-3d-6.c: Ditto
> > * gcc.target/mips/mips-3d-7.c: Ditto
> > * gcc.target/mips/mips-3d-8.c: Ditto
> > * gcc.target/mips/mips-3d-9.c: Ditto
> > * gcc.target/mips/mips-ps-1.c: Ditto
> > * gcc.target/mips/mips-ps-2.c: Ditto
> > * gcc.target/mips/mips-ps-3.c: Ditto
> > * gcc.target/mips/mips-ps-4.c: Ditto
> > * gcc.target/mips/mips-ps-6.c: Ditto
> > * gcc.target/mips/mips16-attributes.c: Ditto
> > * gcc.target/mips/mips32-dsp-run.c: Ditto
> > * gcc.target/mips/mips32-dsp.c: Ditto
> > * gcc.target/mips/save-restore-1.c: Ditto
> > * gcc.target/mips/mips.exp (mips_option_groups): Add stdlib.
> > (mips_preprocess): Add ignore_output argument that when set
> > will not return the pre-processed output.
> > (mips_arch_info): Update arguments for the call to
> > mips_preprocess.
> > (mips-dg-init): Ditto.
> > (mips-dg-options): Check if a test having test option
> > (REQUIRES_STDLIB) has the required sysroot support for
> > the current test options.
> >
> >
> >
> > diff --git
> > a/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> > b/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> > index f57a18c..baed48c 100644
> > --- a/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> > +++ b/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> > @@ -4,7 +4,7 @@
> >  /* loongson.h does not handle or check for MIPS16ness.  There doesn't
> > seem any good reason for it to, given that the Loongson processors
> > do not support MIPS16.  */
> > -/* { dg-options "isa=loongson -mhard-float -mno-mips16" } */
> > +/* { dg-options "isa=loongson -mhard-float -mno-mips16
> > +(REQUIRES_STDLIB)" }
> > */
> >  /* See PR 52155.  */
> >  /* { dg-options "isa=loongson -mhard-float -mno-mips16 -mlong64" {
> > mips*-*-
> > elf* && ilp32 } } */
> >
> > diff --git a/gcc/testsuite/gcc.target/mips/loongson-simd.c
> > b/gcc/testsuite/gcc.target/mips/loongson-simd.c
> > index 6d2ceb6..f263b43 100644
> 

Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Jakub Jelinek
On Thu, Nov 03, 2016 at 01:55:33PM +0100, Markus Trippelsdorf wrote:
> On 2016.11.03 at 13:32 +0100, Jakub Jelinek wrote:
> > On Thu, Nov 03, 2016 at 01:22:11PM +0100, Bernd Schmidt wrote:
> > > On 11/03/2016 12:58 PM, Jakub Jelinek wrote:
> > > >On Thu, Nov 03, 2016 at 12:51:15PM +0100, Bernd Schmidt wrote:
> > > >>I'm concerned about the number of false positives for this warning, and
> > > >>judging by previous discussions, I'm not alone in this. This patch 
> > > >>limits it
> > > >>to level 1 (any comment before the case label disables the warning) for
> > > >>cases where the user specified no explicit level. It'll still generate
> > > >>enough noise that people will be aware of it and can choose whether to 
> > > >>use a
> > > >>higher level or not.
> > > >>
> > > >>Bootstrapped and tested on x86_64-linux. Ok?
> > > >
> > > >I disagree, I'm ok with changing it to 2, but 1 is too much.
> > > 
> > > Well, we have data from our own sources where we had to "fix" lots of
> > > perfectly good code, and also from the Linux kernel. From an earlier
> > > discussion:
> > 
> > That data wasn't really convincing on this.  All it proved is that most of
> > the cases are (likely) deliberate fall-throughs without any comment
> > whatsoever, the rest is in the noise.  As one needs to deal with those
> > where comments are missing altogether, dealing with the noise is acceptable.
> 
> Without Bernd's patch to set the default to 1 you will drown in false
> positives once you start using gcc-7 to build a whole distro. On my
> Gentoo test box anything but level 1 is simply unacceptable, because you
> will miss important other warnings in the -Wimplicit-fallthrough noise
> otherwise.

That is really strange.  First of all, most of packages aren't compiled with
-Wextra/-W.  And in those that are, are you sure that that no comment at all
for the implicit fallthroughs isn't significantly more common than the
set of comments that are accepted by -Wimplicit-fallthrough=1 and not
accepted by -Wimplicit-fallthrough=2?

Jakub


Re: [C++ Patch/RFC] PR 67980 ("left shift count is negative [-Wshift-count-negative] generated for unreachable code")

2016-11-03 Thread Jason Merrill
On Tue, Oct 18, 2016 at 4:26 PM, Paolo Carlini  wrote:
> Thus, I'm back to one of my first tries earlier today: a much more
> conservative change which uses fold_non_dependent_expr only for the purpose
> of suppressing the unwanted warnings, see the below.

This looks right; in general we use early folding only to control warnings.  OK.

Jason


Re: Implement -Wduplicated-branches (PR c/64279) (v3)

2016-11-03 Thread Jason Merrill
On Thu, Nov 3, 2016 at 7:24 AM, Marek Polacek  wrote:
> On Tue, Nov 01, 2016 at 02:53:58PM +0100, Jakub Jelinek wrote:
>> On Tue, Nov 01, 2016 at 09:41:20AM -0400, Jason Merrill wrote:
>> > On Tue, Oct 25, 2016 at 9:59 AM, Marek Polacek  wrote:
>> > > On Mon, Oct 24, 2016 at 04:10:21PM +0200, Marek Polacek wrote:
>> > >> On Thu, Oct 20, 2016 at 12:28:36PM +0200, Marek Polacek wrote:
>> > >> > I found a problem with this patch--we can't call 
>> > >> > do_warn_duplicated_branches in
>> > >> > build_conditional_expr, because that way the C++-specific codes might 
>> > >> > leak into
>> > >> > the hasher.  Instead, I should use operand_equal_p, I think.  Let me 
>> > >> > rework
>> > >> > that part of the patch.
>> >
>> > Hmm, is there a reason not to use operand_equal_p for
>> > do_warn_duplicated_branches as well?  I'm concerned about hash
>> > collisions leading to false positives.
>>
>> If the hashing function is iterative_hash_expr / inchash::add_expr, then
>> that is supposed to pair together with operand_equal_p, we even have
>> checking verification of that.

Yes, but there could still be hash collisions; we can't guarantee that
everything that compares unequal also hashes unequal.

Jason


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Martin Liška
On 11/02/2016 03:51 PM, Jakub Jelinek wrote:
> On Wed, Nov 02, 2016 at 03:38:25PM +0100, Martin Liška wrote:
>> it converts:
>> foo ()
>> {
>>   char a;
>>   char * p;
>>   char _1;
>>   int _2;
>>   int _8;
>>   int _9;
>>
>>   :
>>   ASAN_MARK (2, &a, 1);
>>   a = 0;
>>   p_6 = &a;
>>   ASAN_MARK (1, &a, 1);
>>   _1 = *p_6;
> 
> You shouldn't convert if a is addressable (when ignoring &a in ASAN_MARK
> calls).  Only if there is &a just in ASAN_MARK and MEM_REF, you can convert.
> 
>> to:
>>
>> foo ()
>> {
>>   char a;
>>   char * p;
>>   char _1;
>>   int _2;
>>
>>   :
>>   a_10 = 0;
>>   a_12 = ASAN_POISON ();
>>   _1 = a_12;
>>   if (_1 != 0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>
>>   :
>>   # _2 = PHI <1(2), 0(3)>
>>   return _2;
>>
>> }
>>
>> and probably the last goal is to convert the newly added internal fn to a 
>> runtime call.
>> Hope sanopt pass is the right place where to it?
> 
> If ASAN_POISON is ECF_CONST and has any uses during sanopt, perhaps best
> would be to add an artificial variable you give the same name as the
> underlying var of the SSA_NAME (and alignment, locus etc.) and poison it
> right away (keep unpoisoning only to the function epilogue) and then
> ASAN_CHECK replace all uses of that SSA_NAME with ASAN_CHECK + use of
> (D) SSA_NAME.
> 
>   Jakub
> 

Hi.

I'm having a semi-working patch that comes up with the ASAN_POISON built-in. 
Well, to be honest,
I still have a feeling that doing the magic with the parallel variable is bit 
overkill. Maybe
a new runtime call would make it easier for us.

However, I still don't fully understand why we want to support just 
is_gimple_reg variables.
Let's consider following test-case:

void foo()
{
char *ptr;
  {
char my_char[9];
ptr = &my_char[0];
  }
}

Where I would expect to optimize out:
  :
  _5 = (unsigned long) 9;
  _4 = (unsigned long) &my_char;
  __builtin___asan_unpoison_stack_memory (_4, _5);
  _7 = (unsigned long) 9;
  _6 = (unsigned long) &my_char;
  __builtin___asan_poison_stack_memory (_6, _7);
  return;

where address of my_char is taken in the original source code, while not during 
tree-ssa
optimization, where the address is used only by ASAN_MARK calls.

Doing such transformation can rapidly decrease number of 
__builtin___asan_{un}poison_stack_memory
in tramp3d: from ~36K to ~22K.

Thanks for clarification.
Martin



[PATCH, RFC] Improve ivopts group costs

2016-11-03 Thread Evgeny Kudryashov

Hello,

I'm facing the following problem related to ivopts. The problem is that 
GCC generates a lot of induction variables and as a result there is an 
unnecessary increase of stack usage and register pressure.


For instance, for the attached testcase (tc_ivopts.c) GCC generates 26 
induction variables (25 for each of lhsX[{0-4}][{0-4}][k] and one for 
rhs[k][j][{0-2}]). You should be able to reproduce this issue on arm 
target using GCC with "-O2 -mcpu=cortex-a9 -mtune=cortex-a9". For 
reference, I'm attaching assembly I get on current trunk.


The reason might be in use groups costs, in particular, in the way of 
estimation. Currently, the cost of a tuple (group, candidate) is a sum 
of per-use costs in the group. So, the cost of a group grows 
proportional to the number of uses in the group. This approach has a 
negative effect on the algorithm for finding the best set of induction 
variables: the part of a total cost related to adding a new candidate is 
almost washed out by the cost of the group. In addition, when there is a 
lot of groups with many uses inside and a target is out of free 
registers, the cost of spill is washing out too. As a result, GCC 
prefers to use native candidates (candidate created for a particular 
group) for each group of uses instead of considering the real cost of 
introducing a new variable into a set.


The summing approach was added as a part of this patch 
(https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00641.html) and the 
motivation for taking the sum does not seem to be

specifically discussed.

I propose the following patch that changes a group cost from cost 
summing to selecting the largest one inside a group. For the given test 
case I have: necessary size of stack is decreased by almost 3 times and 
ldr\str amount are decreased by less than 2 times. Also I'm attaching 
assembly after applying the patch.


The essential change in the patch is just:

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index f9211ad..a149418 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5151,7 +5151,8 @@ determine_group_iv_cost_address (struct 
ivopts_data *data,

 offset and where the iv_use is.  */
cost = get_computation_cost (data, next, cand, true,
 NULL, &can_autoinc, NULL);
-  sum_cost += cost;
+  if (sum_cost < cost)
+sum_cost = cost;
 }
   set_group_iv_cost (data, group, cand, sum_cost, depends_on,
 NULL_TREE, ERROR_MARK, inv_expr);

Any suggestions?


Thanks,
Evgeny.#define SIZE 20
int gp22 = SIZE;
double rhs [SIZE][SIZE][5];

void x_solve()
{
  int j, k;
  double lhsX[5][5][gp22][gp22];
  for (j = 1; j < gp22; j++) {
for (k = 1; k < gp22; k++) {
  lhsX[0][0][j][k] -= lhsX[1][0][j][k];
  lhsX[0][1][j][k] -= lhsX[1][1][j][k];
  lhsX[0][2][j][k] -= lhsX[1][2][j][k];
  lhsX[0][3][j][k] -= lhsX[1][3][j][k];
  lhsX[0][4][j][k] -= lhsX[1][4][j][k];

  lhsX[2][0][j][k] -= lhsX[1][0][j][k];
  lhsX[2][1][j][k] -= lhsX[1][1][j][k];
  lhsX[2][2][j][k] -= lhsX[1][2][j][k];
  lhsX[2][3][j][k] -= lhsX[1][3][j][k];
  lhsX[2][4][j][k] -= lhsX[1][4][j][k];

  lhsX[3][0][j][k] -= lhsX[1][0][j][k];
  lhsX[3][1][j][k] -= lhsX[1][1][j][k];
  lhsX[3][2][j][k] -= lhsX[1][2][j][k];
  lhsX[3][3][j][k] -= lhsX[1][3][j][k];
  lhsX[3][4][j][k] -= lhsX[1][4][j][k];

  lhsX[4][0][j][k] -= lhsX[1][0][j][k];
  lhsX[4][1][j][k] -= lhsX[1][1][j][k];
  lhsX[4][2][j][k] -= lhsX[1][2][j][k];
  lhsX[4][3][j][k] -= lhsX[1][3][j][k];
  lhsX[4][4][j][k] -= lhsX[1][4][j][k];

  lhsX[0][0][j][k] -= lhsX[3][0][j][k];
  lhsX[0][1][j][k] -= lhsX[3][1][j][k];
  lhsX[0][2][j][k] -= lhsX[3][2][j][k];
  lhsX[0][3][j][k] -= lhsX[3][3][j][k];
  lhsX[0][4][j][k] -= lhsX[3][4][j][k];

  lhsX[2][0][j][k] -= lhsX[3][0][j][k];
  lhsX[2][1][j][k] -= lhsX[3][1][j][k];
  lhsX[2][2][j][k] -= lhsX[3][2][j][k];
  lhsX[2][3][j][k] -= lhsX[3][3][j][k];
  lhsX[2][4][j][k] -= lhsX[3][4][j][k];

  lhsX[4][0][j][k] -= lhsX[3][0][j][k];
  lhsX[4][1][j][k] -= lhsX[3][1][j][k];
  lhsX[4][2][j][k] -= lhsX[3][2][j][k];
  lhsX[4][3][j][k] -= lhsX[3][3][j][k];
  lhsX[4][4][j][k] -= lhsX[3][4][j][k];

  rhs[k][j][0] -= rhs[k][j][1];
  rhs[k][j][1] -= rhs[k][j][2];
}/*end j*/
  }/*end k*/
}
	.cpu cortex-a9
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 2
	.eabi_attribute 30, 2
	.eabi_attribute 34, 1
	.eabi_attribute 18, 4
	.file	"bt.c"
	.global	__aeabi_dsub
	.text
	.align	2
	.global	x_solve
	.syntax unified
	.arm
	.fpu softvfp
	.type	x_solve, %function
x_solve:
	@ args = 0, pretend = 0, frame = 216
	@ frame_needed = 1, uses_anonymous_args = 0
	movw	r3, #:lower16:.LANCHOR0
	push	{r4, r5, r6, r7, r8, r9, r10, fp, lr}
	movt	r3, #:upper16:.LANCHOR0
	add	fp, sp, #32

Re: Implement -Wduplicated-branches (PR c/64279) (v3)

2016-11-03 Thread Jakub Jelinek
On Thu, Nov 03, 2016 at 09:27:55AM -0400, Jason Merrill wrote:
> On Thu, Nov 3, 2016 at 7:24 AM, Marek Polacek  wrote:
> > On Tue, Nov 01, 2016 at 02:53:58PM +0100, Jakub Jelinek wrote:
> >> On Tue, Nov 01, 2016 at 09:41:20AM -0400, Jason Merrill wrote:
> >> > On Tue, Oct 25, 2016 at 9:59 AM, Marek Polacek  
> >> > wrote:
> >> > > On Mon, Oct 24, 2016 at 04:10:21PM +0200, Marek Polacek wrote:
> >> > >> On Thu, Oct 20, 2016 at 12:28:36PM +0200, Marek Polacek wrote:
> >> > >> > I found a problem with this patch--we can't call 
> >> > >> > do_warn_duplicated_branches in
> >> > >> > build_conditional_expr, because that way the C++-specific codes 
> >> > >> > might leak into
> >> > >> > the hasher.  Instead, I should use operand_equal_p, I think.  Let 
> >> > >> > me rework
> >> > >> > that part of the patch.
> >> >
> >> > Hmm, is there a reason not to use operand_equal_p for
> >> > do_warn_duplicated_branches as well?  I'm concerned about hash
> >> > collisions leading to false positives.
> >>
> >> If the hashing function is iterative_hash_expr / inchash::add_expr, then
> >> that is supposed to pair together with operand_equal_p, we even have
> >> checking verification of that.
> 
> Yes, but there could still be hash collisions; we can't guarantee that
> everything that compares unequal also hashes unequal.

Right, after h0 == h1 is missing && operand_equal_p (thenb, elseb, 0)
or so (the exact last operand needs to be figured out).
OEP_ONLY_CONST is certainly wrong, we want the same VAR_DECLs to mean the
same thing.  0 is a tiny bit better, but still it will give up on e.g. pure
and other calls.  OEP_PURE_SAME is tiny bit better than that, but still
calls with the same arguments to the same function will not be considered
equal, plus likely operand_equal_p doesn't handle STATEMENT_LIST etc.
So maybe we need another OEP_* mode for this.

Jakub


Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Markus Trippelsdorf
On 2016.11.03 at 14:24 +0100, Jakub Jelinek wrote:
> On Thu, Nov 03, 2016 at 01:55:33PM +0100, Markus Trippelsdorf wrote:
> > On 2016.11.03 at 13:32 +0100, Jakub Jelinek wrote:
> > > On Thu, Nov 03, 2016 at 01:22:11PM +0100, Bernd Schmidt wrote:
> > > > On 11/03/2016 12:58 PM, Jakub Jelinek wrote:
> > > > >On Thu, Nov 03, 2016 at 12:51:15PM +0100, Bernd Schmidt wrote:
> > > > >>I'm concerned about the number of false positives for this warning, 
> > > > >>and
> > > > >>judging by previous discussions, I'm not alone in this. This patch 
> > > > >>limits it
> > > > >>to level 1 (any comment before the case label disables the warning) 
> > > > >>for
> > > > >>cases where the user specified no explicit level. It'll still generate
> > > > >>enough noise that people will be aware of it and can choose whether 
> > > > >>to use a
> > > > >>higher level or not.
> > > > >>
> > > > >>Bootstrapped and tested on x86_64-linux. Ok?
> > > > >
> > > > >I disagree, I'm ok with changing it to 2, but 1 is too much.
> > > > 
> > > > Well, we have data from our own sources where we had to "fix" lots of
> > > > perfectly good code, and also from the Linux kernel. From an earlier
> > > > discussion:
> > > 
> > > That data wasn't really convincing on this.  All it proved is that most of
> > > the cases are (likely) deliberate fall-throughs without any comment
> > > whatsoever, the rest is in the noise.  As one needs to deal with those
> > > where comments are missing altogether, dealing with the noise is 
> > > acceptable.
> > 
> > Without Bernd's patch to set the default to 1 you will drown in false
> > positives once you start using gcc-7 to build a whole distro. On my
> > Gentoo test box anything but level 1 is simply unacceptable, because you
> > will miss important other warnings in the -Wimplicit-fallthrough noise
> > otherwise.
> 
> That is really strange.  First of all, most of packages aren't compiled with
> -Wextra/-W.  And in those that are, are you sure that that no comment at all
> for the implicit fallthroughs isn't significantly more common than the
> set of comments that are accepted by -Wimplicit-fallthrough=1 and not
> accepted by -Wimplicit-fallthrough=2?

I don't have gathered detailed statistics. But for example a simple
/* drop through */ in a package header file will of course cause many
bogus warnings during the build on level 2.
For the Linux kernel false positives decrease ~20% when switching from
level 3 to 1.

-- 
Markus


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Jakub Jelinek
Hi!

FYI, I think it is much more important to get the initial patch in, so
resolve the switch + declarations inside of its body stuff, add testcases
for that and post for re-review and check in.
These optimizations can be done even in stage3.

On Thu, Nov 03, 2016 at 02:34:47PM +0100, Martin Liška wrote:
> I'm having a semi-working patch that comes up with the ASAN_POISON built-in. 
> Well, to be honest,
> I still have a feeling that doing the magic with the parallel variable is bit 
> overkill. Maybe
> a new runtime call would make it easier for us.
> 
> However, I still don't fully understand why we want to support just 
> is_gimple_reg variables.
> Let's consider following test-case:
> 
> void foo()
> {
> char *ptr;
>   {
> char my_char[9];
> ptr = &my_char[0];
>   }
> }
> 
> Where I would expect to optimize out:
>   :
>   _5 = (unsigned long) 9;
>   _4 = (unsigned long) &my_char;
>   __builtin___asan_unpoison_stack_memory (_4, _5);
>   _7 = (unsigned long) 9;
>   _6 = (unsigned long) &my_char;
>   __builtin___asan_poison_stack_memory (_6, _7);
>   return;
> 
> where address of my_char is taken in the original source code, while not 
> during tree-ssa
> optimization, where the address is used only by ASAN_MARK calls.

But how would you be able to find out if there isn't any return *ptr; after
the scope or similar (as MEM_REF)?  With is_gimple_reg, they will be turned
into SSA form and you can easily verify (uses of ASAN_POISON are a problem
if they are encountered at runtime).  What would you do for the
must_live_in_memory vars?  Add some pass that detects it, handle it somehow
in addressable pass, handle it in SRA, ... ?

> 
> Doing such transformation can rapidly decrease number of 
> __builtin___asan_{un}poison_stack_memory
> in tramp3d: from ~36K to ~22K.
> 
> Thanks for clarification.
> Martin

Jakub


Re: [PATCH, GCC/ARM] Fix PR77933: stack corruption on ARM when using high registers and lr

2016-11-03 Thread Wilco Dijkstra
Hi,

The patch looks correct, however I would suggest to rewrite this bit of the code
urgently in separate patch as it is way too complex to assert it is now bug 
free -
there are too many possible failure scenarios to list... Also it generates quite
inefficient code - pushable_regs should include R0-R3 minus the arguments plus
LR (if LR was already saved).

Also it is important to keep track of which registers still need to be saved, 
and
never use pushable_regs for two separate purposes. Finally it seems a good
idea to check the 2 masks passed to thumb1_emit_multi_reg_push have the
same number of set bits - that would have prevented this bug from ever
happening.

Wilco




Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Jakub Jelinek
On Thu, Nov 03, 2016 at 02:35:57PM +0100, Markus Trippelsdorf wrote:
> I don't have gathered detailed statistics. But for example a simple
> /* drop through */ in a package header file will of course cause many
> bogus warnings during the build on level 2.
> For the Linux kernel false positives decrease ~20% when switching from
> level 3 to 1.

One would have to count only warnings with unique locus (i.e. sort -u them
after grepping them from logs).
But even with 20%, if one spends the energy to analyze the 80%, where
one actually has to analyze the code, just mechanically changing a couple of
common comment kinds into more standardized one isn't going to be
significant.

Jakub


Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Bernd Schmidt


On 11/03/2016 02:47 PM, Jakub Jelinek wrote:

On Thu, Nov 03, 2016 at 02:35:57PM +0100, Markus Trippelsdorf wrote:

I don't have gathered detailed statistics. But for example a simple
/* drop through */ in a package header file will of course cause many
bogus warnings during the build on level 2.
For the Linux kernel false positives decrease ~20% when switching from
level 3 to 1.


One would have to count only warnings with unique locus (i.e. sort -u them
after grepping them from logs).
But even with 20%, if one spends the energy to analyze the 80%, where
one actually has to analyze the code, just mechanically changing a couple of
common comment kinds into more standardized one isn't going to be
significant.


But it's a completely pointless exercise which we shouldn't impose on 
users unasked.



Bernd



Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Jakub Jelinek
On Thu, Nov 03, 2016 at 02:48:33PM +0100, Bernd Schmidt wrote:
> 
> On 11/03/2016 02:47 PM, Jakub Jelinek wrote:
> >On Thu, Nov 03, 2016 at 02:35:57PM +0100, Markus Trippelsdorf wrote:
> >>I don't have gathered detailed statistics. But for example a simple
> >>/* drop through */ in a package header file will of course cause many
> >>bogus warnings during the build on level 2.
> >>For the Linux kernel false positives decrease ~20% when switching from
> >>level 3 to 1.
> >
> >One would have to count only warnings with unique locus (i.e. sort -u them
> >after grepping them from logs).
> >But even with 20%, if one spends the energy to analyze the 80%, where
> >one actually has to analyze the code, just mechanically changing a couple of
> >common comment kinds into more standardized one isn't going to be
> >significant.
> 
> But it's a completely pointless exercise which we shouldn't impose on users
> unasked.

It isn't pointless.  Users can have completely unrelated comments in between
case labels, to describe what the case handles etc. and then miss a warning
when the fallthrough is not intentional.

Jakub


Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Markus Trippelsdorf
On 2016.11.03 at 14:47 +0100, Jakub Jelinek wrote:
> On Thu, Nov 03, 2016 at 02:35:57PM +0100, Markus Trippelsdorf wrote:
> > I don't have gathered detailed statistics. But for example a simple
> > /* drop through */ in a package header file will of course cause many
> > bogus warnings during the build on level 2.
> > For the Linux kernel false positives decrease ~20% when switching from
> > level 3 to 1.
> 
> One would have to count only warnings with unique locus (i.e. sort -u them
> after grepping them from logs).
> But even with 20%, if one spends the energy to analyze the 80%, where
> one actually has to analyze the code, just mechanically changing a couple of
> common comment kinds into more standardized one isn't going to be
> significant.

I should have written: For the Linux kernel the number of warnings
dropped by 20% (going from level 3 to 1) and all of them turned out to
be false positives. And yes, I have used "sort -u".
I'm not sure if I would call 20% insignificant.

-- 
Markus


Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Jakub Jelinek
On Thu, Nov 03, 2016 at 02:55:03PM +0100, Markus Trippelsdorf wrote:
> On 2016.11.03 at 14:47 +0100, Jakub Jelinek wrote:
> > On Thu, Nov 03, 2016 at 02:35:57PM +0100, Markus Trippelsdorf wrote:
> > > I don't have gathered detailed statistics. But for example a simple
> > > /* drop through */ in a package header file will of course cause many
> > > bogus warnings during the build on level 2.
> > > For the Linux kernel false positives decrease ~20% when switching from
> > > level 3 to 1.
> > 
> > One would have to count only warnings with unique locus (i.e. sort -u them
> > after grepping them from logs).
> > But even with 20%, if one spends the energy to analyze the 80%, where
> > one actually has to analyze the code, just mechanically changing a couple of
> > common comment kinds into more standardized one isn't going to be
> > significant.
> 
> I should have written: For the Linux kernel the number of warnings
> dropped by 20% (going from level 3 to 1) and all of them turned out to
> be false positives. And yes, I have used "sort -u".
> I'm not sure if I would call 20% insignificant.

But we are talking about 2 vs. 1 now, so that is likely smaller than 20%.
Plus what those comments in that 2 vs. 1 set are where the warnings differ,
if they are related to fall through or not.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Martin Liška
On 11/03/2016 02:44 PM, Jakub Jelinek wrote:
> Hi!
> 
> FYI, I think it is much more important to get the initial patch in, so
> resolve the switch + declarations inside of its body stuff, add testcases
> for that and post for re-review and check in.
> These optimizations can be done even in stage3.

I know that it's much urgent to have it done first. I'm currently testing patch
for the switch + declaration. Hopefully I'll send it today.

> 
> On Thu, Nov 03, 2016 at 02:34:47PM +0100, Martin Liška wrote:
>> I'm having a semi-working patch that comes up with the ASAN_POISON built-in. 
>> Well, to be honest,
>> I still have a feeling that doing the magic with the parallel variable is 
>> bit overkill. Maybe
>> a new runtime call would make it easier for us.
>>
>> However, I still don't fully understand why we want to support just 
>> is_gimple_reg variables.
>> Let's consider following test-case:
>>
>> void foo()
>> {
>> char *ptr;
>>   {
>> char my_char[9];
>> ptr = &my_char[0];
>>   }
>> }
>>
>> Where I would expect to optimize out:
>>   :
>>   _5 = (unsigned long) 9;
>>   _4 = (unsigned long) &my_char;
>>   __builtin___asan_unpoison_stack_memory (_4, _5);
>>   _7 = (unsigned long) 9;
>>   _6 = (unsigned long) &my_char;
>>   __builtin___asan_poison_stack_memory (_6, _7);
>>   return;
>>
>> where address of my_char is taken in the original source code, while not 
>> during tree-ssa
>> optimization, where the address is used only by ASAN_MARK calls.
> 
> But how would you be able to find out if there isn't any return *ptr; after
> the scope or similar (as MEM_REF)?  With is_gimple_reg, they will be turned
> into SSA form and you can easily verify (uses of ASAN_POISON are a problem
> if they are encountered at runtime).  What would you do for the
> must_live_in_memory vars?  Add some pass that detects it, handle it somehow
> in addressable pass, handle it in SRA, ... ?

If there's return of *ptr, there must be a &my_char, and it looks
  _4 = MEM[(char *)&my_char];

properly identifies that my_char has address taken.

M.

> 
>>
>> Doing such transformation can rapidly decrease number of 
>> __builtin___asan_{un}poison_stack_memory
>> in tramp3d: from ~36K to ~22K.
>>
>> Thanks for clarification.
>> Martin
> 
>   Jakub
> 



Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Eric Botcazou
> I guess it can be done. Currently the expander goes:
> 
> --cut here--
>   /* Check if optab_handler exists for divmod_optab for given mode.  */
>   if (optab_handler (tab, mode) != CODE_FOR_nothing)
> {
>   quotient = gen_reg_rtx (mode);
>   remainder = gen_reg_rtx (mode);
>   expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
> }
> 
>   /* Generate call to divmod libfunc if it exists.  */
>   else if ((libfunc = optab_libfunc (tab, mode)) != NULL_RTX)
> targetm.expand_divmod_libfunc (libfunc, mode, op0, op1,
>"ient, &remainder);
> 
>   else
> gcc_unreachable ();
> --cut here--
> 
> so, by declaring divmod libfunc, the target also has to provide target hook.

Right, that's why I also suggested a default associated target hook.

-- 
Eric Botcazou


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Jakub Jelinek
On Thu, Nov 03, 2016 at 03:02:21PM +0100, Martin Liška wrote:
> > But how would you be able to find out if there isn't any return *ptr; after
> > the scope or similar (as MEM_REF)?  With is_gimple_reg, they will be turned
> > into SSA form and you can easily verify (uses of ASAN_POISON are a problem
> > if they are encountered at runtime).  What would you do for the
> > must_live_in_memory vars?  Add some pass that detects it, handle it somehow
> > in addressable pass, handle it in SRA, ... ?
> 
> If there's return of *ptr, there must be a &my_char, and it looks
>   _4 = MEM[(char *)&my_char];
> 
> properly identifies that my_char has address taken.

It doesn't.  MEM_REF's ADDR_EXPR isn't considered to be address taking.

Jakub


Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Markus Trippelsdorf
On 2016.11.03 at 14:57 +0100, Jakub Jelinek wrote:
> On Thu, Nov 03, 2016 at 02:55:03PM +0100, Markus Trippelsdorf wrote:
> > On 2016.11.03 at 14:47 +0100, Jakub Jelinek wrote:
> > > On Thu, Nov 03, 2016 at 02:35:57PM +0100, Markus Trippelsdorf wrote:
> > > > I don't have gathered detailed statistics. But for example a simple
> > > > /* drop through */ in a package header file will of course cause many
> > > > bogus warnings during the build on level 2.
> > > > For the Linux kernel false positives decrease ~20% when switching from
> > > > level 3 to 1.
> > > 
> > > One would have to count only warnings with unique locus (i.e. sort -u them
> > > after grepping them from logs).
> > > But even with 20%, if one spends the energy to analyze the 80%, where
> > > one actually has to analyze the code, just mechanically changing a couple 
> > > of
> > > common comment kinds into more standardized one isn't going to be
> > > significant.
> > 
> > I should have written: For the Linux kernel the number of warnings
> > dropped by 20% (going from level 3 to 1) and all of them turned out to
> > be false positives. And yes, I have used "sort -u".
> > I'm not sure if I would call 20% insignificant.
> 
> But we are talking about 2 vs. 1 now, so that is likely smaller than 20%.
> Plus what those comments in that 2 vs. 1 set are where the warnings differ,
> if they are related to fall through or not.

It is still a 12% reduction (2 vs. 1). I've posted a list of them in the
older thread.

-- 
Markus


Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-11-03 Thread Eric Botcazou
> FWIW here's a more complete version of my patch which I'm currently
> testing. Let me know if you think it's at least a good enough
> intermediate step to be installed.

It is, thanks.

> I think, the sel-sched example notwithstanding, this is something that
> normally should not be needed outside of emit_copy_of_insn_after, so having
> that do the right thing ought to be good enough.

reload does direct note copying too (in forward order).

-- 
Eric Botcazou


Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Prathamesh Kulkarni
On 3 November 2016 at 18:36, Uros Bizjak  wrote:
> On Thu, Nov 3, 2016 at 1:58 PM, Eric Botcazou  wrote:
>>> libfunc, as in "__{,u}divmod{di,ti}4 library function" is already
>>> implemented in libgcc. But the enablement of this function inside the
>>> compiler has to be performed by each target.
>>
>> So can we do it generically instead of duplicating it ~50 times?
>
> I guess it can be done. Currently the expander goes:
>
> --cut here--
>   /* Check if optab_handler exists for divmod_optab for given mode.  */
>   if (optab_handler (tab, mode) != CODE_FOR_nothing)
> {
>   quotient = gen_reg_rtx (mode);
>   remainder = gen_reg_rtx (mode);
>   expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
> }
>
>   /* Generate call to divmod libfunc if it exists.  */
>   else if ((libfunc = optab_libfunc (tab, mode)) != NULL_RTX)
> targetm.expand_divmod_libfunc (libfunc, mode, op0, op1,
>"ient, &remainder);
>
>   else
> gcc_unreachable ();
> --cut here--
>
> so, by declaring divmod libfunc, the target also has to provide target hook.
>
> Let's ask authors of the original divmod patch for the details.
Yes, in the initial patch, the default version of the hook targeted
generic divmod functions in libgcc,
however since full set of divmod libfuncs wasn't available
(__divmoddi4 for instance),
we had to drop that.

I have couple of concerns:
a) I am not sure if __udivmoddi4() is generically available for all targets.
For aarch64, I can confirm that __udivmoddi4() isn't available (it has
hardware div, so the divmod transform
should never generate call to __udivmoddi4() on aarch64).
Please see: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01239.html
Can we safely assume that a target would have generic divmod libfunc
like __udivmoddi4
available if it doesn't support hardware div insn and doesn't define
target-specific divmod libfuncs ?

b) Targets like AVR, have their own divmod patterns in the backend for
doing the divmod transform
(and haven't registered target-specific divmod libfuncs via
set_optab_libfunc() ).
We would end up generating call to generic divmod libfuncs for these
targets even though
they have target-specific divmod libfuncs available. I wonder if these
targets should now use the
generic divmod transform instead ?

Thanks,
Prathamesh
>
> Uros.


[PATCH GCC]Clean pedantic calls and useless lvalue code in fold_cond_expr_with_comparison

2016-11-03 Thread Bin Cheng
Hi,
According to analysis given by 
https://gcc.gnu.org/ml/gcc/2016-10/msg00228.html, calls to 
pedantic_non_lvalue_loc and code handling lvalue in 
fold_cond_expr_with_comparison are useless now.  Given this is complicated 
legacy code, it may be better to change code step by step, rather than doing 
this cleanup together with moving simplification from 
fold_cond_expr_with_comparison to match.pd.  
BTW, after last cleanup of pedantic_lvalues, function pedantic_non_lvalue_loc 
now has nothing to do with lvalue.  It could be further cleaned up, or at least 
renamed into something else.  This patch doesn't do that because that depends 
on the answer to the question of the aforementioned message.

Bootstrap and test on x86_64 and AArch64.  Any comments?

Thanks,
bin

2016-10-27  Bin Cheng  

* fold-const.c (fold_cond_expr_with_comparison): Remove call
to pedantic_non_lvalue_loc.  Remove useless code for lvalue
where cond_expr can't be a lvalue.diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 89ed89d..2bad470 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -5082,12 +5082,10 @@ fold_cond_expr_with_comparison (location_t loc, tree 
type,
   case EQ_EXPR:
   case UNEQ_EXPR:
tem = fold_convert_loc (loc, arg1_type, arg1);
-   return pedantic_non_lvalue_loc (loc,
-   fold_convert_loc (loc, type,
- negate_expr (tem)));
+   return fold_convert_loc (loc, type, negate_expr (tem));
   case NE_EXPR:
   case LTGT_EXPR:
-   return pedantic_non_lvalue_loc (loc, fold_convert_loc (loc, type, 
arg1));
+   return fold_convert_loc (loc, type, arg1);
   case UNGE_EXPR:
   case UNGT_EXPR:
if (flag_trapping_math)
@@ -5098,7 +5096,7 @@ fold_cond_expr_with_comparison (location_t loc, tree type,
if (TYPE_UNSIGNED (TREE_TYPE (arg1)))
  break;
tem = fold_build1_loc (loc, ABS_EXPR, TREE_TYPE (arg1), arg1);
-   return pedantic_non_lvalue_loc (loc, fold_convert_loc (loc, type, tem));
+   return fold_convert_loc (loc, type, tem);
   case UNLE_EXPR:
   case UNLT_EXPR:
if (flag_trapping_math)
@@ -5124,7 +5122,7 @@ fold_cond_expr_with_comparison (location_t loc, tree type,
   && integer_zerop (arg01) && integer_zerop (arg2))
 {
   if (comp_code == NE_EXPR)
-   return pedantic_non_lvalue_loc (loc, fold_convert_loc (loc, type, 
arg1));
+   return fold_convert_loc (loc, type, arg1);
   else if (comp_code == EQ_EXPR)
return build_zero_cst (type);
 }
@@ -5170,20 +5168,12 @@ fold_cond_expr_with_comparison (location_t loc, tree 
type,
   tree comp_op1 = arg01;
   tree comp_type = TREE_TYPE (comp_op0);
 
-  /* Avoid adding NOP_EXPRs in case this is an lvalue.  */
-  if (TYPE_MAIN_VARIANT (comp_type) == TYPE_MAIN_VARIANT (type))
-   {
- comp_type = type;
- comp_op0 = arg1;
- comp_op1 = arg2;
-   }
-
   switch (comp_code)
{
case EQ_EXPR:
- return pedantic_non_lvalue_loc (loc, fold_convert_loc (loc, type, 
arg2));
+ return fold_convert_loc (loc, type, arg2);
case NE_EXPR:
- return pedantic_non_lvalue_loc (loc, fold_convert_loc (loc, type, 
arg1));
+ return fold_convert_loc (loc, type, arg1);
case LE_EXPR:
case LT_EXPR:
case UNLE_EXPR:
@@ -5200,8 +5190,7 @@ fold_cond_expr_with_comparison (location_t loc, tree type,
? fold_build2_loc (loc, MIN_EXPR, comp_type, comp_op0, 
comp_op1)
: fold_build2_loc (loc, MIN_EXPR, comp_type,
   comp_op1, comp_op0);
- return pedantic_non_lvalue_loc (loc,
- fold_convert_loc (loc, type, tem));
+ return fold_convert_loc (loc, type, tem);
}
  break;
case GE_EXPR:
@@ -5216,19 +5205,16 @@ fold_cond_expr_with_comparison (location_t loc, tree 
type,
? fold_build2_loc (loc, MAX_EXPR, comp_type, comp_op0, 
comp_op1)
: fold_build2_loc (loc, MAX_EXPR, comp_type,
   comp_op1, comp_op0);
- return pedantic_non_lvalue_loc (loc,
- fold_convert_loc (loc, type, tem));
+ return fold_convert_loc (loc, type, tem);
}
  break;
case UNEQ_EXPR:
  if (!HONOR_NANS (arg1))
-   return pedantic_non_lvalue_loc (loc,
-   fold_convert_loc (loc, type, arg2));
+   return fold_convert_loc (loc, type, arg2);
  break;
case LTGT_EXPR:
  if (!HONOR_NANS (arg1))
-   return pedantic_non_lvalue_loc (loc,
-   fold_convert_loc (loc, type, arg1));
+   return fold_convert_loc (loc, type, arg1);
  break;
default:
 

[fixincludes, v3] Don't define libstdc++-internal macros in Solaris 10+

2016-11-03 Thread Rainer Orth
As I've noticed some time ago, recent versions of Solaris 
include this little gem:

#if __cplusplus >= 201103L
#undef  _GLIBCXX_USE_C99_MATH
#undef  _GLIBCXX_USE_C99_MATH_TR1
#endif

This renders a couple of perfectly good libstdc++ tests as UNSUPPORTED
and is completely unsustainable.  Agreement has now been reached with
the Solaris libm maintainers that this wasn't a good idea and the plan
is to do away with it at least in future Solaris 11.3 SRUs and Solaris 12.

Some digging discovered that we have 3 levels of support in  and
 overloads on Solaris 12
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02075.html

   and the 4.9 backport

[v3, 4.9] Handle C++11  overloads on Solaris 11, 12
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01477.html

   They were introduced in Solaris 10 Patch 11996[67]-02, Solaris 11.3
   SRU 3.6, and Solaris 12 Build 86.

3) Full C++11 math support, including the templates present in libstdc++
   , and the #undef's above.

   Introduced in the same Solaris 10 Patch, Solaris 11.3 SRU 5.6, and
   Solaris 12 Build 90.

To check for potential fallout, I've added a fixincludes fix to remove
the #undefs.  As expected, there was quite a bit, e.g.

+FAIL: 25_algorithms/pop_heap/complexity.cc (test for excess errors)
+WARNING: 25_algorithms/pop_heap/complexity.cc compilation failed to produce 
executable

/vol/gcc/src/hg/trunk/local/libstdc++-v3/testsuite/25_algorithms/pop_heap/complexity.cc:48:
 error: call of overloaded 'log2(const size_t&)' is ambiguous

In file included from 
/var/gcc/regression/trunk/12-gcc/build/gcc/include-fixed/math.h:25,
 from 
/var/gcc/regression/trunk/12-gcc/build/i386-pc-solaris2.12/libstdc++-v3/include/cmath:45,
 from 
/vol/gcc/src/hg/trunk/local/libstdc++-v3/include/precompiled/stdc++.h:41:
/usr/include/iso/math_c99.h:474: note: candidate: double std::log2(double)
In file included from 
/var/gcc/regression/trunk/12-gcc/build/gcc/include-fixed/math.h:25,
 from 
/var/gcc/regression/trunk/12-gcc/build/i386-pc-solaris2.12/libstdc++-v3/include/cmath:45,
 from 
/vol/gcc/src/hg/trunk/local/libstdc++-v3/include/precompiled/stdc++.h:41:
/usr/include/iso/math_c99.h:763: note: candidate: float std::log2(float)
/usr/include/iso/math_c99.h:805: note: candidate: long double std::log2(long 
double)
/usr/include/iso/math_c99.h:946: note: candidate: typename 
std::__math_impl::__enable_if >::__value, double>::__type std::log2(_Tp) [with _Tp = 
unsigned int; typename 
std::__math_impl::__enable_if >::__value, double>::__type = double]
In file included from 
/vol/gcc/src/hg/trunk/local/libstdc++-v3/include/precompiled/stdc++.h:41:
/var/gcc/regression/trunk/12-gcc/build/i386-pc-solaris2.12/libstdc++-v3/include/cmath:1530:
 note: candidate: constexpr typename 
__gnu_cxx::__enable_if::__value, double>::__type 
std::log2(_Tp) [with _Tp = unsigned int; typename 
__gnu_cxx::__enable_if::__value, double>::__type = 
double]

due to duplicate log2 templates in  and c_global/cmath,
and many more of the same kind.

Fortunately, this is all easily fixed by wrapping the affected templates
in a new macro.  That's what this patch does.  The new libstdc++
acinclude.m4 test may well need wording changes in comments etc. and can
perhaps be shortened a bit, bit it worked for me.

The fixincludes part passes make check.

The patch has been regtested (C++ only) on

* Solaris 11.2 SRU 13.6 (i.e. without both overloads and templates in
  ),

* Solaris 11.3 SRU 3.6 (i.e. with overloads only), and

* Solaris 12 Build 110 (i.e. with both overloads and templates, and the
  _GLIBCXX_USE_C99_MATH #undef's)

* Linux/x86_64

I've checked that __CORRECT_ISO_CPP11_MATH_H_PROTO[12] were defined in
config.h as expected, and there were no testsuite regressions.  On the
contrary, a couple of tests that before showed up as UNSUPPORTED due to
the _GLIBCXX_USE_C99_MATH* #undef's now PASS as they should.

Ok for mainline now, and for backports to the gcc-6 and gcc-5 branches
after some soak time?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2016-10-27  Rainer Orth  

libstdc++-v3:
* acinclude.m4 (GLIBCXX_CHECK_MATH11_PROTO): Update comments.
(__CORRECT_ISO_CPP11_MATH_H_PROTO): Rename to ...
(__CORRECT_ISO_CPP11_MATH_H_PROTO1): ... this.
Add test for C++11  templates.
* configure: Regenerate.
* config.h.in: Regenerate.

* include/c_global/cmath [__cplusplus >= 201103L]: Reflect
__CORRECT_ISO_CPP11_MATH_H_PROTO to
__CORRECT_ISO_CPP11_MATH_H_PROTO1 rename.
* include/c_global/cmath [_GLIBCXX_USE_C99_MATH &&
!_GLIBCXX_USE_C99_FP_MACROS_DYNAMIC && __cplusplus >= 201103L]
(std::fpclassify): Wrap in !__CORRECT_ISO_CPP11_MATH_H_PROTO2.
(std::isfinite): Likewise.
(std::isinf): Likewise.
(st

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Martin Liška
On 11/03/2016 03:03 PM, Jakub Jelinek wrote:
> On Thu, Nov 03, 2016 at 03:02:21PM +0100, Martin Liška wrote:
>>> But how would you be able to find out if there isn't any return *ptr; after
>>> the scope or similar (as MEM_REF)?  With is_gimple_reg, they will be turned
>>> into SSA form and you can easily verify (uses of ASAN_POISON are a problem
>>> if they are encountered at runtime).  What would you do for the
>>> must_live_in_memory vars?  Add some pass that detects it, handle it somehow
>>> in addressable pass, handle it in SRA, ... ?
>>
>> If there's return of *ptr, there must be a &my_char, and it looks
>>   _4 = MEM[(char *)&my_char];
>>
>> properly identifies that my_char has address taken.
> 
> It doesn't.  MEM_REF's ADDR_EXPR isn't considered to be address taking.
> 
>   Jakub
> 

You are of course right, my mistake in the patch draft.

Thanks for clarification,
Martin


Re: [patch, libgomp, OpenACC] Additional enter/exit data map handling

2016-11-03 Thread Chung-Lin Tang

Ping this patch again.

On 2016/9/21 12:43 AM, Cesar Philippidis wrote:
>> +/* Returns the number of mappings associated with the pointer or pset. PSET
>> > +   have three mappings, whereas pointer have two.  */
>> > +
>> >  static int
>> > -find_pset (int pos, size_t mapnum, unsigned short *kinds)
>> > +find_pointer (int pos, size_t mapnum, unsigned short *kinds)
>> >  {
>> >if (pos + 1 >= mapnum)
>> >  return 0;
>> >  
>> >unsigned char kind = kinds[pos+1] & 0xff;
>> >  
>> > -  return kind == GOMP_MAP_TO_PSET;
>> > +  if (kind == GOMP_MAP_TO_PSET)
>> > +return 3;
>> > +  else if (kind == GOMP_MAP_POINTER)
>> > +return 2;
>> > +
>> > +  return 0;
>> >  }
> Is this still necessary with the firstprivatization of subarrays
> pointers? Well, it might be for fortran. Conceptually, the gimplifier
> should prune out those unnecessary firstprivate pointer clauses for
> executable constructs such as enter/exit data and update.

It appears that GOMP_MAP_POINTER/GOMP_MAP_TO_PSET maps are currently
created only from the Fortran FE, so I think your description is accurate.

> Actually, this is one area in the spec where the intent of enter/exit
> data conflicts with what it describes. If you look at the runtime
> documentation for, say, acc_create, it states that
> 
>   acc_create (pvar, n*sizeof(var))
> 
> is equivalent to
> 
>   acc enter data create (pvar[n])
> 
> And to free acc_create, you use acc_delete. So in theory, you should be
> able to
> 
>   #pragma acc enter data create (pvar[n])
>   acc_free (pvar)
> 
> but this may result in a memory leak if the pointer mapping isn't freed.

Upon re-reading the OpenACC spec, it appears that acc_malloc/acc_free are 
supposed
to be "dumb" allocation/deallocation interfaces, i.e. the implementation is 
likely
to be something that directly wires to the alloc_func/free_func plugin hooks.
I don't think it's supposed to be something that works with the enter/exit data 
directives,
or anything that works on the maps managed by libgomp.

Chung-Lin





Re: [match.pd] Fix for PR35691

2016-11-03 Thread Prathamesh Kulkarni
On 3 November 2016 at 16:13, Richard Biener  wrote:
> On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:
>
>> Hi Richard,
>> The attached patch tries to fix PR35691, by adding the following two
>> transforms to match.pd:
>> (x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
>> (x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.
>>
>> For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE,
>> it gets transformed to bit_and_expr
>> so to match for both GENERIC and GIMPLE, I had to guard the for-stmt:
>>
>> #if GENERIC
>> (for op (truth_andif truth_orif)
>> #elif GIMPLE
>> (for op (bit_and bit_ior)
>> #endif
>>
>> Is that OK ?
>
> As you are not removing the fold-const.c variant I'd say you should
> simply not look for truth_* and only handle GIMPLE.  Note that we
> have tree-ssa-ifcombine.c which should handle the variant with
> control-flow (but I guess it does not and your patch wouldn't help
> it either).
>
> The transform would also work for vectors (element_precision for
> the test but also a value-matching zero which should ensure the
> same number of elements).
Um sorry, I didn't get how to check vectors to be of equal length by a
matching zero.
Could you please elaborate on that ?
Thanks!
>
> Richard.


[PATCH] Make direct emission of time profiler counter

2016-11-03 Thread Martin Liška
Hello.

As Honza noticed we spent quite some time in __gcov_time_profiler:

perf report for Postgres make check command:
   4.10%  postgres  postgres[.]__gcov_time_profiler

Thus I rewrote the profiling code directly to GIMPLE statements:

  _4 = __gcov7.main[0];
  if (_4 == 0)
goto ;
  else
goto ;

  :
  time_profile_5 = __gcov_time_profiler_counter;
  time_profile_6 = time_profile_5 + 1;
  __gcov7.main[0] = time_profile_6;
  __gcov_time_profiler_counter = time_profile_6;

thread-safe variant of the patch:

  _3 = __gcov7.foo[0];
  if (_3 == 0)
goto ;
  else
goto ;

  :
  time_profiler_counter_ptr_4 = (long int) &__gcov_time_profiler_counter;
  time_profile_5 = __atomic_add_fetch_8 (time_profiler_counter_ptr_4, 1, 0);
  time_profile_6 = (long int) time_profile_5;
  __gcov7.foo[0] = time_profile_6;


Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From 1a277ec6ef25cddbf0518b679fa8f1928cd3d891 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 27 Oct 2016 09:37:34 +0200
Subject: [PATCH] Make direct emission of time profiler counter

libgcc/ChangeLog:

2016-10-31  Martin Liska  

	* libgcov-profiler.c (__gcov_time_profiler): Remove.
	(__gcov_time_profiler_atomic): Likewise.

gcc/ChangeLog:

2016-10-31  Martin Liska  

	* profile.c (instrument_values): Fix coding style.
	(branch_prob): Use renamed function.
	* tree-profile.c (init_ic_make_global_vars): Likewise.
	(gimple_init_edge_profiler): Rename to
	gimple_init_gcov_profiler.
	tree_time_profiler_counter variable declaration.
	(gimple_gen_time_profiler): Rewrite to do a direct gimple code
	emission.
	* value-prof.h: Remove an argument.

gcc/testsuite/ChangeLog:

2016-11-03  Martin Liska  

	* gcc.dg/no_profile_instrument_function-attr-1.c: Update scanned
	output.
	* gcc.dg/tree-prof/time-profiler-3.c: New test.
---
 gcc/profile.c  |  14 +--
 .../gcc.dg/no_profile_instrument_function-attr-1.c |   2 +-
 gcc/testsuite/gcc.dg/tree-prof/time-profiler-3.c   |  22 +
 gcc/tree-profile.c | 106 -
 gcc/value-prof.h   |   5 +-
 libgcc/libgcov-profiler.c  |  23 +
 6 files changed, 111 insertions(+), 61 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/time-profiler-3.c

diff --git a/gcc/profile.c b/gcc/profile.c
index 2564f07..ef38f98 100644
--- a/gcc/profile.c
+++ b/gcc/profile.c
@@ -192,15 +192,9 @@ instrument_values (histogram_values values)
 	  gimple_gen_ior_profiler (hist, t, 0);
 	  break;
 
-  case HIST_TYPE_TIME_PROFILE:
-{
-  basic_block bb =
- split_edge (single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
-  gimple_stmt_iterator gsi = gsi_start_bb (bb);
-
-  gimple_gen_time_profiler (t, 0, gsi);
-  break;
-}
+	case HIST_TYPE_TIME_PROFILE:
+	  gimple_gen_time_profiler (t, 0);
+	  break;
 
 	default:
 	  gcc_unreachable ();
@@ -1305,7 +1299,7 @@ branch_prob (void)
 {
   unsigned n_instrumented;
 
-  gimple_init_edge_profiler ();
+  gimple_init_gcov_profiler ();
 
   n_instrumented = instrument_edges (el);
 
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
index c93d171..e0c2600 100644
--- a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
@@ -19,5 +19,5 @@ int main ()
 
 /* { dg-final { scan-tree-dump-times "__gcov0\\.main.* = PROF_edge_counter" 1 "optimized"} } */
 /* { dg-final { scan-tree-dump-times "__gcov_indirect_call_profiler_v2" 1 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "__gcov_time_profiler" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__gcov_time_profiler_counter = " 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__gcov_init" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-prof/time-profiler-3.c b/gcc/testsuite/gcc.dg/tree-prof/time-profiler-3.c
new file mode 100644
index 000..69ce026
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/time-profiler-3.c
@@ -0,0 +1,22 @@
+/* { dg-options "-O2 -fdump-ipa-profile -fprofile-update=atomic" } */
+/* { dg-require-effective-target profile_update_atomic } */
+
+__attribute__ ((noinline))
+int foo()
+{
+  return 0;
+}
+
+__attribute__ ((noinline))
+int bar()
+{
+  return 1;
+}
+
+int main ()
+{
+  return foo ();
+}
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 0" 1 "profile"} } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 1" 1 "profile"} } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 2" 1 "profile"} } */
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index abeee92..09a702f 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -56,9 +56,9 @@ static GTY(()) tree tree_interval_profiler_fn;
 static GTY(()) tree

[RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-03 Thread Dominik Vogt
I've been trying to fix some bad tree-ssa related optimisation for
s390x and come up with the attached experimental patch.  The patch
is not really good - it breaks some situations in which the
optimisation was useful.  With this code:

  void bar(long);
  void foo (char a)
  {
long l;
char b;

b = a & 63;
l = b;
if (l > 9)
  bar(l);
  }

We get this representation before value range propagation:

  ...
  a_4 = *p_3(D);
  b_5 = a_4 & 63;
  l_6 = (long int) b_5;
  if (l_6 > 9)
  ...

Now, there's some code in tree-vrp.c:simplify_cond_using_ranges()
that folds b_5 into the if condition, because l_6 is just a sign
extension of b_5, and the value range of l_6 can also be
represented by the type of b (char).

  if (b_5 > 9)

(On s390x we end up with "a & 63" stored in two separate
registers, extended to 32 bits in one and to 64 bits in the other,
adding up to two unnecessary instructions.)

A naive idea to prevent folding in this situation was to suppress
it if it would introduce a second use of b_5 (i.e. b_5 was only
used in the cast before) while not eliminating all uses of l_6.
However, calling has_single_use() for both purposes proves to be
not good enough, and VRP does not do this kind of optimisation
yet.  It does not catch cases like

  if (l_6 > 9)
...
  else if (l_6 > 7)
...

where all occurences of l_6 could be replaced, and simply looking
at the use counts is too coarse.

--

Is VRP the right pass to do this optimisation or should a later
pass rather attempt to eliminate the new use of b_5 instead?  Uli
has brought up the idea a mini "sign extend elimination" pass that
checks if the result of a sign extend could be replaced by the
original quantity in all places, and if so, eliminate the ssa
name.  (I guess that won't help with the above code because l is
used also as a function argument.)  How could a sensible approach
to deal with the situation look like?

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-03 Thread Dominik Vogt
On Thu, Nov 03, 2016 at 04:03:22PM +0100, Dominik Vogt wrote:
> I've been trying to fix some bad tree-ssa related optimisation for
> s390x and come up with the attached experimental patch.  The patch
> is not really good - it breaks some situations in which the
> optimisation was useful.  With this code:
...

Missing patch attached.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
>From c1ee9d85c47ce66aa0b5c97739717fefcb07b4ce Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 2 Nov 2016 14:01:46 +0100
Subject: [PATCH] Check number of uses in simplify_cond_using_ranges().

---
 gcc/tree-vrp.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 3c75a0d..e74d935 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -9599,7 +9599,12 @@ simplify_cond_using_ranges (gcond *stmt)
 with strict overflow semantics.  */
  && ((!is_negative_overflow_infinity (vr->min)
   && !is_positive_overflow_infinity (vr->max))
- || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (innerop
+ || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (innerop)))
+ /* If the only use of INNEROP is the cast to OP0, and OP0 is also
+used in other places, folding would introduce new uses of the
+otherwise dead INNEROP without eliminating OP0, so do not
+fold.  */
+ && (!has_single_use (innerop) || has_single_use (op0)))
{
  /* If the range overflowed and the user has asked for warnings
 when strict overflow semantics were used to optimize code,
-- 
2.3.0



Re: [PATCH] Make direct emission of time profiler counter

2016-11-03 Thread Jan Hubicka
> 
> 2016-10-31  Martin Liska  
> 
>   * libgcov-profiler.c (__gcov_time_profiler): Remove.
>   (__gcov_time_profiler_atomic): Likewise.
> 
> gcc/ChangeLog:
> 
> 2016-10-31  Martin Liska  
> 
>   * profile.c (instrument_values): Fix coding style.
>   (branch_prob): Use renamed function.
>   * tree-profile.c (init_ic_make_global_vars): Likewise.
>   (gimple_init_edge_profiler): Rename to
>   gimple_init_gcov_profiler.
>   tree_time_profiler_counter variable declaration.
>   (gimple_gen_time_profiler): Rewrite to do a direct gimple code
>   emission.
>   * value-prof.h: Remove an argument.
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-11-03  Martin Liska  
> 
>   * gcc.dg/no_profile_instrument_function-attr-1.c: Update scanned
>   output.
>   * gcc.dg/tree-prof/time-profiler-3.c: New test.

OK,
Thanks!
Honza


Re: [patch,testsuite] Support dg-require-effective-target label_offsets.

2016-11-03 Thread Georg-Johann Lay

On 28.10.2016 17:58, Mike Stump wrote:

On Oct 27, 2016, at 3:16 AM, Georg-Johann Lay  wrote:


Now imagine some arithmetic like &&LAB2 - &&LAB1.  This might result in
one or two stub addresses, and difference between such addresses is a
complete different thing than the difference between the original labels:
The result might differ in absolute value and in sign, i.e. you can't even
determine whether LAB1 or LAB2 comes first in the generated code as the
order of stubs might differ from the order of respective labels.


So, I think this all doesn't matter any.  Every address gs(LAB) fits in
16-bits by definition, and every gs(LAB1) - gs(LAB2) fits into 16 bits and


Yes, you are right.  Label differences could be implemented.  AFAIK there is 
currently no activity by the Binutils folks to add respective support, so it is 
somewhat pointless to add that support to avr-gcc...


Bottom line is that label offsets are not supported by avr BE, and the 
intention of the patch is to reduce FAIL noise in testsuite results because of 
the missing support.


If a dg-require is not appropriate, should I rather add dg-skip-if to every 
test case?  I'd still prefer the dg-require solution because if the Binutils 
will ever support label differences, then the testsuite will automatically 
catch up with that.



thus is valid for all 16-bit one contexts.  The fact the order between the
stub and the actual code is different is irrelevant, it is a private


Only if the code is not executed; there are several test cases that compute 
relative placements of labels like:


x(){if(&&e-&&b<0)x();b:goto*&&b;e:;}

If a test ever relies on the fact that &&e - &&b tells anything about the 
ordering of the labels, the code is about to crash.



implementation detail of the port, the point is the semantics are fixed and
constant and useful.  In deed that there is even a stub is a private
implementation detail of the port.  I think the `extra' helpful warning from
avr_print_operand_address is wrong and should be removed.  Think of the


The following code simple makes absolutely no sense in the presence of stubs:
int
test (int foo)
{
  static void *dummy[] = { &&a, &&b };
  goto *((char *) &&b - 2 * (foo < 0));
a:
b:
  return 0;
}

It's only a compile test (the original PR66123 is about ICE), but the compiler 
cannot know that it's just a crazy test case that won't be executed...


When you are offsetting from a stub you might end up *anywhere*, even in some 
data range.



label as gs(LAB), not LAB, burn LAB from your mind.  Once you do that, you
see you can't talk about the order LAB1 > LAB2, the concept doesn't make any
sense.  The _only_ think you can talk about is gs(LAB1) > gs(LAB2).  And
because of that, it is always consistent and works just fine.

Once that misguided complains from gcc and bisutils are fixed, are their any
failing cases?


State of matters is that Binutils support is missing, and if I understand you 
correctly, dg-require is not appropriate to factor out availability of such 
features?


Johann


Re: [PATCH] PR77359: Properly align local variables in functions calling alloca.

2016-11-03 Thread David Edelsohn
[Please cc me and Segher on PowerPC / AIX patches.]

I bootstrapped successfully with the patch and ran the testsuite.

Without patch:
https://gcc.gnu.org/ml/gcc-testresults/2016-11/msg00186.html

With patch:
https://gcc.gnu.org/ml/gcc-testresults/2016-11/msg00233.html

Not exactly the same revision, but close enough and no differences.

The patch doesn't break anything, but we still need to understand if
it is correct.

Thanks, David


Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-03 Thread Eric Botcazou
> Is VRP the right pass to do this optimisation or should a later
> pass rather attempt to eliminate the new use of b_5 instead?  Uli
> has brought up the idea a mini "sign extend elimination" pass that
> checks if the result of a sign extend could be replaced by the
> original quantity in all places, and if so, eliminate the ssa
> name.

Did you try to enable -free on s390x?  It's a RTL pass.

-- 
Eric Botcazou


Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)

2016-11-03 Thread Segher Boessenkool
On Thu, Nov 03, 2016 at 09:29:07AM +0100, Richard Biener wrote:
> On Wed, Nov 2, 2016 at 4:27 PM, Segher Boessenkool
>  wrote:
> > On Wed, Nov 02, 2016 at 02:51:41PM +0100, Richard Biener wrote:
> >> >> I don't believe it needs a cleanup on every iteration. One cleanup at
> >> >> the end should work fine.
> >> >
> >> > But as the comment there says:
> >> >
> >> >   /* Merge the duplicated blocks into predecessors, when possible.  
> >> > */
> >> >   cleanup_cfg (0);
> >> >
> >> > (this is not a new comment), and without merging blocks this whole
> >> > patch does zilch.
> >> >
> >> > There is no need to create new basic blocks at all (can replace the
> >> > final branch in a block with a copy of the whole block it jumps to,
> >> > instead); and then it is painfully obvious that switching to layout
> >> > mode here is pointless, too.
> >> >
> >> > Do you want me to do that?
> >> >
> >> > Btw, this isn't quadratic anyway; it is a constant number (the maximal
> >> > length allowed of the block to copy) linear.  Unless there are 
> >> > unboundedly
> >> > many forwarder blocks, which there shouldn't be, but I cannot prove that.
> >>
> >> Well, you iterate calling functions (find candidates, merge, cleanup-cfg) 
> >> that
> >> walk the whole function.  So unless the number of iterations is bound
> >> with a constant I call this quadratic (in function size).
> >
> > It is bounded (with that caveat above): new candidates will be bigger than
> > the block merged into it, so there won't be more than
> > PARAM_MAX_GOTO_DUPLICATION_INSNS passes.
> >
> > But I can make it all work simpler, in non-layout mode, if you prefer.
> 
> Iteration is fine but we should restrict where we look for new
> candidates.  Likewise
> block merging opportunities should be easier to exploit than by
> running cfg-cleanup...

Yes, but that was what the original code does already, so I didn't look
deeper.  "It was just a simple bugfix".

> I'm just thinking of those gigantic machine-generated state machines where we
> ATM do quite well (at least at -O1 ...) with respect to compile-time.

Like I said, I tested it on a 2000 node VM, very artificial so almost
no code except the threading itself, and the compgotos pass takes less
than 1% time both before and after the patch.  I ony tested at -O2 though.

I'll get back to it, but first I have some things that need handling during
stage 1.


Segher


Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-03 Thread Dominik Vogt
On Thu, Nov 03, 2016 at 04:29:05PM +0100, Eric Botcazou wrote:
> > Is VRP the right pass to do this optimisation or should a later
> > pass rather attempt to eliminate the new use of b_5 instead?  Uli
> > has brought up the idea a mini "sign extend elimination" pass that
> > checks if the result of a sign extend could be replaced by the
> > original quantity in all places, and if so, eliminate the ssa
> > name.
> 
> Did you try to enable -free on s390x?  It's a RTL pass.

Just tried it, but it does not seem to make a difference:

  Trying to eliminate extension:
  (insn 8 6 10 2 (set (reg:SI 1 %r1 [orig:63 b+-3 ] [63])
  (zero_extend:SI (reg/v:QI 2 %r2 [orig:60 b ] [60]))) y.c:9
  1258 {*zero_extendqisi2_extimm}
   (nil))
  Elimination opportunities = 1 realized = 0

I guess when in Rtl format, the code is already too convoluted to
be cleaned up.  Before combine - which should make use of some
advanced s390x instruction (risbg) but doesn't - Rtl is this:

(insn 2 4 3 2 (set (reg/v:DI 62 [ a+-7 ])
(reg:DI 2 %r2 [ a+-7 ])) y.c:3 1074 {*movdi_64}
# (SImode)(a & 63)
(insn 6 3 8 2 (parallel [
(set (subreg:SI (reg/v:QI 60 [ b ]) 0)
(and:SI (subreg/s/v:SI (reg/v:DI 62 [ a+-7 ]) 4)
(const_int 63 [0x3f])))
(clobber (reg:CC 33 %cc))
]) y.c:7 1526 {*andsi3_zarch}
(insn 8 6 9 2 (set (reg:SI 63 [ b+-3 ])
(zero_extend:SI (reg/v:QI 60 [ b ]))) y.c:9 1258
{*zero_extendqisi2_extimm}
# Compare quantity in SImode
(insn 9 8 10 2 (set (reg:CCU 33 %cc)
(compare:CCU (reg:SI 63 [ b+-3 ])
(const_int 9 [0x9]))) y.c:9 1047 {*cmpsi_ccu}
(jump_insn 10 9 11 2 (set (pc)
(if_then_else (leu (reg:CCU 33 %cc)
(const_int 0 [0]))
(label_ref:DI 18)
(pc))) y.c:9 1706 {*cjump_64}
# Extend to DImode
(insn 12 11 13 3 (set (reg:DI 64 [ l ])
(zero_extend:DI (reg/v:QI 60 [ b ]))) y.c:8 1257
{*zero_extendqidi2_extimm}
# Load DImode function argument for call to bar()
(insn 13 12 14 3 (set (reg:DI 2 %r2)
(reg:DI 64 [ l ])) y.c:10 1074 {*movdi_64}
...

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH, RFC] Improve ivopts group costs

2016-11-03 Thread Bin.Cheng
On Thu, Nov 3, 2016 at 1:35 PM, Evgeny Kudryashov  wrote:
> Hello,
>
> I'm facing the following problem related to ivopts. The problem is that GCC
> generates a lot of induction variables and as a result there is an
> unnecessary increase of stack usage and register pressure.
>
> For instance, for the attached testcase (tc_ivopts.c) GCC generates 26
> induction variables (25 for each of lhsX[{0-4}][{0-4}][k] and one for
> rhs[k][j][{0-2}]). You should be able to reproduce this issue on arm target
> using GCC with "-O2 -mcpu=cortex-a9 -mtune=cortex-a9". For reference, I'm
> attaching assembly I get on current trunk.
>
> The reason might be in use groups costs, in particular, in the way of
> estimation. Currently, the cost of a tuple (group, candidate) is a sum of
> per-use costs in the group. So, the cost of a group grows proportional to
> the number of uses in the group. This approach has a negative effect on the
> algorithm for finding the best set of induction variables: the part of a
> total cost related to adding a new candidate is almost washed out by the
> cost of the group. In addition, when there is a lot of groups with many uses
> inside and a target is out of free registers, the cost of spill is washing
> out too. As a result, GCC prefers to use native candidates (candidate
> created for a particular group) for each group of uses instead of
> considering the real cost of introducing a new variable into a set.
>
> The summing approach was added as a part of this patch
> (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00641.html) and the
> motivation for taking the sum does not seem to be
> specifically discussed.
>
> I propose the following patch that changes a group cost from cost summing to
> selecting the largest one inside a group. For the given test case I have:
> necessary size of stack is decreased by almost 3 times and ldr\str amount
> are decreased by less than 2 times. Also I'm attaching assembly after
> applying the patch.
>
> The essential change in the patch is just:
>
> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> index f9211ad..a149418 100644
> --- a/gcc/tree-ssa-loop-ivopts.c
> +++ b/gcc/tree-ssa-loop-ivopts.c
> @@ -5151,7 +5151,8 @@ determine_group_iv_cost_address (struct ivopts_data
> *data,
>  offset and where the iv_use is.  */
> cost = get_computation_cost (data, next, cand, true,
>  NULL, &can_autoinc, NULL);
> -  sum_cost += cost;
> +  if (sum_cost < cost)
> +sum_cost = cost;
>  }
>set_group_iv_cost (data, group, cand, sum_cost, depends_on,
>  NULL_TREE, ERROR_MARK, inv_expr);
>
> Any suggestions?
Hi Evgeny,

I tend to be practical here.  Given cost is not model that well in
IVOPT, any heuristic tuning in cost is quite likely resulting in
improvement in some cases, while regressions in others.  At least we
need to run good number of benchmarks for such changes.  Specific to
this one, the summary of cost in a group is a natural result of the
original code, in which uses' cost is added up before candidate
selection.  Take your code as an example, choosing loop's original
candidate for group results in lots of memory accesses with [base +
index << scale] addressing mode, which could have higher cost than
[base + offset] mode wrto u-arch, IMHO, it makes sense to add up this
cost.  OTOH, I totally agree that IVOPT tends to choose too many
candidates at the moment, especially for large loops.  Is it possible
to achieve the same effect by penalizing register pressure cost?
Meanwhile, I can collect benchmark data for your patch on AArch64 and
get back to you later.

Thanks,
bin
>
>
> Thanks,
> Evgeny.


[PATCH] combine lhs zero_extract fix (PR78186)

2016-11-03 Thread Segher Boessenkool
I managed to forget to mask the thing inserted.  Tested on powerpc64-linux
{-m32,-m64}, and Bin tested on arm.  Applying to trunk.


Segher


2016-11-03  Segher Boessenkool  

PR rtl-optimization/78186
* combine.c (change_zero_ext): Mask the RHS of a zero_extract as
well, when converting to IOR.

---
 gcc/combine.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index 7c21fe4..7ed0a62 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11224,6 +11224,9 @@ change_zero_ext (rtx pat)
   rtx x = gen_rtx_AND (mode, reg, immed_wide_int_const (mask, mode));
   rtx y = simplify_gen_binary (ASHIFT, mode, SET_SRC (pat),
   GEN_INT (offset));
+  wide_int mask2 = wi::shifted_mask (offset, width, false, reg_width);
+  y = simplify_gen_binary (AND, mode, y,
+  immed_wide_int_const (mask2, mode));
   rtx z = simplify_gen_binary (IOR, mode, x, y);
   SUBST (SET_DEST (pat), reg);
   SUBST (SET_SRC (pat), z);
-- 
1.9.3



[SPARC] Remove couple of obsolete patterns

2016-11-03 Thread Eric Botcazou
The vec_interleave_{low, high} standard patterns were removed 5 years ago.

Tested on SPARC/Solaris, applied on the mainline.


2016-11-03  Eric Botcazou  

* config/sparc/sparc.md (vec_interleave_lowv8qi): Delete.
(vec_interleave_highv8qi): Likewise.

-- 
Eric BotcazouIndex: config/sparc/sparc.md
===
--- config/sparc/sparc.md	(revision 241808)
+++ config/sparc/sparc.md	(working copy)
@@ -8844,34 +8844,6 @@ (define_insn "fpmerge_vis"
  [(set_attr "type" "fga")
   (set_attr "fptype" "double")])
 
-(define_insn "vec_interleave_lowv8qi"
-  [(set (match_operand:V8QI 0 "register_operand" "=e")
-(vec_select:V8QI
-  (vec_concat:V16QI (match_operand:V8QI 1 "register_operand" "f")
-(match_operand:V8QI 2 "register_operand" "f"))
-  (parallel [(const_int 0) (const_int 8)
- (const_int 1) (const_int 9)
- (const_int 2) (const_int 10)
-		 (const_int 3) (const_int 11)])))]
- "TARGET_VIS"
- "fpmerge\t%L1, %L2, %0"
- [(set_attr "type" "fga")
-  (set_attr "fptype" "double")])
-
-(define_insn "vec_interleave_highv8qi"
-  [(set (match_operand:V8QI 0 "register_operand" "=e")
-(vec_select:V8QI
-  (vec_concat:V16QI (match_operand:V8QI 1 "register_operand" "f")
-(match_operand:V8QI 2 "register_operand" "f"))
-  (parallel [(const_int 4) (const_int 12)
- (const_int 5) (const_int 13)
- (const_int 6) (const_int 14)
-		 (const_int 7) (const_int 15)])))]
- "TARGET_VIS"
- "fpmerge\t%H1, %H2, %0"
- [(set_attr "type" "fga")
-  (set_attr "fptype" "double")])
-
 ;; Partitioned multiply instructions
 (define_insn "fmul8x16_vis"
   [(set (match_operand:V4HI 0 "register_operand" "=e")


Re: [PATCH] Add mcpu flag for Qualcomm falkor core

2016-11-03 Thread Andrew Pinski
On Thu, Nov 3, 2016 at 3:03 AM, Siddhesh Poyarekar
 wrote:
> This adds an mcpu option for the upcoming Qualcomm Falkor core.  This
> is identical to the qdf24xx part that was added earlier and hence
> retains the same tuning structure and continues to have the a57
> pipeline for now.  The part number has also been changed and this
> patch fixes this for both qdf24xx and falkor options.
>
> Tested aarch64-linux-gnu and arm-linux-gnuabihf and did not find any
> regressions resulting from this patch.


This patch no longer applies after the recent changes (starting around
a month ago) to aarch64-cores.def.  Please updat the patch for the
recent changes

Thanks,
Andrew Pinski

>
> Siddhesh
>
> * gcc/config/aarch64/aarch64-cores.def (qdf24xx): Update part
> number.
> (falkor): New core.
> * gcc/config/aarch64/aarch64-tune.md: Regenerated.
> * gcc/config/arm/arm-cores.def (falkor): New core.
> * gcc/config/arm/arm-tables.opt: Regenerated.
> * gcc/config/arm/arm-tune.md: Regenerated.
> * gcc/config/arm/bpabi.h (BE8_LINK_SPEC): Add falkor support.
> * gcc/config/arm/t-aprofile (MULTILIB_MATCHES): Add falkor
> support.
> * gcc/doc/invoke.texi (AArch64 Options/-mtune): Add falkor.
> (ARM Options/-mtune): Add falkor.
>
> ---
>  gcc/config/aarch64/aarch64-cores.def | 3 ++-
>  gcc/config/aarch64/aarch64-tune.md   | 2 +-
>  gcc/config/arm/arm-cores.def | 1 +
>  gcc/config/arm/arm-tables.opt| 3 +++
>  gcc/config/arm/arm-tune.md   | 2 +-
>  gcc/config/arm/bpabi.h   | 2 ++
>  gcc/config/arm/t-aprofile| 1 +
>  gcc/doc/invoke.texi  | 9 +
>  8 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index d9da257..0aad9a9 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -46,7 +46,8 @@ AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  
> AARCH64_FL_FOR_ARCH8 | AA
>  AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
>  AARCH64_CORE("cortex-a73",  cortexa73, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC, cortexa73, "0x41", "0xd09")
>  AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  "0x53", "0x001")
> -AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   "0x51", "0x800")
> +AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   "0x51", "0xC00")
> +AARCH64_CORE("falkor",  falkor,cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   "0x51", "0xC00")
>  AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
>  AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
> xgene1, "0x50", "0x000")
>
> diff --git a/gcc/config/aarch64/aarch64-tune.md 
> b/gcc/config/aarch64/aarch64-tune.md
> index 022b131..29afcdf 100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>  ;; -*- buffer-read-only: t -*-
>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>  (define_attr "tune"
> -   
> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
> +   
> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,falkor,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
> (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
> index 2072e1e..1a382e1 100644
> --- a/gcc/config/arm/arm-cores.def
> +++ b/gcc/config/arm/arm-cores.def
> @@ -174,6 +174,7 @@ ARM_CORE("cortex-a72",  cortexa72, cortexa57,   8A,   
>   ARM_FSET_MAKE_CPU1 (FL_LDSCHED
>  ARM_CORE("cortex-a73", cortexa73, cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a73)
>  ARM_CORE("exynos-m1",  exynosm1,  exynosm1,8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), exynosm1)
>  ARM_CORE("qdf24xx",qdf24xx,   cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx)
> +ARM_CORE("falkor", falkor,cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx)
>  ARM_CORE("xgene1",  xgene1,xgene1,  8A,ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_FOR_ARCH8A),xgene1)
>
>  /* V8 big.LITTLE implementations */
> diff --git a/gcc/config/arm

[openacc] add warnings for unused parallelism

2016-11-03 Thread Cesar Philippidis
OpenACC permits the user to request more gang, worker and vector level
parallelism than what the compiler can utilize. For instance, if the
user writes worker routine without including a worker-partitioned loop,
the compiler will not generate worker-partitioned code for that function.

The intent behind this patch is to warn the user of any potentially
unutilized parallelism as a debugging aid. Users often find it
disconcerting when their code doesn't speed up despite explicitly
setting num_gangs, num_workers and vector_length. This patch at least
warns them not to expect parallelism across a specific axis.

Is this patch OK for trunk? This patch was originally posted by Nathan
here . Most of
the changes in that patch are already in trunk.

Cesar
2016-11-03  Cesar Philippidis  
	Nathan Sidwell 

	gcc/
	* omp-low.c (oacc_validate_dims): Emit warnings about strange
	partitioning choices.

	gcc/testsuite/
	* c-c++-common/goacc/pr70688.c (parallel_reduction): Adjust expected
	warnings.
	* c-c++-common/goacc/routine-1.c: Likewise.
	* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
	* gcc.dg/goacc/loop-processing-1.c (void vector_1): Likewise.
	* gfortran.dg/goacc/parallel-tree.f95: Likewise.
	* gfortran.dg/goacc/routine-4.f90: Likewise.
	* gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.
	* gfortran.dg/goacc/vector_length.f90: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/crash-1.c: Adjust to account
	for insufficient parallelism warnings.
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/private-variables.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-initial-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
	* testsuite/libgomp.oacc-fortran/private-variables.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/routine-7.f90: Likewise.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e5b9e4c..cbf4f3e 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -18882,6 +18882,36 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
   pos = TREE_CHAIN (pos);
 }
 
+  bool check = true;
+#ifdef ACCEL_COMPILER
+  /* When device_type is implemented, we should also check on the
+ target, if device_type has been used to affect the partitioning
+ and/or dimensions.  */
+  check = false;
+#endif
+  if (!is_kernel && check)
+{
+  static char const *const axes[] =
+  /* Must be kept in sync with GOMP_DIM enumeration.  */
+	{"gang", "worker", "vector" };
+  for (ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
+	if (dims[ix] < 0)
+	  ; /* Defaulting axis.  */
+	else if ((used & GOMP_DIM_MASK (ix)) && dims[ix] == 1)
+	  /* There is partitioned execution, but the user requested a
+	 dimension size of 1.  They're probably confused.  */
+	  warning_at (DECL_SOURCE_LOCATION (fn), 0,
+		  "region contains %s partitoned code but"
+		  " is not %s partitioned", axes[ix], axes[ix]);
+	else if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] != 1)
+	  /* The dimension is explicitly partitioned to non-unity, but
+	 no use is made within the region.  */
+	  warning_at (DECL_SOURCE_LOCATION (fn), 0,
+		  "region is %s partitioned but"
+		  " does not contain %s partitioned code",
+		  axes[ix], axes[ix]);
+}
+
   bool changed = targetm.goacc.validate_dims (fn, dims, level);
 
   /* Default anything left to 1 or a partitioned default.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/pr70688.c b/gcc/testsuite/c-c++-common/goacc/pr70688.c
index 5a23665..37c3885 100644
--- a/gcc/testsuite/c-c++-common/goacc/pr70688.c
+++ b/gcc/testsuite/c-c++-common/goacc/pr70688.c
@@ -1,3 +1,5 @@
+/* { dg-compile } */
+
 const int n = 100;
 
 int
@@ -21,7 +23,7 @@ parallel_reduction ()
 
 #pragma acc data copy (dummy)
   {
-#pragma acc parallel num_gangs (10) copy (sum) reduction (+:sum)
+#pragma acc parallel num_gangs (10) copy (sum) reduction (+:sum) /* { dg-warning "region is gang partitioned" } */
 {
   int v = 5;
   sum += 10 + v;
@@ -36,11 +38,11 @@ main ()
 {
   int i, s = 0;
 
-#pragma acc parallel num_gangs (10) copy (s) reduction (+:s)
+#pragma acc parallel num_gangs (10) copy (s) reduction (+:s) /* { dg-warning "region is gang partitioned" } */
   fo

Re: [ipa-vrp] ice in set_value_range

2016-11-03 Thread Martin Jambor
Hi,

On Fri, Oct 28, 2016 at 01:58:13PM +1100, kugan wrote:
> > Do I understand it correctly that extract_range_from_unary_expr deals
> > with any potential type conversions better (compared to what you did
> > before here)?
> 
> Yes, this can be wrong at times too as reported in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78121. I have separated this
> part of the patch with a testcase.
> 
> Please note that I am using fold_convert in the attached patch.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu with no new
> regressions. Is this OK for trunk?
> 

I have no objections, but we need to wait for Honza.

Thanks,

Martin

> Thanks,
> Kugan
> 
> 
> gcc/ChangeLog:
> 
> 2016-10-28  Kugan Vivekanandarajah  
> 
>   PR ipa/78121
>   * ipa-cp.c (propagate_vr_accross_jump_function): Pass param type.
>   Also fold constant passed as argument while computing value range.
>   (propagate_constants_accross_call): Pass param type.
>   * ipa-prop.c: export ipa_get_callee_param_type.
>   * ipa-prop.h: export ipa_get_callee_param_type.
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-10-28  Kugan Vivekanandarajah  
> 
>   PR ipa/78121
>   * gcc.dg/ipa/pr78121.c: New test.



[PATCH] Print repeated rtl vector elements in a nicer way

2016-11-03 Thread Martin Jambor
Hi,

this patch comes from our GCN BE branch.  Because vectors of that
architectures have many elements, RTL dumps can get quite unwieldy when
all elements are printed out all the time.  However, pretty much all the
time it is the same value repeated over and over again, which lead us to
the following patch which just prints how many times a given value
occurs.

Honza has asked me yesterday to submit this hunk upstream so here it
is.  It has passed bootstrap and testing on x86_64-linux and
aarch64-linux, OK for trunk?

Thanks,

Martin


2016-11-02  Jan Hubicka  
Martin Jambor  

* print-rtl.c (print_rtx_operand_codes_E_and_V): Print how many times
an element is repeated istead of printing each repeated element.
---
 gcc/print-rtl.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 341ecdf..9752c85 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -252,7 +252,20 @@ rtx_writer::print_rtx_operand_codes_E_and_V (const_rtx 
in_rtx, int idx)
m_sawclose = 1;
 
   for (int j = 0; j < XVECLEN (in_rtx, idx); j++)
-   print_rtx (XVECEXP (in_rtx, idx, j));
+   {
+ int j1;
+
+ print_rtx (XVECEXP (in_rtx, idx, j));
+ for (j1 = j + 1; j1 < XVECLEN (in_rtx, idx); j1++)
+   if (XVECEXP (in_rtx, idx, j) != XVECEXP (in_rtx, idx, j1))
+ break;
+
+ if (j1 != j + 1)
+   {
+ fprintf (m_outfile, " repeated %ix", j1 - j);
+ j = j1;
+   }
+   }
 
   m_indent -= 2;
 }
-- 
2.10.1



[PATCH] DW_TAG_ptr_to_member_type for PMF and DW_AT_{,rvalue_}reference for those (take 2)

2016-11-03 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 01:19:09PM -0400, Jason Merrill wrote:
> On Wed, Nov 2, 2016 at 12:33 PM, Jakub Jelinek  wrote:
> > On Wed, Nov 02, 2016 at 04:44:05PM +0100, Jakub Jelinek wrote:
> >> which means if gen_type_die or gen_type_die_with_usage doesn't
> >> use the langhook, then we'd emit a completely useless { __pfn; __delta }
> >> struct into debug info first, and then in modified_type_die used
> >> the langhook, get OFFSET_TYPE and probably create the
> >> DW_TAG_ptr_to_member_type.  So I think we really need that.
> >>
> >> > > 2) it is used for something Ada-ish I really don't know how to test 
> >> > > etc.
> >> > >to be able to find out if it is safe to call it in
> >> > >gen_type_die_with_usage too
> >> >
> >> > You could find an Ada test that uses the code and verify that the
> >> > output stays the same?
> >>
> >> I can try to find the patch that introduced it and if it contained any
> >> testcases.
> >
> > I couldn't find any unfortunately.  Pierre, could you please test if the
> > following patch doesn't regress anything in the Ada debug info area?
> >
> > Here is updated patch I'm going to bootstrap/regtest; it generates the
> > same debuginfo on ref-3.C testcase.
> 
> Looks good.

Keith's testing revealed a bug in the patch, that we don't emit
DW_TAG_typedef for PMF typedefs.  Fixed by adding && !typedef_variant_p (type)
check to the cp_get_debug_type hook, so that we emit the proper typedef
with the right name and only its DECL_ORIGINAL_TYPE is replaced with
OFFSET_TYPE handling.  I believe that was the only issue, so I think it
should be ok to enable it always, not just for -gdwarf-5.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-03  Jakub Jelinek  
Alexandre Oliva  
Jason Merrill  

PR debug/28767
PR debug/56974
* langhooks.h (struct lang_hooks_for_types): Document type_hash_eq
being also called on METHOD_TYPEs.  Add type_dwarf_attribute langhook.
* langhooks.c (lhd_type_dwarf_attribute): New function.
* langhooks-def.h (lhd_type_dwarf_attribute): Declare.
(LANG_HOOKS_TYPE_DWARF_ATTRIBUTE): Define.
(LANG_HOOKS_FOR_TYPES_INITIALIZER): Add
LANG_HOOKS_TYPE_DWARF_ATTRIBUTE.
* tree.h (check_lang_type): Declare.
* tree.c (check_lang_type): New function.
(check_qualified_type, check_aligned_type): Call it.
* dwarf2out.c (modified_type_die): Don't use type_main_variant
for FUNCTION_TYPE or METHOD_TYPE, instead walk over variants with
check_base_type and check_lang_type.
(gen_ptr_to_mbr_type_die): If lookup_type_die is already non-NULL,
return early.  For pointer-to-data-member add DW_AT_use_location
attribute.
(gen_subroutine_type_die): Add DW_AT_{,rvalue_}reference attribute
if needed.
(gen_type_die_with_usage): Don't use type_main_variant
for FUNCTION_TYPE or METHOD_TYPE, instead walk over variants with
check_base_type and check_lang_type.  Formatting fixes. Call
get_debug_type langhook.
cp/
* tree.c (cp_check_qualified_type): Use check_base_type and
TYPE_QUALS comparison instead of check_qualified_type.
(cxx_type_hash_eq): Return false if type_memfn_rqual don't match.
* cp-objcp-common.c (cp_get_debug_type): New function.
(cp_decl_dwarf_attribute): Don't handle types here.
(cp_type_dwarf_attribute): New function.
* cp-objcp-common.h (cp_get_debug_type, cp_type_dwarf_attribute):
Declare.
(LANG_HOOKS_GET_DEBUG_TYPE, LANG_HOOKS_TYPE_DWARF_ATTRIBUTE):
Define.
testsuite/
* g++.dg/debug/dwarf2/ptrdmem-1.C: New test.
* g++.dg/debug/dwarf2/ref-3.C: New test.
* g++.dg/debug/dwarf2/ref-4.C: New test.
* g++.dg/debug/dwarf2/refqual-1.C: New test.
* g++.dg/debug/dwarf2/refqual-2.C: New test.

--- gcc/langhooks.h.jj  2016-11-03 08:47:59.373747992 +0100
+++ gcc/langhooks.h 2016-11-03 13:08:27.003538880 +0100
@@ -120,7 +120,7 @@ struct lang_hooks_for_types
   /* Return TRUE if TYPE1 and TYPE2 are identical for type hashing purposes.
  Called only after doing all language independent checks.
  At present, this function is only called when both TYPE1 and TYPE2 are
- FUNCTION_TYPEs.  */
+ FUNCTION_TYPE or METHOD_TYPE.  */
   bool (*type_hash_eq) (const_tree, const_tree);
 
   /* Return TRUE if TYPE uses a hidden descriptor and fills in information
@@ -162,6 +162,10 @@ struct lang_hooks_for_types
  for the debugger about scale factor, etc.  */
   bool (*get_fixed_point_type_info) (const_tree,
 struct fixed_point_type_info *);
+
+  /* Returns -1 if dwarf ATTR shouldn't be added for TYPE, or the attribute
+ value otherwise.  */
+  int (*type_dwarf_attribute) (const_tree, int);
 };
 
 /* Language hooks related to decls and the symbol table.  */
--- gcc/langhooks.c.jj  2016-11-03 08

Re: [PATCH] Print repeated rtl vector elements in a nicer way

2016-11-03 Thread Bernd Schmidt

On 11/03/2016 05:35 PM, Martin Jambor wrote:


* print-rtl.c (print_rtx_operand_codes_E_and_V): Print how many times
an element is repeated istead of printing each repeated element.


"instead"


---
 gcc/print-rtl.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 341ecdf..9752c85 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -252,7 +252,20 @@ rtx_writer::print_rtx_operand_codes_E_and_V (const_rtx 
in_rtx, int idx)
m_sawclose = 1;

   for (int j = 0; j < XVECLEN (in_rtx, idx); j++)
-   print_rtx (XVECEXP (in_rtx, idx, j));
+   {
+ int j1;
+
+ print_rtx (XVECEXP (in_rtx, idx, j));
+ for (j1 = j + 1; j1 < XVECLEN (in_rtx, idx); j1++)
+   if (XVECEXP (in_rtx, idx, j) != XVECEXP (in_rtx, idx, j1))
+ break;
+
+ if (j1 != j + 1)
+   {
+ fprintf (m_outfile, " repeated %ix", j1 - j);
+ j = j1;
+   }
+   }


Good idea, but can you give an example of how this looks in practice? 
Also, it would be nice (and necessary for David's rtl-testing) to also 
teach the rtl reader to parse this format.



Bernd




[PATCH] Fix dwarf2out related bootstrap failure on Solaris (PR debug/78191)

2016-11-03 Thread Jakub Jelinek
Hi!

My recent optimize_abbrev_table optimization apparently broke Solaris
bootstrap.
The bug is specific I think just to -gdwarf-{2,3} -gno-strict-dwarf, where
DW_FORM_exprloc can't be used for location expressions and some location
expression contains DW_OP_GNU_{{const,regval,deref}_type,convert,reinterpret}
and there are at least 128 die abbreviations used by more than one DIE.
Because of the lack of DW_FORM_exprloc we need to decide on
DW_FORM_block{1,2,4} and size the location expression, for that we need to know
the DIE offsets of base offset DIEs that are referenced (earlier code
ensures such types appear very early in the CU, right after the
DW_TAG_compile_unit DIE) and for that their size and their abbreviation values
need to be constant, so the optimize_abbrev_table opts that reorder abbreviation
numbers by decreasing usage count or for -gdwarf-5 attempt to optimize implicit
constants can't be done for the CU and base types.
For -gdwarf-4 and above, we can emit DW_FORM_exprloc, so value_format doesn't
need to compute any sizes and thus calc_base_type_die_sizes shouldn't be
called.

Bootstrapped/regtested on x86_64-linux and i686-linux and Rainer has kindly
tested it on Solaris, ok for trunk?

2016-11-03  Jakub Jelinek  

* dwarf2out.c (size_of_discr_list): Fix typo in function comment.

PR debug/78191
* dwarf2out.c (abbrev_opt_base_type_end): New variable.
(die_abbrev_cmp): Sort dies with die_abbrev smaller than
abbrev_opt_base_type_end only by increasing die_abbrev, before
any other dies.
(optimize_abbrev_table): Don't change abbrev numbers of
base types and CU or optimize implicit consts in them if
calc_base_type_die_sizes has been called during build_abbrev_table.
(calc_base_type_die_sizes): If abbrev_opt_start, set
abbrev_opt_base_type_end to one plus largest base type's
die_abbrev.

--- gcc/dwarf2out.c.jj  2016-11-03 08:47:59.0 +0100
+++ gcc/dwarf2out.c 2016-11-03 12:26:03.192459170 +0100
@@ -1909,7 +1909,7 @@ size_of_discr_value (dw_discr_value *dis
 return size_of_sleb128 (discr_value->v.sval);
 }
 
-/* Return the size of the value in a DW_discr_list attribute.  */
+/* Return the size of the value in a DW_AT_discr_list attribute.  */
 
 static int
 size_of_discr_list (dw_discr_list_ref discr_list)
@@ -8548,6 +8548,11 @@ optimize_external_refs (dw_die_ref die)
 /* First abbrev_id that can be optimized based on usage.  */
 static unsigned int abbrev_opt_start;
 
+/* Maximum abbrev_id of a base type plus one (we can't optimize DIEs with
+   abbrev_id smaller than this, because they must be already sized
+   during build_abbrev_table).  */
+static unsigned int abbrev_opt_base_type_end;
+
 /* Vector of usage counts during build_abbrev_table.  Indexed by
abbrev_id - abbrev_opt_start.  */
 static vec abbrev_usage_count;
@@ -8646,12 +8651,16 @@ die_abbrev_cmp (const void *p1, const vo
   gcc_checking_assert (die1->die_abbrev >= abbrev_opt_start);
   gcc_checking_assert (die2->die_abbrev >= abbrev_opt_start);
 
-  if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start]
-  > abbrev_usage_count[die2->die_abbrev - abbrev_opt_start])
-return -1;
-  if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start]
-  < abbrev_usage_count[die2->die_abbrev - abbrev_opt_start])
-return 1;
+  if (die1->die_abbrev >= abbrev_opt_base_type_end
+  && die2->die_abbrev >= abbrev_opt_base_type_end)
+{
+  if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start]
+ > abbrev_usage_count[die2->die_abbrev - abbrev_opt_start])
+   return -1;
+  if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start]
+ < abbrev_usage_count[die2->die_abbrev - abbrev_opt_start])
+   return 1;
+}
 
   /* Stabilize the sort.  */
   if (die1->die_abbrev < die2->die_abbrev)
@@ -8729,10 +8738,12 @@ optimize_abbrev_table (void)
   sorted_abbrev_dies.qsort (die_abbrev_cmp);
 
   unsigned int abbrev_id = abbrev_opt_start - 1;
-  unsigned int first_id = 0;
+  unsigned int first_id = ~0U;
   unsigned int last_abbrev_id = 0;
   unsigned int i;
   dw_die_ref die;
+  if (abbrev_opt_base_type_end > abbrev_opt_start)
+   abbrev_id = abbrev_opt_base_type_end - 1;
   /* Reassign abbreviation ids from abbrev_opt_start above, so that
 most commonly used abbreviations come first.  */
   FOR_EACH_VEC_ELT (sorted_abbrev_dies, i, die)
@@ -8740,10 +8751,15 @@ optimize_abbrev_table (void)
  dw_attr_node *a;
  unsigned ix;
 
+ /* If calc_base_type_die_sizes has been called, the CU and
+base types after it can't be optimized, because we've already
+calculated their DIE offsets.  We've sorted them first.  */
+ if (die->die_abbrev < abbrev_opt_base_type_end)
+   continue;
  if (die->die_abbrev != last_abbrev_id)
{
  last_abbrev_id = die->die_abbr

[PATCH, GCC/ARM] Fix PR77904: callee-saved register trashed when clobbering sp

2016-11-03 Thread Thomas Preudhomme

Hi,

When using a callee-saved register to save the frame pointer the Thumb-1 
prologue fails to save the callee-saved register before that. For ARM and 
Thumb-2 targets the frame pointer is handled as a special case but nothing is 
done for Thumb-1 targets. This patch adds the same logic for Thumb-1 targets.


ChangeLog entries are as follow:

*** gcc/ChangeLog ***

2016-11-02  Thomas Preud'homme  

PR target/77904
* config/arm/arm.c (thumb1_compute_save_reg_mask): mark frame pointer
in save register mask if it is needed.


*** gcc/testsuite/ChangeLog ***

2016-11-02  Thomas Preud'homme  

PR target/77904
* gcc.target/arm/pr77904.c: New test.


Testing: Testsuite shows no regression when run with arm-none-eabi GCC 
cross-compiler for Cortex-M0 target.


Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dd8d5e5db8ca50daab648e58df290969aa794862..c7bf3320a3db5dfc4f33ae145ff2e5f239d6c0f9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19495,6 +19495,10 @@ thumb1_compute_save_reg_mask (void)
 if (df_regs_ever_live_p (reg) && callee_saved_reg_p (reg))
   mask |= 1 << reg;
 
+  /* Handle the frame pointer as a special case.  */
+  if (frame_pointer_needed)
+mask |= 1 << HARD_FRAME_POINTER_REGNUM;
+
   if (flag_pic
   && !TARGET_SINGLE_PIC_BASE
   && arm_pic_register != INVALID_REGNUM
diff --git a/gcc/testsuite/gcc.target/arm/pr77904.c b/gcc/testsuite/gcc.target/arm/pr77904.c
new file mode 100644
index ..76728c07e73350ce44160cabff3dd2fa7a6ef021
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr77904.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+__attribute__ ((noinline, noclone)) void
+clobber_sp (void)
+{
+  __asm volatile ("" : : : "sp");
+}
+
+int
+main (void)
+{
+  int ret;
+
+  __asm volatile ("mov\tr4, #0xf4\n\t"
+		  "mov\tr5, #0xf5\n\t"
+		  "mov\tr6, #0xf6\n\t"
+		  "mov\tr7, #0xf7\n\t"
+		  "mov\tr0, #0xf8\n\t"
+		  "mov\tr8, r0\n\t"
+		  "mov\tr0, #0xfa\n\t"
+		  "mov\tr10, r0"
+		  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
+  clobber_sp ();
+
+  __asm volatile ("cmp\tr4, #0xf4\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr5, #0xf5\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr6, #0xf6\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr7, #0xf7\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r8\n\t"
+		  "cmp\tr0, #0xf8\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r10\n\t"
+		  "cmp\tr0, #0xfa\n\t"
+		  "bne\tfail\n\t"
+		  "mov\t%0, #1\n"
+		  "fail:\n\t"
+		  "sub\tr0, #1"
+		  : "=r" (ret) : :);
+  return ret;
+}


Re: [Patch, testsuite] Add new effective-target_store_merge

2016-11-03 Thread Mike Stump
On Nov 3, 2016, at 2:29 AM, Senthil Kumar Selvaraj 
 wrote:
> 
>  The below patch adds a new effective target keyword (store_merge) for
>  use in the store_merging_xxx.c tests.
> 
>  The tests currently require non_strict_align, but they still fail for the 
> avr.
>  Eyeballing the dump, I found that the pass doesn't attempt merging as it is
>  unprofitable for a target like the avr which has only single byte
>  stores.
> 
>  I figured store merging is unlikely to be profitable for targets with
>  smallish word sizes, and added a check_effective_target_store_merge
>  that combines non_strict_align and int32plus.
> 
>  Is this ok for trunk?

Ok.

If anyone knows of a reason why this would turn off these tests for a target 
that currently tests them and works ok, let us know.



Re: [match.pd] Fix for PR35691

2016-11-03 Thread Marc Glisse

On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:


On 3 November 2016 at 16:13, Richard Biener  wrote:

On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:


Hi Richard,
The attached patch tries to fix PR35691, by adding the following two
transforms to match.pd:
(x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
(x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.

For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE,
it gets transformed to bit_and_expr
so to match for both GENERIC and GIMPLE, I had to guard the for-stmt:

#if GENERIC
(for op (truth_andif truth_orif)
#elif GIMPLE
(for op (bit_and bit_ior)
#endif

Is that OK ?


As you are not removing the fold-const.c variant I'd say you should
simply not look for truth_* and only handle GIMPLE.  Note that we
have tree-ssa-ifcombine.c which should handle the variant with
control-flow (but I guess it does not and your patch wouldn't help
it either).

The transform would also work for vectors (element_precision for
the test but also a value-matching zero which should ensure the
same number of elements).

Um sorry, I didn't get how to check vectors to be of equal length by a
matching zero.
Could you please elaborate on that ?


He may have meant something like:

  (op (cmp @0 integer_zerop@2) (cmp @1 @2))

So the last operand is checked with operand_equal_p instead of 
integer_zerop. But the fact that we could compute bit_ior on the 
comparison results should already imply that the number of elements is the 
same. This would also prevent the case where one vector is signed and the 
other unsigned, which requires a view_convert (I don't remember if convert 
automatically becomes view_convert here as in fold_convert or not).


For (some_int == 0) & (some_long == 0), doing ((long)some_int | some_long) 
== 0 should also work (and it doesn't matter if we pick a sign- or 
zero-extension), but that's more complicated, not necessary for a first 
version.


On platforms that have IOR on floats (at least x86 with SSE, maybe some 
vector mode on s390?), it would be cool to do the same for floats (most 
likely at the RTL level).


--
Marc Glisse


  1   2   >