Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)

2018-01-30 Thread Richard Biener
On Mon, 29 Jan 2018, Jeff Law wrote:

> On 01/29/2018 04:37 PM, Jakub Jelinek wrote:
> > Hi!
> > 
> > As mentioned in the PR, cunroll sometimes registers some SSA_NAMEs for
> > renaming and then invokes some SCEV using functions before finally updating
> > the SSA form.  On the testcase we end up with 3 degenerate PHIs pointing at
> > each other, so follow_copies_to_constant loops forever.
> I'm at a loss to understand how we could get in that situation.  Was it
> edge removal (if so for which PHI) or the duplicate process that created
> the loop?

The issue is that

static bool
tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer,
bitmap father_bbs, struct loop *loop)
{
  struct loop *loop_father;
  bool changed = false;
  struct loop *inner;
  enum unroll_level ul;

  /* Process inner loops first.  */
  for (inner = loop->inner; inner != NULL; inner = inner->next)
changed |= tree_unroll_loops_completely_1 (may_increase_size,
   unroll_outer, father_bbs,
   inner);

when recursing tree_unroll_loops_completely_1 appends unrolled
bodies and loops contained therein in the loop->inner chain.  We
specifically do _not_ allow processing of outer loops here due
to not up-to-date SSA form and then iterate the unrolling process
instead:

  /* If we changed an inner loop we cannot process outer loops in this
 iteration because SSA form is not up-to-date.  Continue with
 siblings of outer loops instead.  */
  if (changed)
return true;

so I think the proper fix is to avoid visiting loops that were added
in the above iteration over loop->inner.  For example by collecting
the loop->inner chain in a vec and then iterating over that (much
like FOR_EACH_LOOP does).  Or rely on the fact that loop->num of new
loops will be >= number_of_loops () [number_of_loops () is really
mis-named, would be a good time to add max_loop_num () or so]

Richard.

> We've seen one case where this happened inside DOM, but it was only
> because we ignored an unexecutable edge which then caused a PHI to
> become a degenerate.One or more of the PHI args in the degenerate
> was marked as EDGE_DFS_BACK on its associated edge.
> 
> Are any of the PHI args in pr84111 marked in a similar manner?
> 
> > 
> > The following patch has 2 different fixes for this, one is to avoid looking
> > at SSA_NAMEs registered for update (in theory all we care are the old ssa
> > names, but we don't have an API to query just those), the other is to put
> > some upper bound on how many times we follow SSA_NAMEs (either single defs
> > or degenerate phis).  Either of those changes is sufficient to fix this.
> Interestingly enough I was looking at a bug several weeks ago where the
> ability to query one of the sets (I can't remember if it was old or new)
> of names for rewriting would have been useful.
> 
> 
> > 
> > During x86_64-linux and i686-linux bootstraps/regtests 
> > follow_copies_to_constant
> > actually never returned anything useful, so the patch doesn't regress at
> > least on these targets anything, there are quite a few cases where SSA_NAME
> > is name_registered_for_update_p but the function would never return anything
> > but var in those cases.  And the highest depth ever seen was 3, except for
> > the newly added testcase if the && !name_registered_for_update_p (res) part
> > is commented out.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > 2018-01-29  Jakub Jelinek  
> > 
> > PR tree-optimization/84111
> > * tree-scalar-evolution.c: Include tree-into-ssa.h.
> > (follow_copies_to_constant): Stop on SSA_NAMEs registered for update,
> > loop at most 128 times.
> > 
> > * gcc.c-torture/compile/pr84111.c: New test.
> Probably reasonable.  Though I would like to better understand how we
> got the degenerates forming a loop before final ack.
> 
> jeff
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)

2018-01-30 Thread Jakub Jelinek
On Mon, Jan 29, 2018 at 11:58:45PM -0700, Jeff Law wrote:
> On 01/29/2018 04:37 PM, Jakub Jelinek wrote:
> > Hi!
> > 
> > As mentioned in the PR, cunroll sometimes registers some SSA_NAMEs for
> > renaming and then invokes some SCEV using functions before finally updating
> > the SSA form.  On the testcase we end up with 3 degenerate PHIs pointing at
> > each other, so follow_copies_to_constant loops forever.
> I'm at a loss to understand how we could get in that situation.  Was it
> edge removal (if so for which PHI) or the duplicate process that created
> the loop?

The original loop before unrolling had following PHIs for the y parameter:
   [local count: 118111601]:
  # y_37 = PHI 

   [local count: 3277506]:
  # y_29 = PHI 

   [local count: 3498303]:
  # y_8 = PHI 
and actually no other PHIs, so effectively all uses of y in the function
could be y_20(D), but we haven't cleaned it up yet; before pre we had:
   [local count: 118111601]:
  # y_29 = PHI <0(4), y_37(3)>
which prevented optimizing everything into y_20(D).
In the cunroll emergency dump I've done when it was looping I got:
;;   basic block 15, loop depth 0
;;pred:   2
  # y_25 = PHI 

;;   basic block 17, loop depth 1
;;pred:   16
;;19
  # y_10 = PHI 

;;   basic block 20, loop depth 0
;;pred:   18
  # y_5 = PHI 

;;   basic block 3, loop depth 2
;;pred:   11
;;14
  # y_37 = PHI 

;;   basic block 13, loop depth 1
;;pred:   5
  # y_29 = PHI 

;;   basic block 6, loop depth 1
;;pred:   20
;;13
  # y_8 = PHI 

where bb 3, 13 and 6 are the same PHIs as before which aren't going to be
renamed, and bb20 is the new cunroll copy of bb13, bb17 copy of bb3.
The PHI results of the PHIs in the new bbs already have a new SSA_NAME, but
the PHI arguments are waiting to be renamed.
Already before the cunroll, 2 of the PHIs are degenerate and only one is
not.  When SCEV is called, the only non-degenerate among those 3 previously
looks like degenerate too, but we'll be replacing y_29(20) in there with
y_5(20) during ssa update.

Now, with the patch at the end of cunroll after ssa update and cfg cleanup
we have:
   [local count: 18182121]:
  # y_10 = PHI 

   [local count: 2063999]:
  # y_5 = PHI 
PHIs for y and nothing else and finally phicprop2 optimizes all y references
to just y_20(D).

> > The following patch has 2 different fixes for this, one is to avoid looking
> > at SSA_NAMEs registered for update (in theory all we care are the old ssa
> > names, but we don't have an API to query just those), the other is to put
> > some upper bound on how many times we follow SSA_NAMEs (either single defs
> > or degenerate phis).  Either of those changes is sufficient to fix this.
> Interestingly enough I was looking at a bug several weeks ago where the
> ability to query one of the sets (I can't remember if it was old or new)
> of names for rewriting would have been useful.

tree-cfg.c has a comment like that:
tree-cfg.c-  /* Technically only new names matter.  */
tree-cfg.c:  if (name_registered_for_update_p (PHI_RESULT (phi)))
tree-cfg.c- return false;

Jakub


Contents of PO file 'cpplib-8.1-b20180128.vi.po'

2018-01-30 Thread Translation Project Robot


cpplib-8.1-b20180128.vi.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



New Vietnamese PO file for 'cpplib' (version 8.1-b20180128)

2018-01-30 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Vietnamese team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/vi.po

(This file, 'cpplib-8.1-b20180128.vi.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[PATCH, i386, AVX] PR target/83828: fix AVX-512BITALG test failures

2018-01-30 Thread Kirill Yukhin
Hello,
If masked variant of vpopcnt[w,q] is being issued: there's no way for reload
to put 32/64 bit mask into mask register, since kmov[d,q] are only available
under -mavx512bw switch.

We can insist user to issue -mavx512bw along w/ -mavx512bitalg if she is
going to use masked variants of corresponding intrinsics.

gcc/testsuite
PR target/83828
* gcc.target/i386/avx512bitalg-vpopcntb-1.c: Fix test.
* gcc.target/i386/avx512bitalg-vpopcntw-1.c: Ditto.
* gcc.target/i386/avx512bitalgvl-vpopcntb-1.c: Ditto.
* gcc.target/i386/avx512bitalgvl-vpopcntw-1.c: Ditto.

Bootstrapped & regtested. Checked into main trunk.

--
Thanks, K

Index: gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb-1.c
===
--- gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb-1.c (revision 
257172)
+++ gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb-1.c (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -mavx512bitalg" } */
+/* { dg-options "-O2 -mavx512bitalg -mavx512bw" } */
 /* { dg-require-effective-target avx512bitalg } */
+/* { dg-require-effective-target avx512bw } */
 
 #include "avx512f-helper.h"
 
Index: gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw-1.c
===
--- gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw-1.c (revision 
257172)
+++ gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw-1.c (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -mavx512bitalg" } */
+/* { dg-options "-O2 -mavx512bitalg -mavx512bw" } */
 /* { dg-require-effective-target avx512bitalg } */
+/* { dg-require-effective-target avx512bw } */
 
 #include "avx512f-helper.h"
 
Index: gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntb-1.c
===
--- gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntb-1.c   (revision 
257172)
+++ gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntb-1.c   (working copy)
@@ -1,7 +1,8 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -mavx512vl -mavx512bitalg" } */
+/* { dg-options "-O2 -mavx512vl -mavx512bitalg -mavx512bw" } */
 /* { dg-require-effective-target avx512vl } */
 /* { dg-require-effective-target avx512bitalg } */
+/* { dg-require-effective-target avx512bw } */
 
 #define AVX512VL
 #define AVX512F_LEN 256
Index: gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntw-1.c
===
--- gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntw-1.c   (revision 
257172)
+++ gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntw-1.c   (working copy)
@@ -1,7 +1,8 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -mavx512vl -mavx512bitalg" } */
+/* { dg-options "-O2 -mavx512vl -mavx512bitalg -mavx512bw" } */
 /* { dg-require-effective-target avx512vl } */
 /* { dg-require-effective-target avx512bitalg } */
+/* { dg-require-effective-target avx512bw } */
 
 #define AVX512VL
 #define AVX512F_LEN 256



New Russian PO file for 'gcc' (version 8.1-b20180128)

2018-01-30 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Russian team of translators.  The file is available at:

http://translationproject.org/latest/gcc/ru.po

(This file, 'gcc-8.1-b20180128.ru.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)

2018-01-30 Thread Jakub Jelinek
On Tue, Jan 30, 2018 at 08:59:43AM +0100, Richard Biener wrote:
> The issue is that
> 
> static bool
> tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer,
> bitmap father_bbs, struct loop *loop)
> {
>   struct loop *loop_father;
>   bool changed = false;
>   struct loop *inner;
>   enum unroll_level ul;
> 
>   /* Process inner loops first.  */
>   for (inner = loop->inner; inner != NULL; inner = inner->next)
> changed |= tree_unroll_loops_completely_1 (may_increase_size,
>unroll_outer, father_bbs,
>inner);
> 
> when recursing tree_unroll_loops_completely_1 appends unrolled
> bodies and loops contained therein in the loop->inner chain.  We
> specifically do _not_ allow processing of outer loops here due
> to not up-to-date SSA form and then iterate the unrolling process
> instead:
> 
>   /* If we changed an inner loop we cannot process outer loops in this
>  iteration because SSA form is not up-to-date.  Continue with
>  siblings of outer loops instead.  */
>   if (changed)
> return true;
> 
> so I think the proper fix is to avoid visiting loops that were added
> in the above iteration over loop->inner.  For example by collecting
> the loop->inner chain in a vec and then iterating over that (much
> like FOR_EACH_LOOP does).  Or rely on the fact that loop->num of new
> loops will be >= number_of_loops () [number_of_loops () is really
> mis-named, would be a good time to add max_loop_num () or so]

So like this if it passes bootstrap/regtest?

OT, why is loop->num signed rather than unsigned?

2018-01-30  Richard Biener  
Jakub Jelinek  

PR tree-optimization/84111
* tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1): Skip
inner loops added during recursion, as they don't have up-to-date
SSA form.

* gcc.c-torture/compile/pr84111.c: New test.

--- gcc/tree-ssa-loop-ivcanon.c.jj  2018-01-29 14:05:46.689153783 +0100
+++ gcc/tree-ssa-loop-ivcanon.c 2018-01-30 09:30:21.932069737 +0100
@@ -1373,13 +1373,17 @@ tree_unroll_loops_completely_1 (bool may
   bool changed = false;
   struct loop *inner;
   enum unroll_level ul;
+  unsigned num = number_of_loops (cfun);
 
-  /* Process inner loops first.  */
+  /* Process inner loops first.  Don't walk loops added by the recursive
+ calls because SSA form is not up-to-date.  They can be handled in the
+ next iteration.  */
   for (inner = loop->inner; inner != NULL; inner = inner->next)
-changed |= tree_unroll_loops_completely_1 (may_increase_size,
-  unroll_outer, father_bbs,
-  inner);
- 
+if ((unsigned) inner->num < num)
+  changed |= tree_unroll_loops_completely_1 (may_increase_size,
+unroll_outer, father_bbs,
+inner);
+
   /* If we changed an inner loop we cannot process outer loops in this
  iteration because SSA form is not up-to-date.  Continue with
  siblings of outer loops instead.  */
--- gcc/testsuite/gcc.c-torture/compile/pr84111.c.jj2018-01-29 
20:38:10.236528914 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr84111.c   2018-01-29 
20:37:53.227522763 +0100
@@ -0,0 +1,31 @@
+/* PR tree-optimization/84111 */
+
+void
+foo (int x, int y, int z)
+{
+  int a = 0;
+  int *b = &x;
+
+  while (a < 1)
+{
+  int c = y;
+  *b = x;
+ lab:
+  for (a = 0; a < 36; ++a)
+{
+  *b = 0;
+  if (x != 0)
+y = 0;
+  while (c < 1)
+   ;
+}
+}
+  if (z < 33)
+{
+  b = (int *) 0;
+  ++y;
+  ++z;
+  if (x / *b != 0)
+goto lab;
+}
+}


Jakub


RE: [patch][x86] -march=icelake

2018-01-30 Thread Koval, Julia
Renamed it. Ok for trunk?

gcc/c-family/
* c-common.h (omp_clause_mask): Move to wide_int_bitmask.h

gcc/
* config/i386/i386.c (ix86_option_override_internal): Change flags type 
to
wide_int_bitmask.
* wide-int-bitmask.h: New.

Thanks,
Julia


> -Original Message-
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: Wednesday, January 24, 2018 12:18 PM
> To: Koval, Julia 
> Cc: Jakub Jelinek ; Uros Bizjak ; GCC
> Patches ; Kirill Yukhin 
> Subject: RE: [patch][x86] -march=icelake
> 
> On Wed, 24 Jan 2018, Koval, Julia wrote:
> 
> > I think we may want to extend it to more than 2 ints someday, when we run
> out of bits again. It won't break the existing functionality if 3rd int will 
> be zero by
> default. That's why I tried to avoid "two" in the name.
> >
> > Julia
> >
> > > -Original Message-
> > > From: Jakub Jelinek [mailto:ja...@redhat.com]
> > > Sent: Wednesday, January 24, 2018 12:06 PM
> > > To: Uros Bizjak ; Richard Biener 
> > > Cc: Koval, Julia ; GCC Patches  > > patc...@gcc.gnu.org>; Kirill Yukhin 
> > > Subject: Re: [patch][x86] -march=icelake
> > >
> > > On Wed, Jan 24, 2018 at 12:00:26PM +0100, Uros Bizjak wrote:
> > > > On Mon, Jan 22, 2018 at 3:44 PM, Koval, Julia 
> wrote:
> > > > > Yes, you are right, any() is not required. Here is the patch.
> > > >
> > > > Please also attach ChangeLog.
> > > >
> > > > The patch is OK for x86 target, it needs global reviewer approval
> > > > (Maybe Jakub, as the patch touches OMP part).
> > >
> > > I don't like the new class name nor header name, bit_mask is way too
> generic
> > > name for something very specialized (double hwi bitmask).
> > >
> > > Richard, any suggestions for this?
> 
> Maybe wide_int_bitmask?  You could then even use fixed_wide_int <> as
> "implementation".
> 
> Richard.


0001-bitmask.patch
Description: 0001-bitmask.patch


New Swedish PO file for 'cpplib' (version 8.1-b20180128)

2018-01-30 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Swedish team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/sv.po

(This file, 'cpplib-8.1-b20180128.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Contents of PO file 'cpplib-8.1-b20180128.sv.po'

2018-01-30 Thread Translation Project Robot


cpplib-8.1-b20180128.sv.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



Re: [patch][x86] -march=icelake

2018-01-30 Thread Jakub Jelinek
On Tue, Jan 30, 2018 at 08:35:38AM +, Koval, Julia wrote:
>   * c-common.h (omp_clause_mask): Move to wide_int_bitmask.h

Missing dot ad the end.

+  wide_int_bitmask PTA_3DNOW (HOST_WIDE_INT_1U << 0);

Can't all these be const wide_int_bitmask instead of just
wide_int_bitmask?

...
+
+  wide_int_bitmask PTA_CORE2 = PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2
+| PTA_SSE3 | PTA_SSSE3 | PTA_CX16 | PTA_FXSR;
+  wide_int_bitmask PTA_NEHALEM = PTA_CORE2 | PTA_SSE4_1 | PTA_SSE4_2
+| PTA_POPCNT;
+  wide_int_bitmask PTA_WESTMERE = PTA_NEHALEM | PTA_AES | PTA_PCLMUL;
+  wide_int_bitmask PTA_SANDYBRIDGE = PTA_WESTMERE | PTA_AVX | PTA_XSAVE
+| PTA_XSAVEOPT;
+  wide_int_bitmask PTA_IVYBRIDGE = PTA_SANDYBRIDGE | PTA_FSGSBASE | PTA_RDRND
+| PTA_F16C;
+  wide_int_bitmask PTA_HASWELL = PTA_IVYBRIDGE | PTA_AVX2 | PTA_BMI | PTA_BMI2
+| PTA_LZCNT | PTA_FMA | PTA_MOVBE | PTA_HLE;
+  wide_int_bitmask PTA_BROADWELL = PTA_HASWELL | PTA_ADX | PTA_PRFCHW
+| PTA_RDSEED;
+  wide_int_bitmask PTA_SKYLAKE = PTA_BROADWELL | PTA_CLFLUSHOPT | PTA_XSAVEC
+| PTA_XSAVES;
+  wide_int_bitmask PTA_SKYLAKE_AVX512 = PTA_SKYLAKE | PTA_AVX512F | 
PTA_AVX512CD
+| PTA_AVX512VL | PTA_AVX512BW | PTA_AVX512DQ | PTA_PKU | PTA_CLWB;
+  wide_int_bitmask PTA_CANNONLAKE = PTA_SKYLAKE_AVX512 | PTA_AVX512VBMI
+| PTA_AVX512IFMA | PTA_SHA;
+  wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF | PTA_AVX512ER
+| PTA_AVX512F | PTA_AVX512CD;
+  wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE;
+  wide_int_bitmask PTA_SILVERMONT = PTA_WESTMERE | PTA_MOVBE | PTA_RDRND;
+  wide_int_bitmask PTA_KNM = PTA_KNL | PTA_AVX5124VNNIW | PTA_AVX5124FMAPS
+| PTA_AVX512VPOPCNTDQ;

Likewise for these.

--- /dev/null
+++ b/gcc/wide-int-bitmask.h
@@ -0,0 +1,145 @@
+/* Operation with 128 bit bitmask.
+   Copyright (C) 1987-2018 Free Software Foundation, Inc.

Please use 2013-2018 instead, all the omp_clause_mask stuff was
introduced in 2013.

+
+#ifndef GCC_BIT_MASK_H
+#define GCC_BIT_MASK_H

The macro hasn't been renamed for the header file rename.

+
+#endif /* ! GCC_BIT_MASK_H */

Here as well.  Otherwise LGTM.

Jakub


Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)

2018-01-30 Thread Richard Biener
On Tue, 30 Jan 2018, Jakub Jelinek wrote:

> On Tue, Jan 30, 2018 at 08:59:43AM +0100, Richard Biener wrote:
> > The issue is that
> > 
> > static bool
> > tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer,
> > bitmap father_bbs, struct loop *loop)
> > {
> >   struct loop *loop_father;
> >   bool changed = false;
> >   struct loop *inner;
> >   enum unroll_level ul;
> > 
> >   /* Process inner loops first.  */
> >   for (inner = loop->inner; inner != NULL; inner = inner->next)
> > changed |= tree_unroll_loops_completely_1 (may_increase_size,
> >unroll_outer, father_bbs,
> >inner);
> > 
> > when recursing tree_unroll_loops_completely_1 appends unrolled
> > bodies and loops contained therein in the loop->inner chain.  We
> > specifically do _not_ allow processing of outer loops here due
> > to not up-to-date SSA form and then iterate the unrolling process
> > instead:
> > 
> >   /* If we changed an inner loop we cannot process outer loops in this
> >  iteration because SSA form is not up-to-date.  Continue with
> >  siblings of outer loops instead.  */
> >   if (changed)
> > return true;
> > 
> > so I think the proper fix is to avoid visiting loops that were added
> > in the above iteration over loop->inner.  For example by collecting
> > the loop->inner chain in a vec and then iterating over that (much
> > like FOR_EACH_LOOP does).  Or rely on the fact that loop->num of new
> > loops will be >= number_of_loops () [number_of_loops () is really
> > mis-named, would be a good time to add max_loop_num () or so]
> 
> So like this if it passes bootstrap/regtest?

Yes.

> OT, why is loop->num signed rather than unsigned?

History.  Like so many others (block->index, etc.).  Possibly to
allow better optimization of loops over those indexes (relying
on undefined overflow).

Richard.

> 2018-01-30  Richard Biener  
>   Jakub Jelinek  
> 
>   PR tree-optimization/84111
>   * tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1): Skip
>   inner loops added during recursion, as they don't have up-to-date
>   SSA form.
> 
>   * gcc.c-torture/compile/pr84111.c: New test.
> 
> --- gcc/tree-ssa-loop-ivcanon.c.jj2018-01-29 14:05:46.689153783 +0100
> +++ gcc/tree-ssa-loop-ivcanon.c   2018-01-30 09:30:21.932069737 +0100
> @@ -1373,13 +1373,17 @@ tree_unroll_loops_completely_1 (bool may
>bool changed = false;
>struct loop *inner;
>enum unroll_level ul;
> +  unsigned num = number_of_loops (cfun);
>  
> -  /* Process inner loops first.  */
> +  /* Process inner loops first.  Don't walk loops added by the recursive
> + calls because SSA form is not up-to-date.  They can be handled in the
> + next iteration.  */
>for (inner = loop->inner; inner != NULL; inner = inner->next)
> -changed |= tree_unroll_loops_completely_1 (may_increase_size,
> -unroll_outer, father_bbs,
> -inner);
> - 
> +if ((unsigned) inner->num < num)
> +  changed |= tree_unroll_loops_completely_1 (may_increase_size,
> +  unroll_outer, father_bbs,
> +  inner);
> +
>/* If we changed an inner loop we cannot process outer loops in this
>   iteration because SSA form is not up-to-date.  Continue with
>   siblings of outer loops instead.  */
> --- gcc/testsuite/gcc.c-torture/compile/pr84111.c.jj  2018-01-29 
> 20:38:10.236528914 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr84111.c 2018-01-29 
> 20:37:53.227522763 +0100
> @@ -0,0 +1,31 @@
> +/* PR tree-optimization/84111 */
> +
> +void
> +foo (int x, int y, int z)
> +{
> +  int a = 0;
> +  int *b = &x;
> +
> +  while (a < 1)
> +{
> +  int c = y;
> +  *b = x;
> + lab:
> +  for (a = 0; a < 36; ++a)
> +{
> +  *b = 0;
> +  if (x != 0)
> +y = 0;
> +  while (c < 1)
> + ;
> +}
> +}
> +  if (z < 33)
> +{
> +  b = (int *) 0;
> +  ++y;
> +  ++z;
> +  if (x / *b != 0)
> +goto lab;
> +}
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()

2018-01-30 Thread Richard Sandiford
Segher Boessenkool  writes:
> On Mon, Jan 29, 2018 at 11:41:32PM +0100, Jakub Jelinek wrote:
>> On Mon, Jan 29, 2018 at 04:26:50PM -0600, Segher Boessenkool wrote:
>> > > The code actually meant pointer comparison, the question is what is
>> > > different on powerpc* that you end up with a different REG.
>> > > >From what I can see, function.c uses crtl->args.internal_arg_pointer
>> > > directly rather than a REG with the same REGNO.
>> > > Where does it become something different and why?
>> > 
>> > There is a lot of code that copies any RTX that isn't obviously unique.
>> > Here we have a PLUS of some things, which always needs copying, can
>> > never be shared.
>> 
>> Yes, PLUS needs to be unshared.
>> So it ought to be handled by the PLUS handling code.
>>   || (GET_CODE (XEXP (incoming, 0)) == PLUS
>>   && XEXP (XEXP (incoming, 0), 0)
>>  == crtl->args.internal_arg_pointer
>>   && CONST_INT_P (XEXP (XEXP (incoming, 0), 1)
>> Here we are talking about MEMs on the PARM_DECL DECL_RTLs, those are not
>> instantiated as normal IL in the RTL stream is.
>
> This is a PLUS of something with the internal_arg_pointer.  Our
> internal_arg_pointer _itself_ is a PLUS, so it should be copy_rtx'd
> everywhere it is used (or risk RTL sharing).

Is that needed even in DECL_RTL?

>> I'm not saying the var-tracking.c change is wrong, but I'd like to see
>> analysis on what is going on.
>
> The rtx_equal_p is equal to the == for anything that uses just a single
> register.  For us it makes the compiler bootstrap again (with Go enabled),
> not unnice to have in stage4 ;-)
>
> I agree the code does not seem to be set up for PLUS to work here.
>
>> Is the patch for -fsplit-stack where rs6000_internal_arg_pointer
>> returns a PLUS of some pseudo and some offset?
>
> Yeah.
>
>> In that case I wonder how does the patch with rtx_equal_p actually work,
>> because function.c then uses plus_constant on this result, and if there are
>
> Not everywhere, in places it does
>
>   gen_rtx_PLUS (Pmode, stack_parm, offset_rtx);
>
> (so we end up with non-canonical RTL).

But in that case, what does the copying?

That's what seems strange.  I can see why we'd have two nested
pluses with the inner plus being pointer-equal to internal_arg_ptr.
And I can see why we'd have a single canonical plus (which IMO would
be better, but I agree it's not stage 4 material).  It's having the two
nested pluses without maintaining pointer equality that seems strange.

Thanks,
Richard


Re: [PATCH] [PR testsuite/81010] Fix PPC test

2018-01-30 Thread Andreas Schwab
On Jan 29 2018, Jeff Law  wrote:

> On 01/07/2018 10:09 AM, Segher Boessenkool wrote:
>> Hi!
>> 
>> On Sun, Jan 07, 2018 at 12:58:25AM -0700, Jeff Law wrote:
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr56605.c 
>>> b/gcc/testsuite/gcc.target/powerpc/pr56605.c
>>> index 3bc335f..be6456c 100644
>>> --- a/gcc/testsuite/gcc.target/powerpc/pr56605.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr56605.c
>>> @@ -1,7 +1,7 @@
>>>  /* PR rtl-optimization/56605 */
>>> -/* { dg-do compile { target { powerpc64-*-* && lp64 } } } */
>>> +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
>> 
>> You could as well do powerpc*-*-* afaics?
> I guess, but I suspect the lp64 test would prevent it from running on
> the 32bit configurations anyway.

-m64 will pass the lp64 test.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()

2018-01-30 Thread Segher Boessenkool
On Tue, Jan 30, 2018 at 09:41:47AM +, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > On Mon, Jan 29, 2018 at 11:41:32PM +0100, Jakub Jelinek wrote:
> >> On Mon, Jan 29, 2018 at 04:26:50PM -0600, Segher Boessenkool wrote:
> >> > > The code actually meant pointer comparison, the question is what is
> >> > > different on powerpc* that you end up with a different REG.
> >> > > >From what I can see, function.c uses crtl->args.internal_arg_pointer
> >> > > directly rather than a REG with the same REGNO.
> >> > > Where does it become something different and why?
> >> > 
> >> > There is a lot of code that copies any RTX that isn't obviously unique.
> >> > Here we have a PLUS of some things, which always needs copying, can
> >> > never be shared.
> >> 
> >> Yes, PLUS needs to be unshared.
> >> So it ought to be handled by the PLUS handling code.
> >>   || (GET_CODE (XEXP (incoming, 0)) == PLUS
> >>   && XEXP (XEXP (incoming, 0), 0)
> >>  == crtl->args.internal_arg_pointer
> >>   && CONST_INT_P (XEXP (XEXP (incoming, 0), 1)
> >> Here we are talking about MEMs on the PARM_DECL DECL_RTLs, those are not
> >> instantiated as normal IL in the RTL stream is.
> >
> > This is a PLUS of something with the internal_arg_pointer.  Our
> > internal_arg_pointer _itself_ is a PLUS, so it should be copy_rtx'd
> > everywhere it is used (or risk RTL sharing).
> 
> Is that needed even in DECL_RTL?

When it is copied to the RTL stream, it has to yes.  No sharing allowed,
even if nothing inside this is ever modified.

> >> I'm not saying the var-tracking.c change is wrong, but I'd like to see
> >> analysis on what is going on.
> >
> > The rtx_equal_p is equal to the == for anything that uses just a single
> > register.  For us it makes the compiler bootstrap again (with Go enabled),
> > not unnice to have in stage4 ;-)
> >
> > I agree the code does not seem to be set up for PLUS to work here.
> >
> >> Is the patch for -fsplit-stack where rs6000_internal_arg_pointer
> >> returns a PLUS of some pseudo and some offset?
> >
> > Yeah.
> >
> >> In that case I wonder how does the patch with rtx_equal_p actually work,
> >> because function.c then uses plus_constant on this result, and if there are
> >
> > Not everywhere, in places it does
> >
> >   gen_rtx_PLUS (Pmode, stack_parm, offset_rtx);
> >
> > (so we end up with non-canonical RTL).
> 
> But in that case, what does the copying?

I don't know.  Aaron will look at it, but timezones etc. :-)

> That's what seems strange.  I can see why we'd have two nested
> pluses with the inner plus being pointer-equal to internal_arg_ptr.
> And I can see why we'd have a single canonical plus (which IMO would
> be better, but I agree it's not stage 4 material).  It's having the two
> nested pluses without maintaining pointer equality that seems strange.

The inner plus is *not* pointer-equal, that is the problem.  Something
did copy_rtx (or such) on it, many things do.  We can tell you what
exactly later today.


Segher


[PATCH] Fix PR84101, account for function ABI details in vectorization costs

2018-01-30 Thread Richard Biener

This patch tries to deal with the "easy" part of a function ABI,
the return value location, in vectorization costing.  The testcase
shows that if we vectorize the returned value but the function
doesn't return in memory or in a vector register but as in this
case in an integer register pair (reg:TI ax) (bah, ABI details
exposed late?  why's this not a parallel?) we end up spilling
badly.

The idea is to account for such spilling so if vectorization
benefits outweight the spilling we'll vectorize anyway.

I think the particular testcase could be fixed in the subreg
pass basically undoing the vectorization but I realize that
generally this is a too hard problem and avoiding vectorization
is better.  Still this patch is somewhat fragile in that it
depends on us "seeing" that the stored to decl is returned
(see cfun_returns).

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

I'd like to hear opinions on my use of hard_function_value
and also from other target maintainers.  I'm not sure we
have sufficient testsuite coverage of _profitable_ vectorization
of a return value.  Feel free to add to this for your
target.

Ok for trunk?

Thanks,
Richard.

2018-01-30  Richard Biener  

PR tree-optimization/84101
* tree-vect-stmts.c: Include explow.h for hard_function_value.
(cfun_returns): New helper.
(vect_model_store_cost): When vectorizing a store to a decl
we return and the function ABI returns via a non-vector
register account for the possible spilling that will happen.

* gcc.target/i386/pr84101.c: New testcase.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 257139)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.
 #include "tree-cfg.h"
 #include "tree-ssa-loop-manip.h"
 #include "cfgloop.h"
+#include "explow.h"
 #include "tree-ssa-loop.h"
 #include "tree-scalar-evolution.h"
 #include "tree-vectorizer.h"
@@ -893,6 +894,22 @@ vect_model_promotion_demotion_cost (stmt
  "prologue_cost = %d .\n", inside_cost, prologue_cost);
 }
 
+/* Returns true if the current function returns DECL.  */
+
+static bool
+cfun_returns (tree decl)
+{
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+{
+  greturn *ret = safe_dyn_cast  (last_stmt (e->src));
+  if (ret && gimple_return_retval (ret) == decl)
+   return true;
+}
+  return false;
+}
+
 /* Function vect_model_store_cost
 
Models cost for stores.  In the case of grouped accesses, one access
@@ -971,6 +988,36 @@ vect_model_store_cost (stmt_vec_info stm
   vec_to_scalar, stmt_info, 0, vect_body);
 }
 
+  /* When vectorizing an SLP store into the function result assign
+ a penalty if the function returns in a non-vector register.
+ In this case we assume we'll end up with having to spill the
+ vector result and do element loads.  */
+  if (slp_node)
+{
+  tree base = get_base_address (dr->ref);
+  if (base
+ && (TREE_CODE (base) == RESULT_DECL
+ || (DECL_P (base) && cfun_returns (base
+   {
+ rtx reg = hard_function_value (TREE_TYPE (base), cfun->decl, 0, 1);
+ if ((REG_P (reg) && ! VECTOR_MODE_P (GET_MODE (reg)))
+ || GET_CODE (reg) == PARALLEL
+ || GET_CODE (reg) == CONCAT)
+   {
+ /* Spill.  */
+ inside_cost += record_stmt_cost (body_cost_vec,
+  ncopies, vector_store,
+  stmt_info, 0, vect_body);
+ /* Element loads.  */
+ unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
+ inside_cost += record_stmt_cost (body_cost_vec,
+  ncopies * assumed_nunits,
+  scalar_load,
+  stmt_info, 0, vect_body);
+   }
+   }
+}
+
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
  "vect_model_store_cost: inside_cost = %d, "
Index: gcc/testsuite/gcc.target/i386/pr84101.c
===
--- gcc/testsuite/gcc.target/i386/pr84101.c (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr84101.c (working copy)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-slp2-details" } */
+
+typedef struct uint64_pair uint64_pair_t ;
+struct uint64_pair
+{
+  unsigned long w0 ;
+  unsigned long w1 ;
+} ;
+
+uint64_pair_t pair(int num)
+{
+  uint64_pair_t p ;
+
+  p.w0 = num << 1 ;
+  p.w1 = num >> 1 ;
+
+  return p ;
+}
+
+/* { dg-final { scan-tree-dump-not "basic block vectorized" "slp2" } } */


Re: [PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()

2018-01-30 Thread Richard Sandiford
Segher Boessenkool  writes:
> On Tue, Jan 30, 2018 at 09:41:47AM +, Richard Sandiford wrote:
>> Segher Boessenkool  writes:
>> > On Mon, Jan 29, 2018 at 11:41:32PM +0100, Jakub Jelinek wrote:
>> >> On Mon, Jan 29, 2018 at 04:26:50PM -0600, Segher Boessenkool wrote:
>> >> > > The code actually meant pointer comparison, the question is what is
>> >> > > different on powerpc* that you end up with a different REG.
>> >> > > >From what I can see, function.c uses crtl->args.internal_arg_pointer
>> >> > > directly rather than a REG with the same REGNO.
>> >> > > Where does it become something different and why?
>> >> > 
>> >> > There is a lot of code that copies any RTX that isn't obviously unique.
>> >> > Here we have a PLUS of some things, which always needs copying, can
>> >> > never be shared.
>> >> 
>> >> Yes, PLUS needs to be unshared.
>> >> So it ought to be handled by the PLUS handling code.
>> >>   || (GET_CODE (XEXP (incoming, 0)) == PLUS
>> >>   && XEXP (XEXP (incoming, 0), 0)
>> >>  == crtl->args.internal_arg_pointer
>> >>   && CONST_INT_P (XEXP (XEXP (incoming, 0), 1)
>> >> Here we are talking about MEMs on the PARM_DECL DECL_RTLs, those are not
>> >> instantiated as normal IL in the RTL stream is.
>> >
>> > This is a PLUS of something with the internal_arg_pointer.  Our
>> > internal_arg_pointer _itself_ is a PLUS, so it should be copy_rtx'd
>> > everywhere it is used (or risk RTL sharing).
>> 
>> Is that needed even in DECL_RTL?
>
> When it is copied to the RTL stream, it has to yes.  No sharing allowed,
> even if nothing inside this is ever modified.

Right.  But DECL_RTL isn't in the RTL stream.  It needs to be copied
before being used there.

And this code is looking at DECL_RTL, not at rtl stream (insn patterns).

>> >> I'm not saying the var-tracking.c change is wrong, but I'd like to see
>> >> analysis on what is going on.
>> >
>> > The rtx_equal_p is equal to the == for anything that uses just a single
>> > register.  For us it makes the compiler bootstrap again (with Go enabled),
>> > not unnice to have in stage4 ;-)
>> >
>> > I agree the code does not seem to be set up for PLUS to work here.
>> >
>> >> Is the patch for -fsplit-stack where rs6000_internal_arg_pointer
>> >> returns a PLUS of some pseudo and some offset?
>> >
>> > Yeah.
>> >
>> >> In that case I wonder how does the patch with rtx_equal_p actually work,
>> >> because function.c then uses plus_constant on this result, and if there 
>> >> are
>> >
>> > Not everywhere, in places it does
>> >
>> >   gen_rtx_PLUS (Pmode, stack_parm, offset_rtx);
>> >
>> > (so we end up with non-canonical RTL).
>> 
>> But in that case, what does the copying?
>
> I don't know.  Aaron will look at it, but timezones etc. :-)
>
>> That's what seems strange.  I can see why we'd have two nested
>> pluses with the inner plus being pointer-equal to internal_arg_ptr.
>> And I can see why we'd have a single canonical plus (which IMO would
>> be better, but I agree it's not stage 4 material).  It's having the two
>> nested pluses without maintaining pointer equality that seems strange.
>
> The inner plus is *not* pointer-equal, that is the problem.  Something
> did copy_rtx (or such) on it, many things do.  We can tell you what
> exactly later today.

Yeah, I realise that.  And I'm saying that's what seems strange :-)
If something is deliberately avoiding folding the plus so that we
can still see internal_arg_ptr, why does it end up copying the
internal_arg_ptr regardless?

Thanks,
Richard


[PATCH] Fix PR83008

2018-01-30 Thread Richard Biener

I have been asked to push this change, fixing (somewhat) the impreciseness
of costing constant/invariant vector uses in SLP stmts.  The previous
code always just considered a single constant to be generated in the
prologue irrespective of how many we'd need.  With this patch we
properly handle this count and optimize for the case when we can use
a vector splat.  It doesn't yet handle CSE (or CSE among stmts) which
means it could in theory regress cases it overall costed correctly
before "optimistically" (aka by accident).  But at least the costing
now matches code generation.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  On x86_64
Haswell with AVX2 SPEC 2k6 shows no off-noise changes.

The patch is said to help the case in the PR when additional backend
costing changes are done (for AVX512).

Ok for trunk at this stage?

Thanks,
Richard.

2018-01-30  Richard Biener  

PR tree-optimization/83008
* tree-vect-slp.c (vect_analyze_slp_cost_1): Properly cost
invariant and constant vector uses in stmts when they need
more than one stmt.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 257047)
+++ gcc/tree-vect-slp.c (working copy)
@@ -1911,18 +1911,56 @@ vect_analyze_slp_cost_1 (slp_instance in
   enum vect_def_type dt;
   if (!op || op == lhs)
continue;
-  if (vect_is_simple_use (op, stmt_info->vinfo, &def_stmt, &dt))
+  if (vect_is_simple_use (op, stmt_info->vinfo, &def_stmt, &dt)
+ && (dt == vect_constant_def || dt == vect_external_def))
{
  /* Without looking at the actual initializer a vector of
 constants can be implemented as load from the constant pool.
-???  We need to pass down stmt_info for a vector type
-even if it points to the wrong stmt.  */
- if (dt == vect_constant_def)
-   record_stmt_cost (prologue_cost_vec, 1, vector_load,
- stmt_info, 0, vect_prologue);
- else if (dt == vect_external_def)
-   record_stmt_cost (prologue_cost_vec, 1, vec_construct,
- stmt_info, 0, vect_prologue);
+When all elements are the same we can use a splat.  */
+ tree vectype = get_vectype_for_scalar_type (TREE_TYPE (op));
+ unsigned group_size = SLP_TREE_SCALAR_STMTS (node).length ();
+ unsigned num_vects_to_check;
+ unsigned HOST_WIDE_INT const_nunits;
+ unsigned nelt_limit;
+ if (TYPE_VECTOR_SUBPARTS (vectype).is_constant (&const_nunits)
+ && ! multiple_p (const_nunits, group_size))
+   {
+ num_vects_to_check = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
+ nelt_limit = const_nunits;
+   }
+ else
+   {
+ /* If either the vector has variable length or the vectors
+are composed of repeated whole groups we only need to
+cost construction once.  All vectors will be the same.  */
+ num_vects_to_check = 1;
+ nelt_limit = group_size;
+   }
+ tree elt = NULL_TREE;
+ unsigned nelt = 0;
+ for (unsigned j = 0; j < num_vects_to_check * nelt_limit; ++j)
+   {
+ unsigned si = j % group_size;
+ if (nelt == 0)
+   elt = gimple_op (SLP_TREE_SCALAR_STMTS (node)[si], i);
+ /* ???  We're just tracking whether all operands of a single
+vector initializer are the same, ideally we'd check if
+we emitted the same one already.  */
+ else if (elt != gimple_op (SLP_TREE_SCALAR_STMTS (node)[si], i))
+   elt = NULL_TREE;
+ nelt++;
+ if (nelt == nelt_limit)
+   {
+ /* ???  We need to pass down stmt_info for a vector type
+even if it points to the wrong stmt.  */
+ record_stmt_cost (prologue_cost_vec, 1,
+   dt == vect_external_def
+   ? (elt ? scalar_to_vec : vec_construct)
+   : vector_load,
+   stmt_info, 0, vect_prologue);
+ nelt = 0;
+   }
+   }
}
 }
 


Re: [PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()

2018-01-30 Thread Jakub Jelinek
On Tue, Jan 30, 2018 at 03:55:58AM -0600, Segher Boessenkool wrote:
> > But in that case, what does the copying?
> 
> I don't know.  Aaron will look at it, but timezones etc. :-)
> 
> > That's what seems strange.  I can see why we'd have two nested
> > pluses with the inner plus being pointer-equal to internal_arg_ptr.
> > And I can see why we'd have a single canonical plus (which IMO would
> > be better, but I agree it's not stage 4 material).  It's having the two
> > nested pluses without maintaining pointer equality that seems strange.
> 
> The inner plus is *not* pointer-equal, that is the problem.  Something
> did copy_rtx (or such) on it, many things do.  We can tell you what
> exactly later today.

Most likely unshare_all_rtl, which does:
  for (tree decl = DECL_ARGUMENTS (cfun->decl); decl; decl = DECL_CHAIN (decl))
{
  if (DECL_RTL_SET_P (decl))
SET_DECL_RTL (decl, copy_rtx_if_shared (DECL_RTL (decl)));
  DECL_INCOMING_RTL (decl) = copy_rtx_if_shared (DECL_INCOMING_RTL (decl));
}

Anyway, my preference would be to change that gen_rtx_PLUS into
  stack_parm = crtl->args.internal_arg_pointer;
  if (!CONST_INT_P (offset_rtx))
stack_parm = gen_rtx_PLUS (Pmode, stack_parm, offset_rtx);
  else if (offset_rtx != const0_rtx)
stack_parm = plus_constant (Pmode, stack_parm, INTVAL (offset_rtx));
  stack_parm = gen_rtx_MEM (data->promoted_mode, stack_parm);
and deal specially with GET_CODE (crtl->args.internal_arg_pointer)
in var-tracking.c.
rs6000/powerpcspe with -fsplit-stack are the only cases where
crtl->args.internal_arg_pointer is not a REG, so just running libgo
testsuite on powerpc{,64,64le} should cover it all.

Jakub


Re: [PATCH] Fix PR83008

2018-01-30 Thread Jakub Jelinek
On Tue, Jan 30, 2018 at 11:07:50AM +0100, Richard Biener wrote:
> 
> I have been asked to push this change, fixing (somewhat) the impreciseness
> of costing constant/invariant vector uses in SLP stmts.  The previous
> code always just considered a single constant to be generated in the
> prologue irrespective of how many we'd need.  With this patch we
> properly handle this count and optimize for the case when we can use
> a vector splat.  It doesn't yet handle CSE (or CSE among stmts) which
> means it could in theory regress cases it overall costed correctly
> before "optimistically" (aka by accident).  But at least the costing
> now matches code generation.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  On x86_64
> Haswell with AVX2 SPEC 2k6 shows no off-noise changes.
> 
> The patch is said to help the case in the PR when additional backend
> costing changes are done (for AVX512).
> 
> Ok for trunk at this stage?

LGTM.

> 2018-01-30  Richard Biener  
> 
>   PR tree-optimization/83008
>   * tree-vect-slp.c (vect_analyze_slp_cost_1): Properly cost
>   invariant and constant vector uses in stmts when they need
>   more than one stmt.

Jakub


RE: [patch][x86] -march=icelake

2018-01-30 Thread Koval, Julia
Thank you for your comments, fixed them and rebased Ice Lake patch on top of 
it. Ok for trunk?

Bitmask patch changelog:

gcc/c-family/
* c-common.h (omp_clause_mask): Move to wide_int_bitmask.h.

gcc/
* config/i386/i386.c (ix86_option_override_internal): Change flags type 
to
wide_int_bitmask.
* wide-int-bitmask.h: New.

Icelake patch changelog:

gcc/
* config.gcc: Add -march=icelake.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect icelake.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle icelake.
* config/i386/i386.c (processor_costs): Add m_ICELAKE.
(PTA_ICELAKE, PTA_AVX512VNNI, PTA_GFNI, PTA_VAES, PTA_AVX512VBMI2,
PTA_VPCLMULQDQ, PTA_RDPID, PTA_AVX512BITALG): New.
(processor_target_table): Add icelake.
(ix86_option_override_internal): Handle new PTAs.
(get_builtin_code_for_version): Handle icelake.
(M_INTEL_COREI7_ICELAKE): New.
(fold_builtin_cpu): Handle icelake.
* config/i386/i386.h (TARGET_ICELAKE, PROCESSOR_ICELAKE): New.
* doc/invoke.texi: Add -march=icelake.
gcc/testsuite/
* gcc.target/i386/funcspec-56.inc: Handle new march.
* g++.dg/ext/mv16.C: Ditto.
libgcc/
* config/i386/cpuinfo.h (processor_subtypes): Add INTEL_COREI7_ICELAKE.

Thanks,
Julia

> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Tuesday, January 30, 2018 9:47 AM
> To: Koval, Julia 
> Cc: Richard Biener ; Uros Bizjak ;
> GCC Patches ; Kirill Yukhin
> 
> Subject: Re: [patch][x86] -march=icelake
> 
> On Tue, Jan 30, 2018 at 08:35:38AM +, Koval, Julia wrote:
> > * c-common.h (omp_clause_mask): Move to wide_int_bitmask.h
> 
> Missing dot ad the end.
> 
> +  wide_int_bitmask PTA_3DNOW (HOST_WIDE_INT_1U << 0);
> 
> Can't all these be const wide_int_bitmask instead of just
> wide_int_bitmask?
> 
> ...
> +
> +  wide_int_bitmask PTA_CORE2 = PTA_64BIT | PTA_MMX | PTA_SSE |
> PTA_SSE2
> +| PTA_SSE3 | PTA_SSSE3 | PTA_CX16 | PTA_FXSR;
> +  wide_int_bitmask PTA_NEHALEM = PTA_CORE2 | PTA_SSE4_1 | PTA_SSE4_2
> +| PTA_POPCNT;
> +  wide_int_bitmask PTA_WESTMERE = PTA_NEHALEM | PTA_AES |
> PTA_PCLMUL;
> +  wide_int_bitmask PTA_SANDYBRIDGE = PTA_WESTMERE | PTA_AVX |
> PTA_XSAVE
> +| PTA_XSAVEOPT;
> +  wide_int_bitmask PTA_IVYBRIDGE = PTA_SANDYBRIDGE | PTA_FSGSBASE |
> PTA_RDRND
> +| PTA_F16C;
> +  wide_int_bitmask PTA_HASWELL = PTA_IVYBRIDGE | PTA_AVX2 | PTA_BMI |
> PTA_BMI2
> +| PTA_LZCNT | PTA_FMA | PTA_MOVBE | PTA_HLE;
> +  wide_int_bitmask PTA_BROADWELL = PTA_HASWELL | PTA_ADX |
> PTA_PRFCHW
> +| PTA_RDSEED;
> +  wide_int_bitmask PTA_SKYLAKE = PTA_BROADWELL | PTA_CLFLUSHOPT |
> PTA_XSAVEC
> +| PTA_XSAVES;
> +  wide_int_bitmask PTA_SKYLAKE_AVX512 = PTA_SKYLAKE | PTA_AVX512F |
> PTA_AVX512CD
> +| PTA_AVX512VL | PTA_AVX512BW | PTA_AVX512DQ | PTA_PKU |
> PTA_CLWB;
> +  wide_int_bitmask PTA_CANNONLAKE = PTA_SKYLAKE_AVX512 |
> PTA_AVX512VBMI
> +| PTA_AVX512IFMA | PTA_SHA;
> +  wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF |
> PTA_AVX512ER
> +| PTA_AVX512F | PTA_AVX512CD;
> +  wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE;
> +  wide_int_bitmask PTA_SILVERMONT = PTA_WESTMERE | PTA_MOVBE |
> PTA_RDRND;
> +  wide_int_bitmask PTA_KNM = PTA_KNL | PTA_AVX5124VNNIW |
> PTA_AVX5124FMAPS
> +| PTA_AVX512VPOPCNTDQ;
> 
> Likewise for these.
> 
> --- /dev/null
> +++ b/gcc/wide-int-bitmask.h
> @@ -0,0 +1,145 @@
> +/* Operation with 128 bit bitmask.
> +   Copyright (C) 1987-2018 Free Software Foundation, Inc.
> 
> Please use 2013-2018 instead, all the omp_clause_mask stuff was
> introduced in 2013.
> 
> +
> +#ifndef GCC_BIT_MASK_H
> +#define GCC_BIT_MASK_H
> 
> The macro hasn't been renamed for the header file rename.
> 
> +
> +#endif /* ! GCC_BIT_MASK_H */
> 
> Here as well.  Otherwise LGTM.
> 
>   Jakub


0001-bitmask.patch
Description: 0001-bitmask.patch


0002-icelake_rebased.patch
Description: 0002-icelake_rebased.patch


New Brazilian Portuguese PO file for 'cpplib' (version 8.1-b20180128)

2018-01-30 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Brazilian Portuguese team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/pt_BR.po

(This file, 'cpplib-8.1-b20180128.pt_BR.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Contents of PO file 'cpplib-8.1-b20180128.pt_BR.po'

2018-01-30 Thread Translation Project Robot


cpplib-8.1-b20180128.pt_BR.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



Re: [PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()

2018-01-30 Thread Aaron Sawdey
On Tue, 2018-01-30 at 11:13 +0100, Jakub Jelinek wrote:
> On Tue, Jan 30, 2018 at 03:55:58AM -0600, Segher Boessenkool wrote:
> > > But in that case, what does the copying?
> > 
> > I don't know.  Aaron will look at it, but timezones etc. :-)

Indeed I did see unshare_all_rtl() copying internal_arg_pointer. But
also several places in function.c:

assign_parm_adjust_entry_rtl:
  move_block_from_reg (REGNO (entry_parm),
   validize_mem (copy_rtx (stack_parm)),
   data->partial / UNITS_PER_WORD);

assign_parm_setup_reg:
  /* Copy the value into the register, thus bridging between
 assign_parm_find_data_types and expand_expr_real_1.  */

  equiv_stack_parm = data->stack_parm;
  validated_mem = validize_mem (copy_rtx (data->entry_parm));

assign_parm_setup_block:
  mem = validize_mem (copy_rtx (stack_parm));

in expr.c:

expand_expr_real_1:
  /* DECL_MODE might change when TYPE_MODE depends on attribute target
 settings for VECTOR_TYPE_P that might switch for the function.  */
  if (currently_expanding_to_rtl
  && code == VAR_DECL && MEM_P (decl_rtl)
  && VECTOR_TYPE_P (type) && exp && DECL_MODE (exp) != mode)
decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0);
  else
decl_rtl = copy_rtx (decl_rtl);


> > 
> > > That's what seems strange.  I can see why we'd have two nested
> > > pluses with the inner plus being pointer-equal to
> > > internal_arg_ptr.
> > > And I can see why we'd have a single canonical plus (which IMO
> > > would
> > > be better, but I agree it's not stage 4 material).  It's having
> > > the two
> > > nested pluses without maintaining pointer equality that seems
> > > strange.
> > 
> > The inner plus is *not* pointer-equal, that is the
> > problem.  Something
> > did copy_rtx (or such) on it, many things do.  We can tell you what
> > exactly later today.
> 
> Most likely unshare_all_rtl, which does:
>   for (tree decl = DECL_ARGUMENTS (cfun->decl); decl; decl =
> DECL_CHAIN (decl))
> {
>   if (DECL_RTL_SET_P (decl))
> SET_DECL_RTL (decl, copy_rtx_if_shared (DECL_RTL (decl)));
>   DECL_INCOMING_RTL (decl) = copy_rtx_if_shared
> (DECL_INCOMING_RTL (decl));
> }
> 
> Anyway, my preference would be to change that gen_rtx_PLUS into
>   stack_parm = crtl->args.internal_arg_pointer;
>   if (!CONST_INT_P (offset_rtx))
> stack_parm = gen_rtx_PLUS (Pmode, stack_parm, offset_rtx);
>   else if (offset_rtx != const0_rtx)
> stack_parm = plus_constant (Pmode, stack_parm, INTVAL
> (offset_rtx));
>   stack_parm = gen_rtx_MEM (data->promoted_mode, stack_parm);
> and deal specially with GET_CODE (crtl->args.internal_arg_pointer)
> in var-tracking.c.
> rs6000/powerpcspe with -fsplit-stack are the only cases where
> crtl->args.internal_arg_pointer is not a REG, so just running libgo
> testsuite on powerpc{,64,64le} should cover it all.

I'll give this a try today when I get to the office.

Thanks,
Aaron


> 
>   Jakub
> 
-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [PING^3] [PATCH] Remove CANADIAN, that break compilation for foreign target

2018-01-30 Thread Jonathan Wakely

On 30/01/18 10:19 +0300, Petr Ovtchenkov wrote:

On Tue, 19 Dec 2017 11:37:43 +0300
Petr Ovtchenkov  wrote:

ping^3



I don't fully understand the consequences (or need) for this patch.

I asked some other people to look at it, and didn't get confirmation
it's OK. So I'm reluctant to make the change. Especially this late in
the GCC 8 cycle.



On Thu, 16 Nov 2017 20:55:37 +0300
Petr Ovtchenkov  wrote:

> On Wed, 20 Sep 2017 13:44:59 +0300
> Petr Ovtchenkov  wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71212
> >
> > On Fri, 20 May 2016 16:10:50 +0300
> > Petr Ovtchenkov  wrote:
> >
> > > Some old ad-hoc (adding -I/usr/include to compiler
> > > flags) break compilation of libstdc++ for foreign
> > > target architecture (due to compiler see includes
> > > of native).
>
> Reference for terms:
>
> https://gcc.gnu.org/onlinedocs/gccint/Configure-Terms.html
>
> Present of "CANADIAN=yes" lead to inclusion of
> headers from build (-I/usr/include). "CANADIAN=yes" used _only_
> to set "-I/usr/include".
>
> Inclusion of build headers in cross-compilation
> process is not a mistake only in case of native (i.e. it is mistake
> for cross, for canadian, for crossed native and for crossback),
> but sometimes give "success".
>
> Note, that build/host/target may be different not only due to
> different architectures, but due to different sysroots
> (libc, kernel, binutils, etc.).
>
> CANADIAN is set to "yes" by code
>
> -  # If Canadian cross, then don't pick up tools from the build directory.
> -  # Used only in GLIBCXX_EXPORT_INCLUDES.
> -  if test -n "$with_cross_host" &&
> - test x"$build_alias" != x"$with_cross_host" &&
> - test x"$build" != x"$target";
> -  then
> -CANADIAN=yes
> -  else
> -CANADIAN=no
> -  fi
>
> and it add "-I/usr/include" to compiler flags for building libstdc++.
> This is wrong.
>
> Reference to patch:
> https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01332.html


Re: [PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()

2018-01-30 Thread Jakub Jelinek
On Tue, Jan 30, 2018 at 06:50:50AM -0600, Aaron Sawdey wrote:
> > Anyway, my preference would be to change that gen_rtx_PLUS into
> >   stack_parm = crtl->args.internal_arg_pointer;
> >   if (!CONST_INT_P (offset_rtx))
> > stack_parm = gen_rtx_PLUS (Pmode, stack_parm, offset_rtx);
> >   else if (offset_rtx != const0_rtx)
> > stack_parm = plus_constant (Pmode, stack_parm, INTVAL
> > (offset_rtx));
> >   stack_parm = gen_rtx_MEM (data->promoted_mode, stack_parm);
> > and deal specially with GET_CODE (crtl->args.internal_arg_pointer)
> > in var-tracking.c.
> > rs6000/powerpcspe with -fsplit-stack are the only cases where
> > crtl->args.internal_arg_pointer is not a REG, so just running libgo
> > testsuite on powerpc{,64,64le} should cover it all.
> 
> I'll give this a try today when I get to the office.

On IRC when discussing it with Segher this morning we've come to the
conclusion that it would be best if rs6000 just followed what all other
ports to, i.e. return a pseudo from the target hook, like:

--- gcc/config/rs6000/rs6000.c  2018-01-30 12:30:27.416360076 +0100
+++ gcc/config/rs6000/rs6000.c  2018-01-30 13:59:07.360639803 +0100
@@ -29602,8 +29602,9 @@ rs6000_internal_arg_pointer (void)
  emit_insn_after (pat, get_insns ());
  pop_topmost_sequence ();
}
-  return plus_constant (Pmode, cfun->machine->split_stack_arg_pointer,
-   FIRST_PARM_OFFSET (current_function_decl));
+  rtx ret = plus_constant (Pmode, cfun->machine->split_stack_arg_pointer,
+  FIRST_PARM_OFFSET (current_function_decl));
+  return copy_to_reg (ret);
 }
   return virtual_incoming_args_rtx;
 }

copy_to_reg is what e.g. the generic or pa target hook conditionally uses.

Optionally, check also whether the
  push_topmost_sequence ();
  emit_insn_after (pat, get_insns ());
  pop_topmost_sequence ();
stuff is needed instead of just
emit_insn (pat);
and whether it needs to be guarded with cfun->machine->split_stack_arg_pointer
== NULL, because the target hook is called exactly once for each function.

Jakub


Fix ipa-inline ICE

2018-01-30 Thread Jan Hubicka
Hi,
this patch fixed ICE that was introduced by Richard Standiford's change to 
reorder
can and want_inline predicates to reduce amount of work done to verify inlining 
limits.
This bypasses check that the function is optimized that makes inliner to ICE 
because
function summary is missing.

This patch breaks out the expensive limits checking from can predicate to new 
one
which makes code bit more convoluted but I hope to clean things up next stage1.

Bootstrapped/regtested x86_64-linux, will commit it later today.

Honza

PR ipa/81360
* ipa-inline.c (can_inline_edge_p): Break out late tests to...
(can_inline_edge_by_limits_p): ... here.
(can_early_inline_edge_p, check_callers,
update_caller_keys, update_callee_keys, recursive_inlining,
add_new_edges_to_heap, speculation_useful_p,
inline_small_functions,
inline_small_functions, flatten_function,
inline_to_all_callers_1): Update.

* g++.dg/torture/pr81360.C: New testcase
Index: ipa-inline.c
===
--- ipa-inline.c(revision 257174)
+++ ipa-inline.c(working copy)
@@ -289,18 +289,16 @@ sanitize_attrs_match_for_inline_p (const
   (opts_for_fn (caller->decl)->x_##flag\
!= opts_for_fn (callee->decl)->x_##flag)
 
- /* Decide if we can inline the edge and possibly update
+/* Decide if we can inline the edge and possibly update
inline_failed reason.  
We check whether inlining is possible at all and whether
caller growth limits allow doing so.  
 
-   if REPORT is true, output reason to the dump file.  
-
-   if DISREGARD_LIMITS is true, ignore size limits.*/
+   if REPORT is true, output reason to the dump file. */
 
 static bool
 can_inline_edge_p (struct cgraph_edge *e, bool report,
-  bool disregard_limits = false, bool early = false)
+  bool early = false)
 {
   gcc_checking_assert (e->inline_failed);
 
@@ -316,9 +314,6 @@ can_inline_edge_p (struct cgraph_edge *e
   cgraph_node *caller = e->caller->global.inlined_to
? e->caller->global.inlined_to : e->caller;
   cgraph_node *callee = e->callee->ultimate_alias_target (&avail, caller);
-  tree caller_tree = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (caller->decl);
-  tree callee_tree
-= callee ? DECL_FUNCTION_SPECIFIC_OPTIMIZATION (callee->decl) : NULL;
 
   if (!callee->definition)
 {
@@ -379,12 +374,47 @@ can_inline_edge_p (struct cgraph_edge *e
   e->inline_failed = CIF_ATTRIBUTE_MISMATCH;
   inlinable = false;
 }
+  if (!inlinable && report)
+report_inline_failed_reason (e);
+  return inlinable;
+}
+
+/* Decide if we can inline the edge and possibly update
+   inline_failed reason.  
+   We check whether inlining is possible at all and whether
+   caller growth limits allow doing so.  
+
+   if REPORT is true, output reason to the dump file.
+
+   if DISREGARD_LIMITS is true, ignore size limits.  */
+
+static bool
+can_inline_edge_by_limits_p (struct cgraph_edge *e, bool report,
+bool disregard_limits = false, bool early = false)
+{
+  gcc_checking_assert (e->inline_failed);
+
+  if (cgraph_inline_failed_type (e->inline_failed) == CIF_FINAL_ERROR)
+{
+  if (report)
+report_inline_failed_reason (e);
+  return false;
+}
+
+  bool inlinable = true;
+  enum availability avail;
+  cgraph_node *caller = e->caller->global.inlined_to
+   ? e->caller->global.inlined_to : e->caller;
+  cgraph_node *callee = e->callee->ultimate_alias_target (&avail, caller);
+  tree caller_tree = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (caller->decl);
+  tree callee_tree
+= callee ? DECL_FUNCTION_SPECIFIC_OPTIMIZATION (callee->decl) : NULL;
   /* Check if caller growth allows the inlining.  */
-  else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
-  && !disregard_limits
-  && !lookup_attribute ("flatten",
-DECL_ATTRIBUTES (caller->decl))
-   && !caller_growth_limits (e))
+  if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
+  && !disregard_limits
+  && !lookup_attribute ("flatten",
+DECL_ATTRIBUTES (caller->decl))
+  && !caller_growth_limits (e))
 inlinable = false;
   /* Don't inline a function with a higher optimization level than the
  caller.  FIXME: this is really just tip of iceberg of handling
@@ -541,7 +571,8 @@ can_early_inline_edge_p (struct cgraph_e
fprintf (dump_file, "  edge not inlinable: not in SSA form\n");
   return false;
 }
-  if (!can_inline_edge_p (e, true, false, true))
+  if (!can_inline_edge_p (e, true, true)
+  || !can_inline_edge_by_limits_p (e, true, false, true))
 return false;
   return true;
 }
@@ -925,6 +956,8 @@ check_callers (struct cgraph_node *node,
  return true;
if (e->recursive_p ())
 return true;
+   if (!can_inline_edge_

Fix template for ipa-inline-2.c and ipa-inline-3.c

2018-01-30 Thread Jan Hubicka
Hi,
this PR is about rounding issue of sreal->integer conversion. I converted the 
whole
code to sreals but the change seems bit too big and with no practical 
improvement
to be justifiable, so this patch goes by simply fixing the template for current 
roundoff
error. Hopefully counter code is now stable enough so we won't more random 
flips in
GCC 8 release window.

Bootstrapped/regtested x86_64-linux, comitted.
Honza
PR ipa/83179
* gcc.dg/ipa/inline-2.c: Fix template.
* gcc.dg/ipa/inline-3.c: Fix template.

Index: testsuite/gcc.dg/ipa/inline-2.c
===
--- testsuite/gcc.dg/ipa/inline-2.c (revision 257182)
+++ testsuite/gcc.dg/ipa/inline-2.c (working copy)
@@ -28,6 +28,6 @@ void foo (int invariant)
 }
 }
 /* After inlining bar into foo, op2 is invariant within inner loop.  */
-/* { dg-final { scan-ipa-dump "op2 change 10.00. of time"  "inline"  } } */
+/* { dg-final { scan-ipa-dump "op2 change 9.99. of time"  "inline"  } } */
 /* After inlining bar into foo, op3 is invariant within both loops.  */
 /* { dg-final { scan-ipa-dump "op3 change 1.00. of time"  "inline"  } } */
Index: testsuite/gcc.dg/ipa/inline-3.c
===
--- testsuite/gcc.dg/ipa/inline-3.c (revision 257182)
+++ testsuite/gcc.dg/ipa/inline-3.c (working copy)
@@ -21,4 +21,4 @@ int foo (int invariant)
 }
 
 
-/* { dg-final { scan-ipa-dump "Scaling time by probability:0.10"  "inline" 
 } } */
+/* { dg-final { scan-ipa-dump "Scaling time by probability:0.099900"  "inline" 
 } } */


Re: Link with correct values-*.o files on Solaris (PR target/40411)

2018-01-30 Thread Rainer Orth
Hi Joseph,

> On Fri, 12 Jan 2018, Rainer Orth wrote:
>
>> At the same time, I had a new look at when values-Xc.o is used by the
>> Studio compilers.  It selects strict ISO C mode in a couple of cases,
>> and the latest Studio 12.6 cc, which is about to do away with the
>> previous -Xc option which enabled that mode in favour of gcc-compatible
>> -pedantic, uses the latter to control its use.  So I've changed gcc to
>> follow suit.
>
> gcc -pedantic is only ever a diagnostic option, not a semantic one; it's 
> incorrect to use it to change library semantics.  The relevant options for 
> strict ISO C (and C++) modes are -ansi, -std=c*, -std=iso9899:199409 and 
> aliases for those options.

that's what you get for changing your code at the eleventh hour ;-)
Before introducing -pedantic, I had

#define STARTFILE_ARCH_SPEC \
  "%{ansi|std=c90|std=iso9899\\:199409:values-Xc.o%s; :values-Xa.o%s} \

(which isn't correct either since it only handled C90 and C94).  I don't
think you need to handle the aliases explicitly: last time I checked
they never arrive in specs.

Prompted by the realization that -ansi applies to both C and C++ and I
only meant to affect C, I had a look at what the (then recently
released) Studio 12.6 compilers do, which strive for more GCC
compatibility.

I've now also checked the OpenSolaris libc and libm sources for uses of
_lib_version == strict_ansi (as is set in values-Xc.o).  libc has none,
and the uses in libm all follow this pattern (ignoring C Issue 4.2
compatibility mode handling no longer activated):

if (lib_version == strict_ansi) {
errno = EDOM;
} else if (!matherr(&exc)) {
errno = EDOM;
}

The default implementation of matherr (overridable by the user) just returns 0.

So it seems the following snippet

#define STARTFILE_ARCH_SPEC \
[...]
 %{std=c9*|std=iso9899\\:199409|std=c1*:values-Xc.o%s; :values-Xa.o%s} \

seems like the right thing to do, as you said.

> -ansi is not defined as a .opt alias of -std=c90 (because the option it's 
> an alias for depends on whether C or C++ is being compiled), so I'd expect 
> the specs to need to handle it along with -std=c90.  I'd also expect 
> -std=iso9899:199409 to need to be handled there, supposing handling it 
> like -std=c90 is correct.  And what about C++ versions not based on C99 or 
> later?

I'm of mixed minds about whether or not to include -ansi in the above
list: for C it's correct, for C++ it's less clear.  AFAIK there's no way
to distinguish between different languages in specs (like via an -x
 switch always passed in).  OTOH, it has always been there already.

The following patch implements the above with corresponding comment
adjustments.  I'm open to suggestions what to do about -ansi.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-01-29  Rainer Orth  

PR target/40411
* config/sol2.h (STARTFILE_ARCH_SPEC): Use -std=c9*,
-std=iso9899:199409, -std=c1* instead of -pedantic to select
values-Xc.o.

# HG changeset patch
# Parent  85f8b72d36b77c7c044e6383f825596017ad
Fix use of Solaris values-Xc.o (PR target/40411)

diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h
--- a/gcc/config/sol2.h
+++ b/gcc/config/sol2.h
@@ -180,7 +180,9 @@ along with GCC; see the file COPYING3.  
The values-X[ac].o objects set the variable _lib_version.  The Studio C
compilers use values-Xc.o with either -Xc or (since Studio 12.6)
-pedantic to select strictly conformant ISO C behaviour, otherwise
-   values-Xa.o.
+   values-Xa.o.  Since -pedantic is a diagnostic option only in GCC, we
+   need to specifiy the -std=c* options and -std=iso9899:199409.  -ansi is
+   omitted from that list to avoid influencing C++.
 
The values-xpg[46].o objects define either or both __xpg[46] variables,
selecting XPG4 mode (__xpg4) and conforming C99/SUSv3 behavior (__xpg6).
@@ -195,7 +197,7 @@ along with GCC; see the file COPYING3.  
 #undef STARTFILE_ARCH_SPEC
 #define STARTFILE_ARCH_SPEC \
   "%{!shared:%{!symbolic: \
- %{pedantic:values-Xc.o%s; :values-Xa.o%s} \
+ %{std=c9*|std=iso9899\\:199409|std=c1*:values-Xc.o%s; :values-Xa.o%s} \
  %{std=c90|std=gnu90:values-xpg4.o%s; :values-xpg6.o%s}}}"
 
 #if defined(HAVE_LD_PIE) && defined(HAVE_SOLARIS_CRTS)


[PATCH][PR target/84066] Wrong shadow stack register size is saved for x32

2018-01-30 Thread Tsimbalist, Igor V
x32 is a 64-bit process with 32-bit software pointer and kernel may
place x32 shadow stack above 4GB.  We need to save and restore 64-bit
shadow stack register for x32. builtin jmp buf size is 5 pointers.  We
have space to save 64-bit shadow stack pointer: 32-bit SP, 32-bit FP,
32-bit IP, 64-bit SSP for x32.

PR target/84066
* gcc/config/i386/i386.md: Replace Pmode with word_mode in
builtin_setjmp_setup and builtin_longjmp to support x32.
* gcc/testsuite/gcc.target/i386/cet-sjlj-6.c: New test.

Ok for trunk?

Igor




0001-PR84066-Wrong-shadow-stack-register-size-is-saved-fo.patch
Description: 0001-PR84066-Wrong-shadow-stack-register-size-is-saved-fo.patch


Re: [PATCH] Fix PR84101, account for function ABI details in vectorization costs

2018-01-30 Thread Richard Biener
On Tue, 30 Jan 2018, Richard Biener wrote:

> 
> This patch tries to deal with the "easy" part of a function ABI,
> the return value location, in vectorization costing.  The testcase
> shows that if we vectorize the returned value but the function
> doesn't return in memory or in a vector register but as in this
> case in an integer register pair (reg:TI ax) (bah, ABI details
> exposed late?  why's this not a parallel?) we end up spilling
> badly.
> 
> The idea is to account for such spilling so if vectorization
> benefits outweight the spilling we'll vectorize anyway.
> 
> I think the particular testcase could be fixed in the subreg
> pass basically undoing the vectorization but I realize that
> generally this is a too hard problem and avoiding vectorization
> is better.  Still this patch is somewhat fragile in that it
> depends on us "seeing" that the stored to decl is returned
> (see cfun_returns).
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> I'd like to hear opinions on my use of hard_function_value
> and also from other target maintainers.  I'm not sure we
> have sufficient testsuite coverage of _profitable_ vectorization
> of a return value.  Feel free to add to this for your
> target.
> 
> Ok for trunk?

Ok, looks like hard_function_value doesn't like being called
with

type 

we then ICE:

#0  fancy_abort (
file=0x2094ef0 
"/space/rguenther/src/svn/early-lto-debug/gcc/machmode.h", 
line=292, 
function=0x2095534 ::require() 
const::__FUNCTION__> "require") at 
/space/rguenther/src/svn/early-lto-debug/gcc/diagnostic.c:1500
#1  0x00c34790 in opt_mode::require (
this=0x7fffcf20)
at /space/rguenther/src/svn/early-lto-debug/gcc/machmode.h:292
#2  0x00db271a in hard_function_value (valtype=0x765009d8, 
func=0x764fff00, fntype=0x0, outgoing=1)
at /space/rguenther/src/svn/early-lto-debug/gcc/explow.c:2212

2201  /* int_size_in_bytes can return -1.  We don't need a check 
here
2202 since the value of bytes will then be large enough that 
no
2203 mode will match anyway.  */
2204
2205  FOR_EACH_MODE_IN_CLASS (tmpmode, MODE_INT)
2206{
(gdb) l
2207  /* Have we found a large enough mode?  */
2208  if (GET_MODE_SIZE (tmpmode.require ()) >= bytes)
2209break;
2210}
2211
2212  PUT_MODE (val, tmpmode.require ());

which means we didn't find an integer mode of the wanted size.
The backend simply returned (reg:BLK 0 ax).  I suppose I have
to guard hard_function_value with sth ...

Richard.

> Thanks,
> Richard.
> 
> 2018-01-30  Richard Biener  
> 
>   PR tree-optimization/84101
>   * tree-vect-stmts.c: Include explow.h for hard_function_value.
>   (cfun_returns): New helper.
>   (vect_model_store_cost): When vectorizing a store to a decl
>   we return and the function ABI returns via a non-vector
>   register account for the possible spilling that will happen.
> 
>   * gcc.target/i386/pr84101.c: New testcase.
> 
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c (revision 257139)
> +++ gcc/tree-vect-stmts.c (working copy)
> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-cfg.h"
>  #include "tree-ssa-loop-manip.h"
>  #include "cfgloop.h"
> +#include "explow.h"
>  #include "tree-ssa-loop.h"
>  #include "tree-scalar-evolution.h"
>  #include "tree-vectorizer.h"
> @@ -893,6 +894,22 @@ vect_model_promotion_demotion_cost (stmt
>   "prologue_cost = %d .\n", inside_cost, prologue_cost);
>  }
>  
> +/* Returns true if the current function returns DECL.  */
> +
> +static bool
> +cfun_returns (tree decl)
> +{
> +  edge_iterator ei;
> +  edge e;
> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +{
> +  greturn *ret = safe_dyn_cast  (last_stmt (e->src));
> +  if (ret && gimple_return_retval (ret) == decl)
> + return true;
> +}
> +  return false;
> +}
> +
>  /* Function vect_model_store_cost
>  
> Models cost for stores.  In the case of grouped accesses, one access
> @@ -971,6 +988,36 @@ vect_model_store_cost (stmt_vec_info stm
>  vec_to_scalar, stmt_info, 0, vect_body);
>  }
>  
> +  /* When vectorizing an SLP store into the function result assign
> + a penalty if the function returns in a non-vector register.
> + In this case we assume we'll end up with having to spill the
> + vector result and do element loads.  */
> +  if (slp_node)
> +{
> +  tree base = get_base_address (dr->ref);
> +  if (base
> +   && (TREE_CODE (base) == RESULT_DECL
> +   || (DECL_P (base) && cfun_returns (base
> + {
> +   rtx reg = hard_function_value (TREE_TYPE (base), cfun->decl, 0, 1);
> +   if ((REG_P (reg) && ! VECTOR_MODE_P (GET_MODE (reg)))
> +   || GET_CODE (reg) ==

Re: [PATCH][PR target/84066] Wrong shadow stack register size is saved for x32

2018-01-30 Thread Uros Bizjak
On Tue, Jan 30, 2018 at 3:19 PM, Tsimbalist, Igor V
 wrote:
> x32 is a 64-bit process with 32-bit software pointer and kernel may
> place x32 shadow stack above 4GB.  We need to save and restore 64-bit
> shadow stack register for x32. builtin jmp buf size is 5 pointers.  We
> have space to save 64-bit shadow stack pointer: 32-bit SP, 32-bit FP,
> 32-bit IP, 64-bit SSP for x32.
>
> PR target/84066
> * gcc/config/i386/i386.md: Replace Pmode with word_mode in
> builtin_setjmp_setup and builtin_longjmp to support x32.
> * gcc/testsuite/gcc.target/i386/cet-sjlj-6.c: New test.
>
> Ok for trunk?

LGTM, but please check the testcase with -mx32
-maddress-mode={short,long} nevertheless to catch any incosistencies.

Thanks,
Uros.


Re: Link with correct values-*.o files on Solaris (PR target/40411)

2018-01-30 Thread Joseph Myers
On Tue, 30 Jan 2018, Rainer Orth wrote:

> So it seems the following snippet
> 
> #define STARTFILE_ARCH_SPEC \
> [...]
>  %{std=c9*|std=iso9899\\:199409|std=c1*:values-Xc.o%s; :values-Xa.o%s} \
> 
> seems like the right thing to do, as you said.

That version would need updating when we add -std=c2x (once there's a C2x 
working draft with features being added to it).  If you use std=c* instead 
of separate std=c9* and std=c1* you'd avoid needing such a change - but 
then of course it would cover -std=c++* for C++.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][PR target/84066] Wrong shadow stack register size is saved for x32

2018-01-30 Thread H.J. Lu
On Tue, Jan 30, 2018 at 6:38 AM, Uros Bizjak  wrote:
> On Tue, Jan 30, 2018 at 3:19 PM, Tsimbalist, Igor V
>  wrote:
>> x32 is a 64-bit process with 32-bit software pointer and kernel may
>> place x32 shadow stack above 4GB.  We need to save and restore 64-bit
>> shadow stack register for x32. builtin jmp buf size is 5 pointers.  We
>> have space to save 64-bit shadow stack pointer: 32-bit SP, 32-bit FP,
>> 32-bit IP, 64-bit SSP for x32.
>>
>> PR target/84066
>> * gcc/config/i386/i386.md: Replace Pmode with word_mode in
>> builtin_setjmp_setup and builtin_longjmp to support x32.
>> * gcc/testsuite/gcc.target/i386/cet-sjlj-6.c: New test.
>>
>> Ok for trunk?
>
> LGTM, but please check the testcase with -mx32
> -maddress-mode={short,long} nevertheless to catch any incosistencies.
>

Speaking of -maddress-mode=, shouldn't

+  reg_adj = gen_rtx_SUBREG (Pmode, reg_ssp, 0);
   tmp = gen_rtx_SET (reg_adj,
  gen_rtx_LSHIFTRT (Pmode, negate_rtx (Pmode, reg_adj),
-   GEN_INT ((Pmode == SImode)
+   GEN_INT ((word_mode == SImode)
 ? 2
 : 3)));

be

+  reg_adj = gen_rtx_SUBREG (ptr_mode, reg_ssp, 0);
   tmp = gen_rtx_SET (reg_adj,
  gen_rtx_LSHIFTRT (ptr_mode, negate_rtx (ptr_mode, reg_adj),
-   GEN_INT ((Pmode == SImode)
+   GEN_INT ((word_mode == SImode)
 ? 2
 : 3)));

Pmode == word_mode for -maddress-mode=long.

+++ b/gcc/testsuite/gcc.target/i386/cet-sjlj-6.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fcf-protection -mcet -mx32" } */
+/* { dg-final { scan-assembler-times "endbr64" 2 } } */
+/* { dg-final { scan-assembler-times "movq\t.*buf\\+12" 1 } } */
+/* { dg-final { scan-assembler-times "subq\tbuf\\+12" 1 } } */
+/* { dg-final { scan-assembler-times "rdsspq" 2 } } */
+/* { dg-final { scan-assembler-times "incsspq" 2 } } */

Please add a test for

   tmp = gen_rtx_SET (reg_adj,
  gen_rtx_LSHIFTRT (Pmode, negate_rtx (Pmode, reg_adj),
-   GEN_INT ((Pmode == SImode)
+   GEN_INT ((word_mode == SImode)
 ? 2
 : 3)));


-- 
H.J.


Re: Link with correct values-*.o files on Solaris (PR target/40411)

2018-01-30 Thread Rainer Orth
Hi Joseph,

> On Tue, 30 Jan 2018, Rainer Orth wrote:
>
>> So it seems the following snippet
>> 
>> #define STARTFILE_ARCH_SPEC \
>> [...]
>>  %{std=c9*|std=iso9899\\:199409|std=c1*:values-Xc.o%s; :values-Xa.o%s} \
>> 
>> seems like the right thing to do, as you said.
>
> That version would need updating when we add -std=c2x (once there's a C2x 
> working draft with features being added to it).  If you use std=c* instead 
> of separate std=c9* and std=c1* you'd avoid needing such a change - but 
> then of course it would cover -std=c++* for C++.

I know, that why I used the current form.  Admittedly it's a maintenance
problem (likely to be forgotten in the future).  If we add in -ansi, we
can just as well add -std=c++* (thus -std=c*), too.

I guess that's the safest option overall, unless Jonathan has objections
from a C++ standards perspective.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Link with correct values-*.o files on Solaris (PR target/40411)

2018-01-30 Thread Jonathan Wakely

On 30/01/18 15:51 +0100, Rainer Orth wrote:

Hi Joseph,


On Tue, 30 Jan 2018, Rainer Orth wrote:


So it seems the following snippet

#define STARTFILE_ARCH_SPEC \
[...]
 %{std=c9*|std=iso9899\\:199409|std=c1*:values-Xc.o%s; :values-Xa.o%s} \

seems like the right thing to do, as you said.


That version would need updating when we add -std=c2x (once there's a C2x
working draft with features being added to it).  If you use std=c* instead
of separate std=c9* and std=c1* you'd avoid needing such a change - but
then of course it would cover -std=c++* for C++.


I know, that why I used the current form.  Admittedly it's a maintenance
problem (likely to be forgotten in the future).  If we add in -ansi, we
can just as well add -std=c++* (thus -std=c*), too.

I guess that's the safest option overall, unless Jonathan has objections
from a C++ standards perspective.


No objections from me, I'm happy either way.




Re: [PR83370][AARCH64]Use tighter register constraints for sibcall patterns.

2018-01-30 Thread Renlin Li

Hi Richard,

Thanks for the review!

On 29/01/18 20:23, Richard Sandiford wrote:


The patch looks good to me FWIW.  How about adding something like
the following testcase?


/* { dg-do run } */
/* { dg-options "-O2" } */

typedef void (*fun) (void);

void __attribute__ ((noipa))
f (fun x1)
{
   register fun x2 asm ("x16");
   int arr[5000];
   int *volatile ptr = arr;
   asm ("mov %0, %1" : "=r" (x2) : "r" (x1));
   x2 ();
}

void g (void) {}

int
main (void)
{
   f (g);
}



It was hard for me to construct an test case at that time.
Your example here exactly reflect the problem. The code-gen before the change 
is:

f:
mov x16, 20016
sub sp, sp, x16
add x0, sp, 16
mov x16, 20016
str x0, [sp, 8]
add sp, sp, x16
br  x16

After the change to the register constraint:

f:
mov x16, 20016
sub sp, sp, x16
// Start of user assembly
// 9 "indirect_tail_call_reg.c" 1
mov x16, x0
// 0 "" 2
// End of user assembly
add x0, sp, 16
str x0, [sp, 8]
mov x0, x16
mov x16, 20016
add sp, sp, x16
br  x0

I updated the patch with the new test case,
the wording about the register constraint is also updated.

Thanks,
Renlin

gcc/ChangeLog:

2018-01-30  Renlin Li  

* config/aarch64/aarch64.c (aarch64_class_max_nregs): Handle
TAILCALL_ADDR_REGS.
(aarch64_register_move_cost): Likewise.
* config/aarch64/aarch64.h (reg_class): Rename CALLER_SAVE_REGS to
TAILCALL_ADDR_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Rename CALLER_SAVE_REGS to
TAILCALL_ADDR_REGS. Remove IP registers.
* config/aarch64/aarch64.md (Ucs): Update register constraint.

gcc/testsuite/ChangeLog:

2018-01-30  Richard Sandiford  

* gcc.target/aarch64/indirect_tail_call_reg.c: New.




[...]
diff --git a/gcc/config/aarch64/constraints.md 
b/gcc/config/aarch64/constraints.md
index 
af4143ef756464afac29d17f124b436520f90451..c3791aa89562a5d5542098d2f7951afc57901150
 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -21,8 +21,8 @@
  (define_register_constraint "k" "STACK_REG"
"@internal The stack register.")
  
-(define_register_constraint "Ucs" "CALLER_SAVE_REGS"

-  "@internal The caller save registers.")
+(define_register_constraint "Ucs" "TAILCALL_ADDR_REGS"
+  "@internal The indirect tail call address registers")
  
  (define_register_constraint "w" "FP_REGS"

"Floating point and SIMD vector registers.")


Maybe "@internal Registers suitable for an indirect tail call"?
Unlike the caller-save registers, these aren't a predefined set.

Thanks,
Richard

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 93d29b84d47b7017661a2129d61e7d740bbf7c93..322b7f4628aa69cf331c12ff2c8df351890da9ef 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -446,7 +446,7 @@ extern unsigned aarch64_architecture_version;
 enum reg_class
 {
   NO_REGS,
-  CALLER_SAVE_REGS,
+  TAILCALL_ADDR_REGS,
   GENERAL_REGS,
   STACK_REG,
   POINTER_REGS,
@@ -462,7 +462,7 @@ enum reg_class
 #define REG_CLASS_NAMES\
 {		\
   "NO_REGS",	\
-  "CALLER_SAVE_REGS",\
+  "TAILCALL_ADDR_REGS",\
   "GENERAL_REGS",\
   "STACK_REG",	\
   "POINTER_REGS",\
@@ -475,7 +475,7 @@ enum reg_class
 #define REG_CLASS_CONTENTS		\
 {	\
   { 0x, 0x, 0x },	/* NO_REGS */		\
-  { 0x0007, 0x, 0x },	/* CALLER_SAVE_REGS */	\
+  { 0x0004, 0x, 0x },	/* TAILCALL_ADDR_REGS */\
   { 0x7fff, 0x, 0x0003 },	/* GENERAL_REGS */	\
   { 0x8000, 0x, 0x },	/* STACK_REG */		\
   { 0x, 0x, 0x0003 },	/* POINTER_REGS */	\
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1da313f57e0eed4df36dbd15aecbae9fd73f7388..59ca95019ddf0491c382e7ee2b99966694d0b36d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6062,7 +6062,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
 {
   switch (regclass)
 {
-case CALLER_SAVE_REGS:
+case TAILCALL_ADDR_REGS:
 case POINTER_REGS:
 case GENERAL_REGS:
 case ALL_REGS:
@@ -8228,10 +8228,10 @@ aarch64_register_move_cost (machine_mode mode,
 = aarch64_tune_params.regmove_cost;
 
   /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
-  if (to == CALLER_SAVE_REGS || to == POINTER_REGS)
+  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS)
 to = GENERAL_REGS;
 
-  if (from == CALLER_SAVE_REGS || from == POINTER_REGS)
+  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS)
 from = GENERAL_REGS;
 
   /* Moving between GPR and stack cost is the same as GP2GP.  */
diff --git a/gcc/config/aarch64/constraints.md b/gcc/c

[PATCH] Fix sched-deps.c handling of frame related insns (PR rtl-optimization/83986)

2018-01-30 Thread Jakub Jelinek
Hi!

We don't want to move /f instructions across jumps in either direction,
sched_analyze_insn has code for that by adding the insn into
sched_before_next_jump list and by adding anti dependence against
pending_jump_insns.  That works fine, as long as we don't flush the
dependency lists, flush_pending_lists will flush them and just add the
insn that was flushed into last_pending_memory_flush.

For frame related insns, we are looking solely for deps->pending_jump_insns
and thus if we've flushed, we miss anti dependency against insn that has
anti dependencies against jumps.

The following patch fixes it by also adding an anti dependency against 
last_pending_memory_flush insn if any.

Bootstrapped on {x86_64,i686,powerpc64le,powerpc64}-linux and regtested
on the first 3, with powerpc64-linux regtest -m32/-m64 still pending,
ok for trunk if it passes even there?

I believe this matches what we do elsewhere, but just in case it would look
too costly (as it could add dependencies also in cases where there weren't
any pending jump insns, just insns using memory and too many dependencies
for them), there might be another option like have a flag in deps whether
in the bb/ebb we've ever added something into pending_jump_insns and
only do the newly added call if that bool is true.  Or tweak
flush_pending_lists, such that if deps->pending_jump_insns used to be
non-empty then add at the end the insn in question to
deps->pending_jump_insns in addition to deps->last_pending_memory_flush,
and tweak the:
  if (deps->pending_flush_length++ >= MAX_PENDING_LIST_LENGTH)
flush_pending_lists (deps, insn, true, true);
  else
deps->pending_jump_insns
  = alloc_INSN_LIST (insn, deps->pending_jump_insns);
so that it would for the case when calling flush_pending_lists also
do the else part if before the flush_pending_lists pending_jump_insns used
to be non-empty.

2018-01-30  Jakub Jelinek  

PR rtl-optimization/83986
* sched-deps.c (sched_analyze_insn): For frame related insns, add anti
dependence against last_pending_memory_flush in addition to
pending_jump_insns.

* gcc.dg/pr83986.c: New test.

--- gcc/sched-deps.c.jj 2018-01-27 07:27:51.723937094 +0100
+++ gcc/sched-deps.c2018-01-30 13:09:16.891173722 +0100
@@ -2922,6 +2922,8 @@ sched_analyze_insn (struct deps_desc *de
= alloc_INSN_LIST (insn, deps->sched_before_next_jump);
 
   /* Make sure epilogue insn is scheduled after preceding jumps.  */
+  add_dependence_list (insn, deps->last_pending_memory_flush, 1,
+  REG_DEP_ANTI, true);
   add_dependence_list (insn, deps->pending_jump_insns, 1, REG_DEP_ANTI,
   true);
 }
--- gcc/testsuite/gcc.dg/pr83986.c.jj   2018-01-30 13:34:13.254828800 +0100
+++ gcc/testsuite/gcc.dg/pr83986.c  2018-01-30 13:33:55.973832214 +0100
@@ -0,0 +1,14 @@
+/* PR rtl-optimization/83986 */
+/* { dg-do compile } */
+/* { dg-options "-g -O2 -fsched2-use-superblocks -funwind-tables --param 
max-pending-list-length=1" } */
+
+int v;
+
+int
+foo (int x)
+{
+  v &= !!v && !!x;
+  if (v != 0)
+foo (0);
+  return 0;
+}

Jakub


Re: [PATCH, rs6000][PR debug/83758] v2 rs6000_internal_arg_pointer should only return a register

2018-01-30 Thread Aaron Sawdey
On Tue, 2018-01-30 at 14:04 +0100, Jakub Jelinek wrote:
> On IRC when discussing it with Segher this morning we've come to the
> conclusion that it would be best if rs6000 just followed what all
> other
> ports to, i.e. return a pseudo from the target hook, like:
> 
> --- gcc/config/rs6000/rs6000.c2018-01-30 12:30:27.416360076
> +0100
> +++ gcc/config/rs6000/rs6000.c2018-01-30 13:59:07.360639803
> +0100
> @@ -29602,8 +29602,9 @@ rs6000_internal_arg_pointer (void)
> emit_insn_after (pat, get_insns ());
> pop_topmost_sequence ();
>   }
> -  return plus_constant (Pmode, cfun->machine-
> >split_stack_arg_pointer,
> - FIRST_PARM_OFFSET
> (current_function_decl));
> +  rtx ret = plus_constant (Pmode, cfun->machine-
> >split_stack_arg_pointer,
> +FIRST_PARM_OFFSET
> (current_function_decl));
> +  return copy_to_reg (ret);
>  }
>return virtual_incoming_args_rtx;
>  }
> 
> copy_to_reg is what e.g. the generic or pa target hook conditionally
> uses.

This fix looks good, passes bootstrap, go tests run. 

Segher is currently regtesting on ppc64le power9. OK for trunk if tests
pass?

2018-01-30  Aaron Sawdey  

* config/rs6000/rs6000.c (rs6000_internal_arg_pointer ): Only return
a reg rtx.
-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC ToolchainIndex: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 257188)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -29602,8 +29602,9 @@
 	  emit_insn_after (pat, get_insns ());
 	  pop_topmost_sequence ();
 	}
-  return plus_constant (Pmode, cfun->machine->split_stack_arg_pointer,
-			FIRST_PARM_OFFSET (current_function_decl));
+  rtx ret = plus_constant (Pmode, cfun->machine->split_stack_arg_pointer,
+			   FIRST_PARM_OFFSET (current_function_decl));
+  return copy_to_reg (ret);
 }
   return virtual_incoming_args_rtx;
 }


Re: [PATCH, rs6000][PR debug/83758] v2 rs6000_internal_arg_pointer should only return a register

2018-01-30 Thread Jakub Jelinek
On Tue, Jan 30, 2018 at 10:21:33AM -0600, Aaron Sawdey wrote:
> 2018-01-30  Aaron Sawdey  
> 
>   * config/rs6000/rs6000.c (rs6000_internal_arg_pointer ): Only return

No space before ): .  Will defer ack to Segher or David.

>   a reg rtx.

> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c(revision 257188)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -29602,8 +29602,9 @@
> emit_insn_after (pat, get_insns ());
> pop_topmost_sequence ();
>   }
> -  return plus_constant (Pmode, cfun->machine->split_stack_arg_pointer,
> - FIRST_PARM_OFFSET (current_function_decl));
> +  rtx ret = plus_constant (Pmode, cfun->machine->split_stack_arg_pointer,
> +FIRST_PARM_OFFSET (current_function_decl));
> +  return copy_to_reg (ret);
>  }
>return virtual_incoming_args_rtx;
>  }


Jakub


Re: [PATCH] PR 78534 Reinstate better string copy algorithm

2018-01-30 Thread Janne Blomqvist
PING

On Tue, Jan 23, 2018 at 3:31 PM, Janne Blomqvist
 wrote:
> As part of the change to larger character lengths, the string copy
> algorithm was temporarily pessimized to get around some spurious
> -Wstringop-overflow warnings.  Having tried a number of variations of
> this algorithm I have managed to get it down to one spurious warning,
> only with -O1 optimization, in the testsuite.  This patch reinstates
> the optimized variant and modifies this one testcase to ignore the
> warning.
>
> Regtested on x86_64-pc-linux-gnu, Ok for trunk?
>
> gcc/fortran/ChangeLog:
>
> 2018-01-23  Janne Blomqvist  
>
> PR 78534
> * trans-expr.c (fill_with_spaces): Use memset instead of
> generating loop.
> (gfc_trans_string_copy): Improve opportunity to use builtins with
> constant lengths.
>
> gcc/testsuite/ChangeLog:
>
> 2018-01-23  Janne Blomqvist  
>
> PR 78534
> * gfortran.dg/allocate_deferred_char_scalar_1.f03: Prune
> -Wstringop-overflow warnings due to spurious warning with -O1.
> * gfortran.dg/char_cast_1.f90: Update dump scan pattern.
> * gfortran.dg/transfer_intrinsic_1.f90: Likewise.
> ---
>  gcc/fortran/trans-expr.c   | 52 
> --
>  .../allocate_deferred_char_scalar_1.f03|  2 +
>  gcc/testsuite/gfortran.dg/char_cast_1.f90  |  6 +--
>  gcc/testsuite/gfortran.dg/transfer_intrinsic_1.f90 |  2 +-
>  4 files changed, 35 insertions(+), 27 deletions(-)
>
> diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
> index e90036f..4da6cee 100644
> --- a/gcc/fortran/trans-expr.c
> +++ b/gcc/fortran/trans-expr.c
> @@ -6407,8 +6407,6 @@ fill_with_spaces (tree start, tree type, tree size)
>tree i, el, exit_label, cond, tmp;
>
>/* For a simple char type, we can call memset().  */
> -  /* TODO: This code does work and is potentially more efficient, but
> - causes spurious -Wstringop-overflow warnings.
>if (compare_tree_int (TYPE_SIZE_UNIT (type), 1) == 0)
>  return build_call_expr_loc (input_location,
> builtin_decl_explicit (BUILT_IN_MEMSET),
> @@ -6416,7 +6414,6 @@ fill_with_spaces (tree start, tree type, tree size)
> build_int_cst (gfc_get_int_type (gfc_c_int_kind),
>lang_hooks.to_target_charset (' 
> ')),
> fold_convert (size_type_node, size));
> -  */
>
>/* Otherwise, we use a loop:
> for (el = start, i = size; i > 0; el--, i+= TYPE_SIZE_UNIT (type))
> @@ -6522,11 +6519,20 @@ gfc_trans_string_copy (stmtblock_t * block, tree 
> dlength, tree dest,
>
>/* The string copy algorithm below generates code like
>
> - if (dlen > 0) {
> - memmove (dest, src, min(dlen, slen));
> - if (slen < dlen)
> - memset(&dest[slen], ' ', dlen - slen);
> - }
> + if (destlen > 0)
> +   {
> + if (srclen < destlen)
> +   {
> + memmove (dest, src, srclen);
> + // Pad with spaces.
> + memset (&dest[srclen], ' ', destlen - srclen);
> +   }
> + else
> +   {
> + // Truncate if too long.
> + memmove (dest, src, destlen);
> +   }
> +   }
>*/
>
>/* Do nothing if the destination length is zero.  */
> @@ -6555,21 +6561,16 @@ gfc_trans_string_copy (stmtblock_t * block, tree 
> dlength, tree dest,
>else
>  src = gfc_build_addr_expr (pvoid_type_node, src);
>
> -  /* First do the memmove. */
> -  tmp2 = fold_build2_loc (input_location, MIN_EXPR, TREE_TYPE (dlen), dlen,
> - slen);
> -  tmp2 = build_call_expr_loc (input_location,
> - builtin_decl_explicit (BUILT_IN_MEMMOVE),
> - 3, dest, src,
> - fold_convert (size_type_node, tmp2));
> -  stmtblock_t tmpblock2;
> -  gfc_init_block (&tmpblock2);
> -  gfc_add_expr_to_block (&tmpblock2, tmp2);
> -
> -  /* If the destination is longer, fill the end with spaces.  */
> +  /* Truncate string if source is too long.  */
>cond2 = fold_build2_loc (input_location, LT_EXPR, logical_type_node, slen,
>dlen);
>
> +  /* Copy and pad with spaces.  */
> +  tmp3 = build_call_expr_loc (input_location,
> + builtin_decl_explicit (BUILT_IN_MEMMOVE),
> + 3, dest, src,
> + fold_convert (size_type_node, slen));
> +
>/* Wstringop-overflow appears at -O3 even though this warning is not
>   explicitly available in fortran nor can it be switched off. If the
>   source length is a constant, its negative appears as a very large
> @@ -6584,14 +6585,19 @@ gfc_trans_string_copy (stmtblock_t * block, tree 
> dlength, tree dest,
>tmp4 = fill_with_spaces (tmp4, chartype, tmp);
>
>gfc_init_block (&tempblock);
> +  gfc_add_expr_to

Re: [PATCH] Use pointer sized array indices.

2018-01-30 Thread Janne Blomqvist
PING**2

On Wed, Jan 17, 2018 at 8:02 PM, Janne Blomqvist
 wrote:
> PING
>
> On Fri, Dec 29, 2017 at 8:28 PM, Janne Blomqvist
>  wrote:
>> On Fri, Dec 29, 2017 at 7:17 PM, Thomas Koenig  wrote:
>>> Hi Janne,
>>>
 Using pointer sized variables (e.g. size_t / ptrdiff_t) when the
 variables are used as array indices allows accessing larger arrays,
 and can be a slight performance improvement due to no need for sign or
 zero extending, or masking.
>>>
>>>
>>> Unless I have missed something, all the examples are for cases where
>>> the array is of maximum size GFC_MAX_DIMENSIONS.
>>
>> Many, but not all.
>>
>>> This is why they
>>> were left as int in the first place (because it is unlikely we will have
>>> arrays of more than 2^31-1 dimensions soon :-).
>>>
>>> Do you really think this change is necessary? If not, I'd rather avoid
>>> such a change.
>>
>> I'm not planning on supporting > 2**31-1 dimensions, no. :)
>>
>> But even if we know that the maximum value is always going to be
>> smaller, by using pointer-sized variables the compiler can generate
>> slightly more efficient code.
>>
>> See e.g. https://godbolt.org/g/oAvw5L ; in the functions with a loop,
>> the ones which use pointer-sized indices have shorter preambles as
>> well as loop bodies. And for the simple functions that just index an
>> array, by using pointer-sized indices there is no need to zero the
>> upper half of the registers.
>>
>> I mean, it's not a huge improvement, but it might be a tiny one in some 
>> cases.
>>
>> Also, by moving the induction variable from the beginning of the
>> function into the loop header, it makes it easier for both readers and
>> the compiler to see the scope of the variable.
>>
>> --
>> Janne Blomqvist
>
>
>
> --
> Janne Blomqvist



-- 
Janne Blomqvist


Re: [patch, libfortran] Adapt handling of derived types to new dtype

2018-01-30 Thread Janne Blomqvist
On Mon, Jan 29, 2018 at 11:51 PM, Thomas Koenig  wrote:
> Hello world,
>
> this patch fixes the library issues left over from Paul's rank-15 patch
> by removing the GFC_DTYPE_DERIVED_* macros and (mostly) handling these
> cases separately.
>
> The problem was with folding the type and size into a single word. For
> all normal data types, this is perfectly fine, because the sizes are
> known to fit. For derived types or characters, there could be a problem
> if a user specified a very large size.
>
> Regression-tested. OK for trunk?

Ok, thanks.


-- 
Janne Blomqvist


Re: [PATCH, rs6000][PR debug/83758] v2 rs6000_internal_arg_pointer should only return a register

2018-01-30 Thread Segher Boessenkool
Hi!

On Tue, Jan 30, 2018 at 10:21:33AM -0600, Aaron Sawdey wrote:
> This fix looks good, passes bootstrap, go tests run. 
> 
> Segher is currently regtesting on ppc64le power9. OK for trunk if tests
> pass?
> 
> 2018-01-30  Aaron Sawdey  
> 
>   * config/rs6000/rs6000.c (rs6000_internal_arg_pointer ): Only return
>   a reg rtx.

Yes please, with the changelog typo fixed as Jakub said.

Thanks!  Glad this is over with :-)

I think it needs backports, too?  Those are okay as well.


Segher


> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c(revision 257188)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -29602,8 +29602,9 @@
> emit_insn_after (pat, get_insns ());
> pop_topmost_sequence ();
>   }
> -  return plus_constant (Pmode, cfun->machine->split_stack_arg_pointer,
> - FIRST_PARM_OFFSET (current_function_decl));
> +  rtx ret = plus_constant (Pmode, cfun->machine->split_stack_arg_pointer,
> +FIRST_PARM_OFFSET (current_function_decl));
> +  return copy_to_reg (ret);
>  }
>return virtual_incoming_args_rtx;
>  }



Re: [PATCH] make -Wrestrict for strcat more meaningful (PR 83698)

2018-01-30 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01488.html

This is a minor improvement but there have been a couple of
comments lately pointing out that the numbers in the -Wrestrict
messages can make them confusing.

Jakub, since you made one of those comments (the other came up
in the context of bug 84095 - [8 Regression] false-positive
-Wrestrict warnings for memcpy within array).  Can you please
either approve this patch or suggest changes?

On 01/16/2018 05:35 PM, Martin Sebor wrote:

On 01/16/2018 02:32 PM, Jakub Jelinek wrote:

On Tue, Jan 16, 2018 at 01:36:26PM -0700, Martin Sebor wrote:

--- gcc/gimple-ssa-warn-restrict.c(revision 256752)
+++ gcc/gimple-ssa-warn-restrict.c(working copy)
@@ -384,6 +384,12 @@ builtin_memref::builtin_memref (tree expr, tree si
   base = SSA_NAME_VAR (base);
   }

+  if (DECL_P (base) && TREE_CODE (TREE_TYPE (base)) == ARRAY_TYPE)
+{
+  if (offrange[0] < 0 && offrange[1] > 0)
+offrange[0] = 0;
+}


Why the 2 nested ifs?


No particular reason.  There may have been more code in there
that I ended up removing.  Or a comment.  I can remove the
extra braces when the patch is approved.




@@ -1079,14 +1085,35 @@ builtin_access::strcat_overlap ()
 return false;

   /* When strcat overlap is certain it is always a single byte:
- the terminatinn NUL, regardless of offsets and sizes.  When
+ the terminating NUL, regardless of offsets and sizes.  When
  overlap is only possible its range is [0, 1].  */
   acs.ovlsiz[0] = dstref->sizrange[0] == dstref->sizrange[1] ? 1 : 0;
   acs.ovlsiz[1] = 1;
-  acs.ovloff[0] = (dstref->sizrange[0] +
dstref->offrange[0]).to_shwi ();
-  acs.ovloff[1] = (dstref->sizrange[1] +
dstref->offrange[1]).to_shwi ();


You use to_shwi many times in the patch, do the callers or something
earlier
in this function guarantee that you aren't throwing away any bits (unlike
tree_to_shwi, to_shwi method doesn't ICE, just throws away upper bits).
Especially when you perform additions like here, even if both
wide_ints fit
into a shwi, the result might not.


No, I'm not sure.  In fact, it wouldn't surprise me if it did
happen.  It doesn't cause false positives or negatives but it
can make the offsets less than meaningful in cases where they
are within valid bounds.  There are also cases where they are
meaningless to begin with and there is little the pass can do
about that.

IMO, the ideal solution to the first problem is to add a format
specifier for wide ints to the pretty printer and get rid of
the conversions.  It's probably too late for something like
that now but I'd like to do it for GCC 9.  Unless someone
files a bug/regression, it's also too late for me to go and
try to find and fix these conversions now.

Martin

PS While looking for a case you asked about I came up with
the following.  I don't think there's any slicing involved
but the offsets are just as meaningless as if there were.
I think the way to do significantly better is to detect
out-of-bounds offsets earlier (e.g., as in this patch:
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg02143.html)

$ cat z.c && gcc -O2 -S -Warray-bounds -m32 z.c
extern int a[];

void f (__PTRDIFF_TYPE__ i)
{
  if (i < __PTRDIFF_MAX__ - 7 || __PTRDIFF_MAX__ - 5 < i)
i = __PTRDIFF_MAX__ -  7;

  const int *s = a + i;

  __builtin_memcpy (a, &s[i], 3);
}
z.c: In function ‘f’:
z.c:10:3: warning: ‘__builtin_memcpy’ offset [-64, -48] is out of the
bounds of object ‘a’ with type ‘int[]’ [-Warray-bounds]
   __builtin_memcpy (a, &s[i], 3);
   ^~
z.c:1:12: note: ‘a’ declared here
 extern int a[];
^





Re: [PR81611] improve auto-inc

2018-01-30 Thread Alexandre Oliva
On Jan 25, 2018, Alexandre Oliva  wrote:

> Thanks, I'll retest with the simplified test (just in case; for I can't
> recall why I ended up with all those redundant conditions),

As suspected, removing the redundant tests didn't regress anything.  I
suppose they mattered in some earlier experimental version of the patch
(I vaguely recall having a long sequence of tests within the loop at
some point), and I just failed to further simplify the final form.
Thanks for catching the further simplification opportunities!

FTR, here's the patch I'm installing, while awaiting a review from
someone else on the rtl auto-inc patch (the second and last patch in
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01994.html).


[PR81611] accept copies in simple_iv_increment_p

If there are copies between the GIMPLE_PHI at the loop body and the
increment that reaches it (presumably through a back edge), still
regard it as a simple_iv_increment, so that we won't consider the
value in the back edge eligible for forwprop.  Doing so would risk
making the phi node and the incremented conflicting value live
within the loop, and the phi node to be preserved for propagated
uses after the loop.

for  gcc/ChangeLog

PR tree-optimization/81611
* tree-ssa-dom.c (simple_iv_increment_p): Skip intervening
copies.
---
 gcc/tree-ssa-dom.c |   18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 2b371667253a..a6f176c5def0 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1276,8 +1276,11 @@ record_equality (tree x, tree y, class const_and_copies 
*const_and_copies)
 /* Returns true when STMT is a simple iv increment.  It detects the
following situation:
 
-   i_1 = phi (..., i_2)
-   i_2 = i_1 +/- ...  */
+   i_1 = phi (..., i_k)
+   [...]
+   i_j = i_{j-1}  for each j : 2 <= j <= k-1
+   [...]
+   i_k = i_{k-1} +/- ...  */
 
 bool
 simple_iv_increment_p (gimple *stmt)
@@ -1305,8 +1308,15 @@ simple_iv_increment_p (gimple *stmt)
 return false;
 
   phi = SSA_NAME_DEF_STMT (preinc);
-  if (gimple_code (phi) != GIMPLE_PHI)
-return false;
+  while (gimple_code (phi) != GIMPLE_PHI)
+{
+  /* Follow trivial copies, but not the DEF used in a back edge,
+so that we don't prevent coalescing.  */
+  if (!gimple_assign_ssa_name_copy_p (phi))
+   return false;
+  preinc = gimple_assign_rhs1 (phi);
+  phi = SSA_NAME_DEF_STMT (preinc);
+}
 
   for (i = 0; i < gimple_phi_num_args (phi); i++)
 if (gimple_phi_arg_def (phi, i) == lhs)


-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [SFN+LVU+IEPM v4 7/9] [LVU] Introduce location views

2018-01-30 Thread Alexandre Oliva
On Jan 26, 2018, Jakub Jelinek  wrote:

> Having to tweak debug info consumers so that they treat DW_LLE_* of 9
> one way for .debug_loclist of version 5 and another way for .debug_loclist
> of version 6 isn't a good idea.

Maybe we don't have to do that.  The reason I implemented the proposed
format was to have a reference implementation, but since it's an
extension to a non-extensible part of DWARF5, it's hard to justify
consumer implementations for that.

> Why don't you emit the DW_LLE_GNU_view_pair stuff in .debug_loclists
> already with -gdwarf-5, if needed paired without some TU attribute that says
> that views are emitted?

Because that would make the loclists unreadable for any DWARF5-compliant
consumer.


> Haven't looked for it it in the coding standards, but it is something
> I've been asked to do in my code by various reviewers over the years and
> what I've been asking others in my patch reviews from others too.

Thanks.  It was the first (well, second; you'd requested similar changes
another time before) I heard of that.  I'm still unconvinced the way I
used is not compliant, but I'll keep that in mind.  Hopefully I won't
run into someone who insists the other one (the one you reject) is the
only correct one ;-)

> I feel strongly about indenting stuff right,

I'm with you on that, we just have different ideas about what "right"
stands for ;-)

> which if it wouldn't fit would be
>   return verylongcondition
>  && otherlongcondition__;
> rather than
>   return verylongcondition
>   && otherlongcondition__;
> or similar, it would surprise me if it is not in the coding standard.

I agree neither of these two forms is correct.

But it is my understanding that both of the following are correct:

  return (verylongcondition
  && otherlongcondition__);

  return verylongcondition
&& otherlongcondition__;

The first, because the parenthesized expression is continued with
indentation to match the parenthesis, the second because the return
statement is continued with the correct indentation for the continuation
of a statement.

>> Personally, I don't see that line breaks before operators are only
>> permitted when the operand that follows wouldn't fit; the standards are

> Yes, you can break it, but why, it doesn't make the code more readable,
> but less in this case.

I thought it made it more readable, especially in the context of the
patch.  I guess this means we'll have to agree to disagree.

> So, what is the reason not to emit the format you are proposing already with
> -gdwarf-5, perhaps with some extra attribute on the TU (of course not when
> -gstrict-dwarf)?

See above.  We don't want to create unreadable loclists for
standard-compliant consumers, do we?

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PATCH] Fix sched-deps.c handling of frame related insns (PR rtl-optimization/83986)

2018-01-30 Thread Vladimir Makarov

On 01/30/2018 11:11 AM, Jakub Jelinek wrote:

Hi!

We don't want to move /f instructions across jumps in either direction,
sched_analyze_insn has code for that by adding the insn into
sched_before_next_jump list and by adding anti dependence against
pending_jump_insns.  That works fine, as long as we don't flush the
dependency lists, flush_pending_lists will flush them and just add the
insn that was flushed into last_pending_memory_flush.
Yes, we don't think to move frame related insns.As most frame insns 
are moves of args from hard regs to pseudos it might result in troubles 
in LRA can not find a spill regs because it has limited possibilities to 
spill hard regs.   RA actually will try to remove these moves by 
assigning the same hard regs to the pseudos so scheduling them has a 
little sense.

For frame related insns, we are looking solely for deps->pending_jump_insns
and thus if we've flushed, we miss anti dependency against insn that has
anti dependencies against jumps.

The following patch fixes it by also adding an anti dependency against
last_pending_memory_flush insn if any.

Bootstrapped on {x86_64,i686,powerpc64le,powerpc64}-linux and regtested
on the first 3, with powerpc64-linux regtest -m32/-m64 still pending,
ok for trunk if it passes even there?

OK.  Thanks, Jakub.

2018-01-30  Jakub Jelinek  

PR rtl-optimization/83986
* sched-deps.c (sched_analyze_insn): For frame related insns, add anti
dependence against last_pending_memory_flush in addition to
pending_jump_insns.

* gcc.dg/pr83986.c: New test.

--- gcc/sched-deps.c.jj 2018-01-27 07:27:51.723937094 +0100
+++ gcc/sched-deps.c2018-01-30 13:09:16.891173722 +0100
@@ -2922,6 +2922,8 @@ sched_analyze_insn (struct deps_desc *de
= alloc_INSN_LIST (insn, deps->sched_before_next_jump);
  
/* Make sure epilogue insn is scheduled after preceding jumps.  */

+  add_dependence_list (insn, deps->last_pending_memory_flush, 1,
+  REG_DEP_ANTI, true);
add_dependence_list (insn, deps->pending_jump_insns, 1, REG_DEP_ANTI,
   true);
  }
--- gcc/testsuite/gcc.dg/pr83986.c.jj   2018-01-30 13:34:13.254828800 +0100
+++ gcc/testsuite/gcc.dg/pr83986.c  2018-01-30 13:33:55.973832214 +0100
@@ -0,0 +1,14 @@
+/* PR rtl-optimization/83986 */
+/* { dg-do compile } */
+/* { dg-options "-g -O2 -fsched2-use-superblocks -funwind-tables --param 
max-pending-list-length=1" } */
+
+int v;
+
+int
+foo (int x)
+{
+  v &= !!v && !!x;
+  if (v != 0)
+foo (0);
+  return 0;
+}

Jakub





[PR lto/84105] handle TYPE_IDENTIFIERs for FUNCTION_TYPEs in the gimple pretty printer

2018-01-30 Thread Aldy Hernandez

Hi!

As discussed in the PR, the ICE here happens in dump_generic_node:

case FUNCTION_TYPE:
case METHOD_TYPE:
...
  if (TYPE_NAME (node) && DECL_NAME (TYPE_NAME (node)))
dump_decl_name (pp, TYPE_NAME (node), flags);

(gdb) print node
$21 = 
(gdb) print node.type_common.name
$22 = 

TYPE_NAME is an IDENTIFIER_NODE, whereas we're expecting a DECL_P, and 
bad things happen.


OK pending tests?
gcc/

	PR lto/84105
	* tree-pretty-print.c (dump_generic_node): Handle a TYPE_NAME with
	an IDENTIFIER_NODE for FUNCTION_TYPE's.

diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 54a8dfa3b6f..73eb27c8e8f 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1822,7 +1822,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
 	pp_string (pp, "");
 	  pp_colon_colon (pp);
 	}
-  if (TYPE_NAME (node) && DECL_NAME (TYPE_NAME (node)))
+  if (TYPE_IDENTIFIER (node))
+	dump_generic_node (pp, TYPE_NAME (node), spc, flags, false);
+  else if (TYPE_NAME (node) && DECL_NAME (TYPE_NAME (node)))
 	dump_decl_name (pp, TYPE_NAME (node), flags);
   else if (flags & TDF_NOUID)
 	pp_printf (pp, "");


Re: [PING^3] [PATCH] Remove CANADIAN, that break compilation for foreign target

2018-01-30 Thread Petr Ovtchenkov
On Tue, 30 Jan 2018 12:54:21 +
Jonathan Wakely  wrote:

> On 30/01/18 10:19 +0300, Petr Ovtchenkov wrote:
> >On Tue, 19 Dec 2017 11:37:43 +0300
> >Petr Ovtchenkov  wrote:
> >
> >ping^3
> 
> 
> I don't fully understand the consequences (or need) for this patch.
> 
> I asked some other people to look at it, and didn't get confirmation
> it's OK. So I'm reluctant to make the change. Especially this late in
> the GCC 8 cycle.

Hmm, almost two years for this words. Do you really think that
-I/usr/include may be correct in build of cross of some kind
(CANADIAN is just variant of cross)? Of cause, I don't consider
case with rolling target headers in /usr/include.


> 
> 
> >> On Thu, 16 Nov 2017 20:55:37 +0300
> >> Petr Ovtchenkov  wrote:
> >>
> >> > On Wed, 20 Sep 2017 13:44:59 +0300
> >> > Petr Ovtchenkov  wrote:
> >> >
> >> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71212
> >> > >
> >> > > On Fri, 20 May 2016 16:10:50 +0300
> >> > > Petr Ovtchenkov  wrote:
> >> > >
> >> > > > Some old ad-hoc (adding -I/usr/include to compiler
> >> > > > flags) break compilation of libstdc++ for foreign
> >> > > > target architecture (due to compiler see includes
> >> > > > of native).
> >> >
> >> > Reference for terms:
> >> >
> >> > https://gcc.gnu.org/onlinedocs/gccint/Configure-Terms.html
> >> >
> >> > Present of "CANADIAN=yes" lead to inclusion of
> >> > headers from build (-I/usr/include). "CANADIAN=yes" used _only_
> >> > to set "-I/usr/include".
> >> >
> >> > Inclusion of build headers in cross-compilation
> >> > process is not a mistake only in case of native (i.e. it is mistake
> >> > for cross, for canadian, for crossed native and for crossback),
> >> > but sometimes give "success".
> >> >
> >> > Note, that build/host/target may be different not only due to
> >> > different architectures, but due to different sysroots
> >> > (libc, kernel, binutils, etc.).
> >> >
> >> > CANADIAN is set to "yes" by code
> >> >
> >> > -  # If Canadian cross, then don't pick up tools from the build 
> >> > directory.
> >> > -  # Used only in GLIBCXX_EXPORT_INCLUDES.
> >> > -  if test -n "$with_cross_host" &&
> >> > - test x"$build_alias" != x"$with_cross_host" &&
> >> > - test x"$build" != x"$target";
> >> > -  then
> >> > -CANADIAN=yes
> >> > -  else
> >> > -CANADIAN=no
> >> > -  fi
> >> >
> >> > and it add "-I/usr/include" to compiler flags for building libstdc++.
> >> > This is wrong.
> >> >
> >> > Reference to patch:
> >> > https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01332.html


Re: [PATCH] Use pointer sized array indices.

2018-01-30 Thread Thomas Koenig

Hi Janne,


PING**2


OK.

Regards, Thomas


C++ PATCH for c++/84098, ICE with lambda in template NSDMI

2018-01-30 Thread Jason Merrill
When instantiating the members of a class template, we don't need to
consider lambdas, as instantiating them will be handled when we get to
tsubst_lambda_expr.  And now the new assert catches places we were
failing to ignore them, as here.

Tested x86_64-pc-linux-gnu, applied to trunk.
commit d97240d44096258f2c5218560861631b908d1024
Author: Jason Merrill 
Date:   Mon Jan 29 17:31:42 2018 -0500

PR c++/84098 - ICE with lambda in template NSDMI.

* pt.c (instantiate_class_template_1): Ignore more lambdas.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 6c5d06b9ebb..9516be893aa 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10527,6 +10527,11 @@ instantiate_class_template_1 (tree type)
{
  if (TYPE_P (t))
{
+ if (LAMBDA_TYPE_P (t))
+   /* A closure type for a lambda in an NSDMI or default argument.
+  Ignore it; it will be regenerated when needed.  */
+   continue;
+
  /* Build new CLASSTYPE_NESTED_UTDS.  */
 
  tree newtag;
@@ -10594,11 +10599,10 @@ instantiate_class_template_1 (tree type)
  && DECL_OMP_DECLARE_REDUCTION_P (r))
cp_check_omp_declare_reduction (r);
}
- else if (DECL_CLASS_TEMPLATE_P (t)
+ else if ((DECL_CLASS_TEMPLATE_P (t) || DECL_IMPLICIT_TYPEDEF_P (t))
   && LAMBDA_TYPE_P (TREE_TYPE (t)))
-   /* A closure type for a lambda in a default argument for a
-  member template.  Ignore it; it will be instantiated with
-  the default argument.  */;
+   /* A closure type for a lambda in an NSDMI or default argument.
+  Ignore it; it will be regenerated when needed.  */;
  else
{
  /* Build new TYPE_FIELDS.  */
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda19.C 
b/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda19.C
new file mode 100644
index 000..a16d31c59eb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda19.C
@@ -0,0 +1,13 @@
+// PR c++/84098
+// { dg-options -std=c++17 }
+
+struct A{};
+
+template < typename >
+struct Test{
+static constexpr auto var = []{};
+};
+
+int main(){
+(void)Test< A >::var;
+}


C++ PATCH for c++/84091, ICE with local class in lambda in template

2018-01-30 Thread Jason Merrill
Another refinement to local class handling in determine_visibility.
If we're looking at a local class inside a lambda, it thinks it is in
a template but the lambda isn't a template instantiation.  So look for
the enclosing template context to use.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit cc2f223a50dc6477844c7658403fb4be896036e5
Author: Jason Merrill 
Date:   Tue Jan 30 07:45:01 2018 -0500

PR c++/84091 - ICE with local class in lambda in template.

* decl2.c (determine_visibility): Look for outer containing template
instantiation.

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index ef7e6de41c3..2da6f9023c5 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -2418,6 +2418,16 @@ determine_visibility (tree decl)
 by that.  */
  if (DECL_LANG_SPECIFIC (fn) && DECL_USE_TEMPLATE (fn))
template_decl = fn;
+ else if (template_decl)
+   {
+ /* FN must be a regenerated lambda function, since they don't
+have template arguments.  Find a containing non-lambda
+template instantiation.  */
+ tree ctx = fn;
+ while (ctx && !get_template_info (ctx))
+   ctx = get_containing_scope (ctx);
+ template_decl = ctx;
+   }
}
   else if (VAR_P (decl) && DECL_TINFO_P (decl)
   && flag_visibility_ms_compat)
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-local1.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-local1.C
new file mode 100644
index 000..a2dd350369a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-local1.C
@@ -0,0 +1,13 @@
+// PR c++/84091
+// { dg-do compile { target c++11 } }
+
+template < typename > void f ()
+{ 
+  [] { struct A {} a; } ();
+}
+
+int main ()
+{ 
+  f < int > ();
+  return 0;
+}


Re: Fix ipa-inline ICE

2018-01-30 Thread Rainer Orth
Hi Jan,

> this patch fixed ICE that was introduced by Richard Standiford's change to 
> reorder
> can and want_inline predicates to reduce amount of work done to verify 
> inlining limits.
> This bypasses check that the function is optimized that makes inliner to ICE 
> because
> function summary is missing.
>
> This patch breaks out the expensive limits checking from can predicate to new 
> one
> which makes code bit more convoluted but I hope to clean things up next 
> stage1.
[...]
>   * g++.dg/torture/pr81360.C: New testcase

the new testcase comes out UNRESOLVED everywhere:

pr81360.C   -O0   scan-ipa-dump icf "Equal symbols: 0"
+UNRESOLVED: g++.dg/torture/pr81360.C   -O1   scan-ipa-dump icf "Equal symbols: 
0"
+UNRESOLVED: g++.dg/torture/pr81360.C   -O2   scan-ipa-dump icf "Equal symbols: 
0"
+UNRESOLVED: g++.dg/torture/pr81360.C   -O2 -flto   scan-ipa-dump icf "Equal 
symbols: 0"
+UNRESOLVED: g++.dg/torture/pr81360.C   -O2 -flto -flto-partition=none   
scan-ipa-dump icf "Equal symbols: 0"
+UNRESOLVED: g++.dg/torture/pr81360.C   -O3 -g   scan-ipa-dump icf "Equal 
symbols: 0"
+UNRESOLVED: g++.dg/torture/pr81360.C   -Os   scan-ipa-dump icf "Equal symbols: 
0"

with

g++.dg/torture/pr81360.C   -O0  : dump file does not exist

in the log.  The following patch fixes that, tested with the appropriate
runtest invocation.

I guess this is obvious?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-01-30  Rainer Orth  

* g++.dg/torture/pr81360.C: Add -fdump-ipa-icf to dg-options.

diff --git a/gcc/testsuite/g++.dg/torture/pr81360.C b/gcc/testsuite/g++.dg/torture/pr81360.C
--- a/gcc/testsuite/g++.dg/torture/pr81360.C
+++ b/gcc/testsuite/g++.dg/torture/pr81360.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-early-inlining"  } */
+/* { dg-options "-O2 -fno-early-inlining -fdump-ipa-icf"  } */
 
 template  class B;
 template  class TriaObjectAccessor;


Re: Fix ipa-inline ICE

2018-01-30 Thread Richard Biener
On January 30, 2018 9:05:31 PM GMT+01:00, Rainer Orth 
 wrote:
>Hi Jan,
>
>> this patch fixed ICE that was introduced by Richard Standiford's
>change to reorder
>> can and want_inline predicates to reduce amount of work done to
>verify inlining limits.
>> This bypasses check that the function is optimized that makes inliner
>to ICE because
>> function summary is missing.
>>
>> This patch breaks out the expensive limits checking from can
>predicate to new one
>> which makes code bit more convoluted but I hope to clean things up
>next stage1.
>[...]
>>  * g++.dg/torture/pr81360.C: New testcase
>
>the new testcase comes out UNRESOLVED everywhere:
>
>pr81360.C   -O0   scan-ipa-dump icf "Equal symbols: 0"
>+UNRESOLVED: g++.dg/torture/pr81360.C   -O1   scan-ipa-dump icf "Equal
>symbols: 0"
>+UNRESOLVED: g++.dg/torture/pr81360.C   -O2   scan-ipa-dump icf "Equal
>symbols: 0"
>+UNRESOLVED: g++.dg/torture/pr81360.C   -O2 -flto   scan-ipa-dump icf
>"Equal symbols: 0"
>+UNRESOLVED: g++.dg/torture/pr81360.C   -O2 -flto -flto-partition=none 
> scan-ipa-dump icf "Equal symbols: 0"
>+UNRESOLVED: g++.dg/torture/pr81360.C   -O3 -g   scan-ipa-dump icf
>"Equal symbols: 0"
>+UNRESOLVED: g++.dg/torture/pr81360.C   -Os   scan-ipa-dump icf "Equal
>symbols: 0"
>
>with
>
>g++.dg/torture/pr81360.C   -O0  : dump file does not exist
>
>in the log.  The following patch fixes that, tested with the
>appropriate
>runtest invocation.
>
>I guess this is obvious?

Yes. 

>   Rainer



C++ PATCH to fix ICE in generic lambda with user-defined conversion (PR c++/84125)

2018-01-30 Thread Marek Polacek
This testcase breaks since r256550 because we end up trying to build_address of
a CONSTRUCTOR, but that doesn't work because we hit
  gcc_checking_assert (TREE_CODE (t) != CONSTRUCTOR);

finish_static_assert gets {} as 'condition'.  In the testcase we have a 
user-defined conversion, so {} should be turned to false, via
perform_implicit_conversion_flags -> ... -> build_user_type_conversion_1, but
that crashes as explained above.

This only happens while processing generic lambda because 
processing_template_decl is 1,
so finish_compound_literal returns {} instead of a TARGET_EXPR.

Part of r256550 was this change:

@@ -8640,9 +8642,9 @@ finish_static_assert (tree condition, tree message, 
location_t location,
 }

   /* Fold the expression and convert it to a boolean value. */
-  condition = instantiate_non_dependent_expr (condition);
-  condition = cp_convert (boolean_type_node, condition, tf_warning_or_error);
-  condition = maybe_constant_value (condition);
+  condition = perform_implicit_conversion_flags (boolean_type_node, condition,
+complain, LOOKUP_NORMAL);
+  condition = fold_non_dependent_expr (condition);

   if (TREE_CODE (condition) == INTEGER_CST && !integer_zerop (condition))
 /* Do nothing; the condition is satisfied. */

where instantiate_non_dependent_expr turned {} into TARGET_EXPR ,
which we can process just fine.  So perhaps we need to put the call back.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-01-30  Marek Polacek  

PR c++/84125
* semantics.c (finish_static_assert): Call
instantiate_non_dependent_expr.

* g++.dg/cpp1y/lambda-generic-84125.C: New test.

diff --git gcc/cp/semantics.c gcc/cp/semantics.c
index b758051965e..38829e327c7 100644
--- gcc/cp/semantics.c
+++ gcc/cp/semantics.c
@@ -8642,6 +8642,7 @@ finish_static_assert (tree condition, tree message, 
location_t location,
 }
 
   /* Fold the expression and convert it to a boolean value. */
+  condition = instantiate_non_dependent_expr (condition);
   condition = perform_implicit_conversion_flags (boolean_type_node, condition,
 complain, LOOKUP_NORMAL);
   condition = fold_non_dependent_expr (condition);
diff --git gcc/testsuite/g++.dg/cpp1y/lambda-generic-84125.C 
gcc/testsuite/g++.dg/cpp1y/lambda-generic-84125.C
index e69de29bb2d..8bf6a09652e 100644
--- gcc/testsuite/g++.dg/cpp1y/lambda-generic-84125.C
+++ gcc/testsuite/g++.dg/cpp1y/lambda-generic-84125.C
@@ -0,0 +1,10 @@
+// PR c++/84125
+// { dg-do compile { target c++14 } }
+
+struct X { constexpr operator bool() const { return true; } };
+
+int main(){
+[](auto) {
+static_assert(X{}, "");
+};
+}

Marek


Fix gnat.dg/lto20.adb XPASS

2018-01-30 Thread Rainer Orth
This patch

2018-01-30  Jan Hubicka  

   gcc:
   PR lto/83954
   * lto-symtab.c (warn_type_compatibility_p): Silence false positive
   for type match warning on arrays of pointers.

   gcc/testsuite:
   PR lto/83954
   * gcc.dg/lto/pr83954.h: New testcase.
   * gcc.dg/lto/pr83954_0.c: New testcase.
   * gcc.dg/lto/pr83954_1.c: New testcase.

(which I didn't see posted on gcc-patches yet) seems to have caused

+XPASS: gnat.dg/lto20.adb (test for excess errors)

(seen on i386-pc-solaris2.11 and sparc-sun-solaris2.11).  The original
warning was

/vol/gcc/src/hg/trunk/local/gcc/testsuite/gnat.dg/lto20_pkg.ads:7:13: warning: 
type of 'lto20_pkg__proc' does not match original declaration 
[-Wlto-type-mismatch]

so just removing the dg-excess-errors seems the right thing to do.
Tested with the appropriate runtest invocation on
sparc-sun-solaris2.11.  Ok for mainline?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-01-30  Rainer Orth  

PR lto/83954
* gnat.dg/lto20.adb: Remove dg-excess-errors.

diff --git a/gcc/testsuite/gnat.dg/lto20.adb b/gcc/testsuite/gnat.dg/lto20.adb
--- a/gcc/testsuite/gnat.dg/lto20.adb
+++ b/gcc/testsuite/gnat.dg/lto20.adb
@@ -1,6 +1,5 @@
 -- { dg-do run }
 -- { dg-options "-flto" { target lto } }
--- { dg-excess-errors "does not match original declaration" }
 
 with Lto20_Pkg;
 


patch to fix PR84112

2018-01-30 Thread Vladimir Makarov

  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84112

  The patch was successfully bootstrapped and tested on x86-64 and ppc64.

  Committed as rev. 257204.


Index: ChangeLog
===
--- ChangeLog	(revision 257203)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2018-01-30  Vladimir Makarov  
+
+	PR target/84112
+	* lra-constraints.c (curr_insn_transform): Process AND in the
+	address.
+
 2018-01-30  Jakub Jelinek  
 
 	PR rtl-optimization/83986
Index: lra-constraints.c
===
--- lra-constraints.c	(revision 257203)
+++ lra-constraints.c	(working copy)
@@ -4216,7 +4216,17 @@ curr_insn_transform (bool check_only_p)
 GET_MODE_SIZE (GET_MODE (op)));
 	  else if (get_reload_reg (OP_IN, Pmode, *loc, rclass, FALSE,
    "offsetable address", &new_reg))
-	lra_emit_move (new_reg, *loc);
+	{
+	  rtx addr = *loc;
+	  enum rtx_code code = GET_CODE (addr);
+	  
+	  if (code == AND && CONST_INT_P (XEXP (addr, 1)))
+		/* (and ... (const_int -X)) is used to align to X bytes.  */
+		addr = XEXP (*loc, 0);
+	  lra_emit_move (new_reg, addr);
+	  if (addr != *loc)
+		emit_move_insn (new_reg, gen_rtx_AND (GET_MODE (new_reg), new_reg, XEXP (*loc, 1)));
+	}
 	  before = get_insns ();
 	  end_sequence ();
 	  *loc = new_reg;
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 257203)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2018-01-30  Vladimir Makarov  
+
+	PR target/84112
+	* pr84112.c: New.
+
 2018-01-30  Jakub Jelinek  
 
 	PR rtl-optimization/83986
Index: testsuite/gcc.target/powerpc/pr84112.c
===
--- testsuite/gcc.target/powerpc/pr84112.c	(nonexistent)
+++ testsuite/gcc.target/powerpc/pr84112.c	(working copy)
@@ -0,0 +1,33 @@
+/* { dg-do compile { target powerpc*-*-* } }*/
+/* { dg-options "-mcpu=power8 -O3 -fstack-protector-strong -fpic" } */
+
+char *b;
+int c, d, e, f;
+
+void
+foo (char *h, int k, int l, int m, int j, int q)
+{
+  char *i = b;
+  int n = d;
+  int s = e;
+  while (c)
+{
+  for (; j <= 0; j += 12)
+	{
+	  i[j] = n & k - h[j] >> 31 | q & ~(k - h[j] >> 31);
+	  i[j + 1] = n & l - h[j + 1] >> 31 | q & ~(l - h[j + 1] >> 31);
+	  i[j + 2] = n & m - h[j + 2] >> 31 | s & ~(m - h[j + 2] >> 31);
+	  i[j + 3] = n & k - h[j + 3] >> 31 | q & ~(k - h[j + 3] >> 31);
+	  i[j + 4] = n & l - h[j + 4] >> 31 | q & ~(l - h[j + 4] >> 31);
+	  i[j + 5] = n & m - h[j + 5] >> 31 | s & ~(m - h[j + 5] >> 31);
+	  i[j + 6] = n & k - h[j + 6] >> 31 | q & ~(k - h[j + 6] >> 31);
+	  i[j + 7] = n & l - h[j + 7] >> 31 | q & ~(l - h[j + 7] >> 31);
+	  i[j + 8] = n & m - h[j + 8] >> 31 | s & ~(m - h[j + 8] >> 31);
+	  i[j + 9] = n & k - h[j + 9] >> 31 | q & ~(k - h[j + 9] >> 31);
+	  i[j + 10] = n & l - h[j + 10] >> 31 | q & ~(l - h[j + 10] >> 31);
+	  i[j + 11] = n & m - h[j + 11] >> 31 | s & ~(m - h[j + 11] >> 31);
+	}
+  while (j < f)
+	;
+}
+}


Re: patch to fix PR84112

2018-01-30 Thread Rainer Orth
Hi Vladimir,

>   The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84112
>
>   The patch was successfully bootstrapped and tested on x86-64 and ppc64.
>
>   Committed as rev. 257204.
[...]
> Index: testsuite/ChangeLog
> ===
> --- testsuite/ChangeLog   (revision 257203)
> +++ testsuite/ChangeLog   (working copy)
> @@ -1,3 +1,8 @@
> +2018-01-30  Vladimir Makarov  
> +
> + PR target/84112
> + * pr84112.c: New.

this should be gcc.target/powerpc/pr84112.c instead.

> Index: testsuite/gcc.target/powerpc/pr84112.c

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: patch to fix PR84112

2018-01-30 Thread Vladimir Makarov

On 01/30/2018 03:37 PM, Rainer Orth wrote:

Hi Vladimir,


   The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84112

   The patch was successfully bootstrapped and tested on x86-64 and ppc64.

   Committed as rev. 257204.

[...]

Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog (revision 257203)
+++ testsuite/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2018-01-30  Vladimir Makarov  
+
+   PR target/84112
+   * pr84112.c: New.

this should be gcc.target/powerpc/pr84112.c instead.


Thank you for fixing my changelog entry.



[patch, fortran, committed] Fix PR 84133, regression with matmul

2018-01-30 Thread Thomas Koenig

Hello world,

I have just committed the attached patch as obvious, after regtesting.

Regards

Thomas

2018-01-30  Thomas Koenig  

PR fortran/84133
* frontend-passes (matmul_to_var_expr): Return early if
in association list.
(inline_matmul_assign): Likewise.

2018-01-30  Thomas Koenig  

PR fortran/84133
* gfortran.dg/inline_matmul_21.f90: New test case.

Index: frontend-passes.c
===
--- frontend-passes.c	(Revision 257131)
+++ frontend-passes.c	(Arbeitskopie)
@@ -2763,7 +2763,7 @@ matmul_to_var_expr (gfc_expr **ep, int *walk_subtr
 return 0;
 
   if (forall_level > 0 || iterator_level > 0 || in_omp_workshare
-  || in_where)
+  || in_where || in_assoc_list)
 return 0;
 
   /* Check if this is already in the form c = matmul(a,b).  */
@@ -3728,7 +3728,7 @@ inline_matmul_assign (gfc_code **c, int *walk_subt
   if (co->op != EXEC_ASSIGN)
 return 0;
 
-  if (in_where)
+  if (in_where || in_assoc_list)
 return 0;
 
   /* The BLOCKS generated for the temporary variables and FORALL don't
! { dg-do compile }
! { dg-additional-options "-ffrontend-optimize" }
! PR 84133 - this used to ICE. Original test case by
! Gerhard Steinmetz.

program p
   real :: x(2,2) = 1.0
   real :: z(2,2)
   associate (y => matmul(x,x))
  z = y
   end associate
   print *, z
end

  


Re: [SFN+LVU+IEPM v4 7/9] [LVU] Introduce location views

2018-01-30 Thread Richard Sandiford
Alexandre Oliva  writes:
> On Jan 26, 2018, Jakub Jelinek  wrote:
>
>> Having to tweak debug info consumers so that they treat DW_LLE_* of 9
>> one way for .debug_loclist of version 5 and another way for .debug_loclist
>> of version 6 isn't a good idea.
>
> Maybe we don't have to do that.  The reason I implemented the proposed
> format was to have a reference implementation, but since it's an
> extension to a non-extensible part of DWARF5, it's hard to justify
> consumer implementations for that.
>
>> Why don't you emit the DW_LLE_GNU_view_pair stuff in .debug_loclists
>> already with -gdwarf-5, if needed paired without some TU attribute that says
>> that views are emitted?
>
> Because that would make the loclists unreadable for any DWARF5-compliant
> consumer.
>
>
>> Haven't looked for it it in the coding standards, but it is something
>> I've been asked to do in my code by various reviewers over the years and
>> what I've been asking others in my patch reviews from others too.
>
> Thanks.  It was the first (well, second; you'd requested similar changes
> another time before) I heard of that.  I'm still unconvinced the way I
> used is not compliant, but I'll keep that in mind.  Hopefully I won't
> run into someone who insists the other one (the one you reject) is the
> only correct one ;-)
>
>> I feel strongly about indenting stuff right,
>
> I'm with you on that, we just have different ideas about what "right"
> stands for ;-)
>
>> which if it wouldn't fit would be
>>   return verylongcondition
>> && otherlongcondition__;
>> rather than
>>   return verylongcondition
>>   && otherlongcondition__;
>> or similar, it would surprise me if it is not in the coding standard.
>
> I agree neither of these two forms is correct.
>
> But it is my understanding that both of the following are correct:
>
>   return (verylongcondition
>   && otherlongcondition__);
>
>   return verylongcondition
> && otherlongcondition__;
>
> The first, because the parenthesized expression is continued with
> indentation to match the parenthesis, the second because the return
> statement is continued with the correct indentation for the continuation
> of a statement.

Thought it had to be the first.  When they talk about indenting leading
operators, the conventions say:

  Insert extra parentheses so that Emacs will indent the code properly.

which at least implies that not inserting parantheses and indenting by
two spaces isn't "properly".

And I think most GCC code does stick to that (i.e. use the bracketed form).

Thanks,
Richard


Re: Link with correct values-*.o files on Solaris (PR target/40411)

2018-01-30 Thread Rainer Orth
Hi Jonathan,

> On 30/01/18 15:51 +0100, Rainer Orth wrote:
>>Hi Joseph,
>>
>>> On Tue, 30 Jan 2018, Rainer Orth wrote:
>>>
 So it seems the following snippet

 #define STARTFILE_ARCH_SPEC \
 [...]
  %{std=c9*|std=iso9899\\:199409|std=c1*:values-Xc.o%s; :values-Xa.o%s} 
 \

 seems like the right thing to do, as you said.
>>>
>>> That version would need updating when we add -std=c2x (once there's a C2x
>>> working draft with features being added to it).  If you use std=c* instead
>>> of separate std=c9* and std=c1* you'd avoid needing such a change - but
>>> then of course it would cover -std=c++* for C++.
>>
>>I know, that why I used the current form.  Admittedly it's a maintenance
>>problem (likely to be forgotten in the future).  If we add in -ansi, we
>>can just as well add -std=c++* (thus -std=c*), too.
>>
>>I guess that's the safest option overall, unless Jonathan has objections
>>from a C++ standards perspective.
>
> No objections from me, I'm happy either way.

thanks.  So I've opted to include -ansi for C++ backwards compatibility
and thus -std=c* for ease of maintenance.

Here's what I've commited after another round of testing.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-01-29  Rainer Orth  

PR target/40411
* config/sol2.h (STARTFILE_ARCH_SPEC): Use -std=c*,
-std=iso9899:199409 instead of -pedantic to select values-Xc.o.

# HG changeset patch
# Parent  53e957fef7f285d1a5bb84f370d5158d238b2ceb
Fix use of Solaris values-Xc.o (PR target/40411)

diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h
--- a/gcc/config/sol2.h
+++ b/gcc/config/sol2.h
@@ -180,7 +180,10 @@ along with GCC; see the file COPYING3.  
The values-X[ac].o objects set the variable _lib_version.  The Studio C
compilers use values-Xc.o with either -Xc or (since Studio 12.6)
-pedantic to select strictly conformant ISO C behaviour, otherwise
-   values-Xa.o.
+   values-Xa.o.  Since -pedantic is a diagnostic option only in GCC, we
+   need to specifiy the -std=c* options and -std=iso9899:199409.  We
+   traditionally include -ansi, which affects C and C++, and also -std=c++*
+   for consistency.
 
The values-xpg[46].o objects define either or both __xpg[46] variables,
selecting XPG4 mode (__xpg4) and conforming C99/SUSv3 behavior (__xpg6).
@@ -195,7 +198,7 @@ along with GCC; see the file COPYING3.  
 #undef STARTFILE_ARCH_SPEC
 #define STARTFILE_ARCH_SPEC \
   "%{!shared:%{!symbolic: \
- %{pedantic:values-Xc.o%s; :values-Xa.o%s} \
+ %{ansi|std=c*|std=iso9899\\:199409:values-Xc.o%s; :values-Xa.o%s} \
  %{std=c90|std=gnu90:values-xpg4.o%s; :values-xpg6.o%s}}}"
 
 #if defined(HAVE_LD_PIE) && defined(HAVE_SOLARIS_CRTS)


[Patch, Fortran, committed] fix bracket in dejagnu directive

2018-01-30 Thread Janus Weil
Hi all,

I have just committed an obvious patch that adds a closing bracket in
a dg-options directive in the testsuite as r257208:

Index: gcc/testsuite/gfortran.dg/pr68318_1.f90
===
--- gcc/testsuite/gfortran.dg/pr68318_1.f90(revision 257207)
+++ gcc/testsuite/gfortran.dg/pr68318_1.f90(working copy)
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O0"
+! { dg-options "-O0" }
 ! PR fortran/68318
 ! Original code submitted by Gerhard Steinmetz
 ! 


Cheers,
Janus


2018-01-30  Janus Weil  

* gfortran.dg/pr68318_1.f90: Add closing bracket in dejagnu directive.


Re: [C++ PATCH] Speed up inplace_merge algorithm & fix inefficient logic(PR c++/83938)

2018-01-30 Thread François Dumont
Your change looks good, don't hesitate to re-submit a proper patch with 
the point 1 below fix. You should add gcc-patches@gcc.gnu.org when you 
do so.


Remaining questions will be:
Copyright assignment, your change seems limited enough to avoid it but 
you will need an official maintainer confirmation.


Stage 1: We are in a Fix-Only development phase. Your change brings such 
an important improvement that it could be considered as a bug fix but 
once again an official maintainer will have to decide. Otherwise you 
will have to wait for a couple of month to see you patch committed.


François

On 25/01/2018 23:37, chang jc wrote:

Hi:

1. The __len = (__len + 1) / 2; is as you suggested, need to modify as
__len = (__len == 1) ? 0 : ((__len + 1) / 2);

2. The coding gain has been shown  PR c++/83938; I re-post here




   21
   20
   19
   18
   17
   16


   0.471136
   0.625695
   0.767262
   0.907461
   1.04838
   1.19508


   0.340845
   0.48651
   0.639139
   0.770133
   0.898454
   1.04632

it means when Merge [0, 4325376, 16777216); A is a sorted integer with
4325376 & B with 12451840 elements, total with 16M integers

The proposed method has the speed up under given buffer size =, ex
2^16, 2^17, ... 2^21 in unit of sizeof(int), for example, 2^16 means
given sizof(int)*64K bytes.

3. As your suggestion, _TmpBuf __buf should be rewrite.

4. It represents a fact that the intuitive idea to split from larger
part is wrong.

For example, if you have an input sorted array A & B, A has 8 integers
& B has 24 integers. Given tmp buffer whose capacity = 4 integers.

Current it tries to split from B, right?

Then we have:

A1 | A2  B1 | B2

B1 & B2 has 12 integers each, right?

Current algorithm selects pivot as 13th integer from B, right?

If the corresponding upper bound of A is 6th integer.

Then it split in

A1 = 5 | A2 = 3 | B1 = 12 | B2 = 12

After rotate, we have two arrays to merge

[A1 = 5 | B1 = 12]  & [A2 = 3 | B2 = 12]

Great, [A2 = 3 | B2 = 12] can use tmp buffer to merge.

Sadly, [A1 = 5 | B1 = 12] CANNOT.

So we do rotate again, split & merge the two split arrays from [A1 = 5
| B1 = 12] again.


But wait, if we always split from the smaller one instead of larger one.

After rotate, it promises two split arrays both contain ceiling[small/2].

Since tmp buffer also allocate by starting from sizeof(small) &
recursively downgrade by ceiling[small/2^(# of fail allocate)].

It means the allocated tmp buffer promises to be sufficient at the
level of (# of fail allocate).

Instead, you can see if split from large at level (# of fail allocate)
several split array still CANNOT use  tmp buffer to do buffered merge.


As you know, buffered merge is far speed then (split, rotate, and
merge two sub-arrays) (PR c++/83938 gives the profiling figures),

the way should provide speedup.


Thanks.










On 24/01/2018 18:23, François Dumont wrote:

Hi


 It sounds like a very sensitive change to make but nothing worth figures.
Do you have any bench showing the importance of the gain ?

 At least the memory usage optimization is obvious.

On 19/01/2018 10:43, chang jc wrote:

Current std::inplace_merge() suffers from performance issue by inefficient

logic under limited memory,

It leads to performance downgrade.

Please help to review it.

Index: include/bits/stl_algo.h
===
--- include/bits/stl_algo.h(revision 256871)
+++ include/bits/stl_algo.h(working copy)
@@ -2437,7 +2437,7 @@
 _BidirectionalIterator __second_cut = __middle;
 _Distance __len11 = 0;
 _Distance __len22 = 0;
-  if (__len1 > __len2)
+  if (__len1 < __len2)
   {
 __len11 = __len1 / 2;
 std::advance(__first_cut, __len11);
@@ -2539,9 +2539,15 @@
 const _DistanceType __len1 = std::distance(__first, __middle);
 const _DistanceType __len2 = std::distance(__middle, __last);

+

 typedef _Temporary_buffer<_BidirectionalIterator, _ValueType>
_TmpBuf;

-  _TmpBuf __buf(__first, __last);
-
+  _BidirectionalIterator __start, __end;
+  if (__len1 < __len2) {
+__start = __first; __end = __middle;
+  } else {
+__start = __middle; __end = __last;
+  }
+  _TmpBuf __buf(__start, ___end);

Note another optimization, we could introduce a _Temporary_buffer<> constructor
in order to write:

_TmpBuf __buf(std::min(__len1, __len2), __first);

So that std::distance do not need to be called again.

 if (__buf.begin() == 0)
   std::__merge_without_buffer
 (__first, __middle, __last, __len1, __len2, __comp);
Index: include/bits/stl_tempbuf.h
===
--- include/bits/stl_tempbuf.h(revision 256871)
+++ include/bits/stl_tempbuf.h(working copy)
@@ -95,7 +95,7 @@
   std::nothrow));
 if (__tmp != 0)
   return std::pair<_Tp*, ptrdiff_t>(__tmp, __len);
-

[PR tree-optimization/84047] missing -Warray-bounds on an out-of-bounds index

2018-01-30 Thread Aldy Hernandez

Hi!

[Note: Jakub has mentioned that missing -Warray-bounds regressions 
should be punted to GCC 9.  I think this particular one is easy 
pickings, but if this and/or the rest of the -Warray-bounds regressions 
should be marked as GCC 9 material, please let me know so we can adjust 
all relevant PRs.]


This is a -Warray-bounds regression that happens because the IL now has 
an MEM_REF instead on ARRAY_REF.


Previously we had an ARRAY_REF we could diagnose:

  D.2720_5 = "12345678"[1073741824];

But now this is represented as:

  _1 = MEM[(const char *)"12345678" + 1073741824B];

I think we can just allow check_array_bounds() to handle MEM_REF's and 
everything should just work.


The attached patch fixes both regressions mentioned in the PR.

Tested on x86-64 Linux.

OK?
gcc/

	PR tree-optimization/84047
	* tree-vrp.c (search_for_addr_array): Handle ADDR_EXPRs.
	(check_array_bounds): Same.

diff --git a/gcc/testsuite/gcc.dg/pr84047.c b/gcc/testsuite/gcc.dg/pr84047.c
new file mode 100644
index 000..9f9c6ca4a9e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr84047.c
@@ -0,0 +1,18 @@
+/* { dg-compile } */
+/* { dg-options "-O2 -Warray-bounds" } */
+
+int f (void)
+{
+  const char *s = "12345678";
+  int i = 1U << 30;
+  return s[i]; /* { dg-warning "array bounds" } */
+}
+
+char a[8];
+
+int g (void)
+{
+  const char *s = a + 4;
+  int i = 8;
+  return s[i];   /* { dg-warning "array bounds" } */
+}
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 3af81f70872..07d39bab8a4 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -4922,14 +4922,15 @@ void
 vrp_prop::search_for_addr_array (tree t, location_t location)
 {
   /* Check each ARRAY_REFs in the reference chain. */
-  do
-{
-  if (TREE_CODE (t) == ARRAY_REF)
-	check_array_ref (location, t, true /*ignore_off_by_one*/);
+  if (TREE_CODE (t) == ADDR_EXPR)
+do
+  {
+	if (TREE_CODE (t) == ARRAY_REF)
+	  check_array_ref (location, t, true /*ignore_off_by_one*/);
 
-  t = TREE_OPERAND (t, 0);
-}
-  while (handled_component_p (t));
+	t = TREE_OPERAND (t, 0);
+  }
+while (handled_component_p (t));
 
   if (TREE_CODE (t) == MEM_REF
   && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR
@@ -5012,7 +5013,9 @@ check_array_bounds (tree *tp, int *walk_subtree, void *data)
   if (TREE_CODE (t) == ARRAY_REF)
 vrp_prop->check_array_ref (location, t, false /*ignore_off_by_one*/);
 
-  else if (TREE_CODE (t) == ADDR_EXPR)
+  else if (TREE_CODE (t) == ADDR_EXPR
+	   /* Handle (MEM_REF (ADDR_EXPR STRING_CST) INTEGER_CST).  */
+	   || TREE_CODE (t) == MEM_REF)
 {
   vrp_prop->search_for_addr_array (t, location);
   *walk_subtree = FALSE;


Re: [PATCH] PR 78534 Reinstate better string copy algorithm

2018-01-30 Thread Steve Kargl
OK.  Thought I already sent OK, but must of got side track with work.

-- 
steve

On Tue, Jan 30, 2018 at 06:25:31PM +0200, Janne Blomqvist wrote:
> PING
> 
> On Tue, Jan 23, 2018 at 3:31 PM, Janne Blomqvist
>  wrote:
> > As part of the change to larger character lengths, the string copy
> > algorithm was temporarily pessimized to get around some spurious
> > -Wstringop-overflow warnings.  Having tried a number of variations of
> > this algorithm I have managed to get it down to one spurious warning,
> > only with -O1 optimization, in the testsuite.  This patch reinstates
> > the optimized variant and modifies this one testcase to ignore the
> > warning.
> >
> > Regtested on x86_64-pc-linux-gnu, Ok for trunk?
> >
> > gcc/fortran/ChangeLog:
> >
> > 2018-01-23  Janne Blomqvist  
> >
> > PR 78534
> > * trans-expr.c (fill_with_spaces): Use memset instead of
> > generating loop.
> > (gfc_trans_string_copy): Improve opportunity to use builtins with
> > constant lengths.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2018-01-23  Janne Blomqvist  
> >
> > PR 78534
> > * gfortran.dg/allocate_deferred_char_scalar_1.f03: Prune
> > -Wstringop-overflow warnings due to spurious warning with -O1.
> > * gfortran.dg/char_cast_1.f90: Update dump scan pattern.
> > * gfortran.dg/transfer_intrinsic_1.f90: Likewise.
> > ---
> >  gcc/fortran/trans-expr.c   | 52 
> > --
> >  .../allocate_deferred_char_scalar_1.f03|  2 +
> >  gcc/testsuite/gfortran.dg/char_cast_1.f90  |  6 +--
> >  gcc/testsuite/gfortran.dg/transfer_intrinsic_1.f90 |  2 +-
> >  4 files changed, 35 insertions(+), 27 deletions(-)
> >
> > diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
> > index e90036f..4da6cee 100644
> > --- a/gcc/fortran/trans-expr.c
> > +++ b/gcc/fortran/trans-expr.c
> > @@ -6407,8 +6407,6 @@ fill_with_spaces (tree start, tree type, tree size)
> >tree i, el, exit_label, cond, tmp;
> >
> >/* For a simple char type, we can call memset().  */
> > -  /* TODO: This code does work and is potentially more efficient, but
> > - causes spurious -Wstringop-overflow warnings.
> >if (compare_tree_int (TYPE_SIZE_UNIT (type), 1) == 0)
> >  return build_call_expr_loc (input_location,
> > builtin_decl_explicit (BUILT_IN_MEMSET),
> > @@ -6416,7 +6414,6 @@ fill_with_spaces (tree start, tree type, tree size)
> > build_int_cst (gfc_get_int_type 
> > (gfc_c_int_kind),
> >lang_hooks.to_target_charset (' 
> > ')),
> > fold_convert (size_type_node, size));
> > -  */
> >
> >/* Otherwise, we use a loop:
> > for (el = start, i = size; i > 0; el--, i+= TYPE_SIZE_UNIT (type))
> > @@ -6522,11 +6519,20 @@ gfc_trans_string_copy (stmtblock_t * block, tree 
> > dlength, tree dest,
> >
> >/* The string copy algorithm below generates code like
> >
> > - if (dlen > 0) {
> > - memmove (dest, src, min(dlen, slen));
> > - if (slen < dlen)
> > - memset(&dest[slen], ' ', dlen - slen);
> > - }
> > + if (destlen > 0)
> > +   {
> > + if (srclen < destlen)
> > +   {
> > + memmove (dest, src, srclen);
> > + // Pad with spaces.
> > + memset (&dest[srclen], ' ', destlen - srclen);
> > +   }
> > + else
> > +   {
> > + // Truncate if too long.
> > + memmove (dest, src, destlen);
> > +   }
> > +   }
> >*/
> >
> >/* Do nothing if the destination length is zero.  */
> > @@ -6555,21 +6561,16 @@ gfc_trans_string_copy (stmtblock_t * block, tree 
> > dlength, tree dest,
> >else
> >  src = gfc_build_addr_expr (pvoid_type_node, src);
> >
> > -  /* First do the memmove. */
> > -  tmp2 = fold_build2_loc (input_location, MIN_EXPR, TREE_TYPE (dlen), dlen,
> > - slen);
> > -  tmp2 = build_call_expr_loc (input_location,
> > - builtin_decl_explicit (BUILT_IN_MEMMOVE),
> > - 3, dest, src,
> > - fold_convert (size_type_node, tmp2));
> > -  stmtblock_t tmpblock2;
> > -  gfc_init_block (&tmpblock2);
> > -  gfc_add_expr_to_block (&tmpblock2, tmp2);
> > -
> > -  /* If the destination is longer, fill the end with spaces.  */
> > +  /* Truncate string if source is too long.  */
> >cond2 = fold_build2_loc (input_location, LT_EXPR, logical_type_node, 
> > slen,
> >dlen);
> >
> > +  /* Copy and pad with spaces.  */
> > +  tmp3 = build_call_expr_loc (input_location,
> > + builtin_decl_explicit (BUILT_IN_MEMMOVE),
> > + 3, dest, src,
> > + fold_convert (size_type_node, slen));
> > +
> >/* Wstringop-overflow appears a

[patch, fortran, committed] Fix PR 84134

2018-01-30 Thread Thomas Koenig

Hello world,

another obvious fix, this time for an ice-on-invalid regression,
committed after regression-testing.

Regards

Thomas

2017-01-30  Thomas Koenig  

PR fortran/84134
* array.c (gfc_ref_dimen_size): Whitespace fixes.  If stride is
zero, return false.

2017-01-30  Thomas Koenig  

PR fortran/84134
Index: array.c
===
--- array.c	(Revision 257131)
+++ array.c	(Arbeitskopie)
@@ -2245,9 +2245,12 @@ gfc_ref_dimen_size (gfc_array_ref *ar, int dimen,
   else
 	{
 	  stride_expr = gfc_copy_expr(ar->stride[dimen]); 
+
 	  if(!gfc_simplify_expr(stride_expr, 1))
 	gfc_internal_error("Simplification error");
-	  if (stride_expr->expr_type != EXPR_CONSTANT)
+
+	  if (stride_expr->expr_type != EXPR_CONSTANT
+	  || mpz_cmp_ui (stride_expr->value.integer, 0) == 0)
 	{
 	  mpz_clear (stride);
 	  return false;
! { dg-do compile }
! PR fortran/84134 - this used to ICE.
! Test case by Gerhard Steinmetz

program p
   integer :: i, x(3)
   data (x(i+1:i+2:i),i=0,1) /1,2,3/ ! { dg-error "Nonconstant array section" }
end


Re: Fix ipa-inline ICE

2018-01-30 Thread Jakub Jelinek
On Tue, Jan 30, 2018 at 09:05:31PM +0100, Rainer Orth wrote:
> > this patch fixed ICE that was introduced by Richard Standiford's change to 
> > reorder
> > can and want_inline predicates to reduce amount of work done to verify 
> > inlining limits.
> > This bypasses check that the function is optimized that makes inliner to 
> > ICE because
> > function summary is missing.
> >
> > This patch breaks out the expensive limits checking from can predicate to 
> > new one
> > which makes code bit more convoluted but I hope to clean things up next 
> > stage1.
> [...]
> > * g++.dg/torture/pr81360.C: New testcase
> 
> the new testcase comes out UNRESOLVED everywhere:
> 
> pr81360.C   -O0   scan-ipa-dump icf "Equal symbols: 0"
> +UNRESOLVED: g++.dg/torture/pr81360.C   -O1   scan-ipa-dump icf "Equal 
> symbols: 0"
> +UNRESOLVED: g++.dg/torture/pr81360.C   -O2   scan-ipa-dump icf "Equal 
> symbols: 0"
> +UNRESOLVED: g++.dg/torture/pr81360.C   -O2 -flto   scan-ipa-dump icf "Equal 
> symbols: 0"
> +UNRESOLVED: g++.dg/torture/pr81360.C   -O2 -flto -flto-partition=none   
> scan-ipa-dump icf "Equal symbols: 0"
> +UNRESOLVED: g++.dg/torture/pr81360.C   -O3 -g   scan-ipa-dump icf "Equal 
> symbols: 0"
> +UNRESOLVED: g++.dg/torture/pr81360.C   -Os   scan-ipa-dump icf "Equal 
> symbols: 0"
> 
> with
> 
> g++.dg/torture/pr81360.C   -O0  : dump file does not exist
> 
> in the log.  The following patch fixes that, tested with the appropriate
> runtest invocation.
> 
> I guess this is obvious?

Yes, but I wonder why the test is in g++.dg/torture/, that makes
no sense for a test with -O2 in dg-options which overrides all the
optimization options it cycles through.  Move it into g++.dg/ipa/ instead?

Jakub


[PATCH, rs6000] Fix VSX/altivec assumptions in altivec-13.c testcase

2018-01-30 Thread Will Schmidt
Hi,
  Some VSX function has previosly crept into the altivec-13 testcase.  In 
particular,
anything 'vector long long' and 'vector double', causing issues on platforms 
without
VSX support.   So, break that content out into it's own testcase, allowing 
better
testcase coverage.

Sniff-tested on Power6, altivec-13.c test now passes there, where it used to 
fail.

Still plan to do a sniff-test on AIX, but have expectations that this should 
help.

OK for trunk?

Thanks,
-Will

[testsuite]

2018-01-30  Will Schmidt  

* gcc.target/powerpc/altivec-13.c: Remove VSX-requiring built-ins.
* gcc.target/powerpc/vsx-13.c:  New.

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-13.c 
b/gcc/testsuite/gcc.target/powerpc/altivec-13.c
index 2315f6e..31ff509 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-13.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-13.c
@@ -6,54 +6,47 @@
 
 /* This test case exercises intrinsic/argument combinations that,
while not in the Motorola AltiVec PIM, have nevertheless crept
into the AltiVec vernacular over the years.  */
 
+/* Tests requiring VSX support (vector long long and vector double) have
+   been moved over to vsx-13.c.  */
+
 #include 
 
-void foo (void) 
+void foo (void)
 {
   vector bool int boolVec1 = (vector bool int) vec_splat_u32(3);
   vector bool short boolVec2 = (vector bool short) vec_splat_u16(3);
   vector bool char boolVec3 = (vector bool char) vec_splat_u8(3);
   vector signed char vsc1, vsc2, vscz;
   vector unsigned char vuc1, vuc2, vucz;
   vector signed short int vssi1, vssi2, vssiz;
   vector signed int vsi1, vsi2, vsiz;
   vector unsigned int vui1, vui2, vuiz;
   vector unsigned short int vusi1, vusi2, vusiz;
-  vector bool long long vubll1, vubll2, vubllz;
-  vector signed int long long vsill1, vsill2, vsillz;
-  vector unsigned int long long vuill1, vuill2, vuillz;
   vector pixel vp1, vp2, vpz;
   vector float vf1, vf2, vfz;
-  vector double vd1, vd2, vdz;
   
   boolVec1 = vec_sld( boolVec1, boolVec1, 4 );
   boolVec2 = vec_sld( boolVec2, boolVec2, 2 );
   boolVec3 = vec_sld( boolVec3, boolVec3, 1 );
 
   vscz = vec_sld( vsc1, vsc2, 1 );
   vucz = vec_sld( vuc1, vuc2, 1 );
   vsiz = vec_sld( vsi1, vsi2, 1 );
   vuiz = vec_sld( vui1, vui2, 1 );
-  vubllz = vec_sld( vubll1, vubll2, 1 );
-  vsillz = vec_sld( vsill1, vsill2, 1 );
-  vuillz = vec_sld( vuill1, vuill2, 1 );
   vssiz = vec_sld( vssi1, vssi2, 1 );
   vusiz = vec_sld( vusi1, vusi2, 1 );
   
   vfz = vec_sld( vf1, vf2, 1 );
-  vdz = vec_sld( vd1, vd2, 1 );
 
   vpz = vec_sld( vp1, vp2, 1 );
 
   vucz = vec_srl(vuc1, vuc2);
   vsiz = vec_srl(vsi1, vuc2);
   vuiz = vec_srl(vui1, vuc2);
-  vsillz = vec_srl(vsill1, vuc2);
-  vuillz = vec_srl(vuill1, vuc2);
   vpz = vec_srl(vp1, vuc2);
   vssiz = vec_srl(vssi1, vuc2);
   vusiz = vec_srl(vusi1, vuc2);
 
   vscz = vec_sro(vsc1, vsc2);
@@ -62,14 +55,10 @@ void foo (void)
   vucz = vec_sro(vuc1, vuc2);
   vsiz = vec_sro(vsi1, vsc2);
   vsiz = vec_sro(vsi1, vuc2);
   vuiz = vec_sro(vui1, vsc2);
   vuiz = vec_sro(vui1, vuc2);
-  vsillz = vec_sro(vsill1, vsc2);
-  vsillz = vec_sro(vsill1, vuc2);
-  vuillz = vec_sro(vuill1, vsc2);
-  vuillz = vec_sro(vuill1, vuc2);
   vpz = vec_sro(vp1, vsc2);
   vpz = vec_sro(vp1, vuc2);
   vssiz = vec_sro(vssi1, vsc2);
   vssiz = vec_sro(vssi1, vuc2);
   vusiz = vec_sro(vusi1, vsc2);
@@ -81,8 +70,8 @@ void foo (void)
 /* Expected results:
vec_sld  vsldoi
vec_srl  vsr
vec_sro  vsro  */
 
-/* { dg-final { scan-assembler-times "vsldoi" 15 } } */
-/* { dg-final { scan-assembler-times "vsr " 8 } } */
-/* { dg-final { scan-assembler-times "vsro" 20 } } */
+/* { dg-final { scan-assembler-times "vsldoi" 11 } } */
+/* { dg-final { scan-assembler-times "vsr " 6 } } */
+/* { dg-final { scan-assembler-times "vsro" 16 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-13.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-13.c
new file mode 100644
index 000..5b4eb68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-13.c
@@ -0,0 +1,42 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-mvsx" } */
+
+/* Variations of tests that require VSX support.  This is a variation of
+   the altivec-13.c testcase.  */
+
+#include 
+
+void foo (void)
+{
+
+  vector signed char vsc1, vsc2, vscz;
+  vector unsigned char vuc1, vuc2, vucz;
+  vector bool long long vubll1, vubll2, vubllz;
+  vector signed int long long vsill1, vsill2, vsillz;
+  vector unsigned int long long vuill1, vuill2, vuillz;
+  vector double vd1, vd2, vdz;
+
+  vubllz = vec_sld( vubll1, vubll2, 1 );
+  vsillz = vec_sld( vsill1, vsill2, 1 );
+  vuillz = vec_sld( vuill1, vuill2, 1 );
+
+  vsillz = vec_srl(vsill1, vuc2);
+  vuillz = vec_srl(vuill1, vuc2);
+
+  vsillz = vec_sro(vsill1, vsc2);
+  vsillz = vec_sro(vsill1, vuc2);
+  vuillz = vec_sro(vuill1, vsc2);
+  vuillz = vec_sro(vuill1, vuc2);
+
+  vdz = vec_sld( vd1, vd2, 1 );

[PATCH] Mark falign-*= as Optimization (PR c/84100)

2018-01-30 Thread Jakub Jelinek
Hi!

While -falign-{functions,jumps,labels,loops} are marked Optimization,
-falign-{functions,jumps,labels,loops}= are not.  They use the same Var(),
for that it makes no difference, but it means we reject them in optimize
attribute starting with r237174.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-01-30  Jakub Jelinek  

PR c/84100
* common.opt (falign-functions=, falign-jumps=, falign-labels=,
falign-loops=): Add Optimization flag.

* gcc.dg/pr84100.c: New test.

--- gcc/common.opt.jj   2018-01-18 21:11:58.664207128 +0100
+++ gcc/common.opt  2018-01-30 14:53:25.091842202 +0100
@@ -954,7 +954,7 @@ Common Report Var(align_functions,0) Opt
 Align the start of functions.
 
 falign-functions=
-Common RejectNegative Joined UInteger Var(align_functions)
+Common RejectNegative Joined UInteger Var(align_functions) Optimization
 
 flimit-function-alignment
 Common Report Var(flag_limit_function_alignment) Optimization Init(0)
@@ -964,21 +964,21 @@ Common Report Var(align_jumps,0) Optimiz
 Align labels which are only reached by jumping.
 
 falign-jumps=
-Common RejectNegative Joined UInteger Var(align_jumps)
+Common RejectNegative Joined UInteger Var(align_jumps) Optimization
 
 falign-labels
 Common Report Var(align_labels,0) Optimization UInteger
 Align all labels.
 
 falign-labels=
-Common RejectNegative Joined UInteger Var(align_labels)
+Common RejectNegative Joined UInteger Var(align_labels) Optimization
 
 falign-loops
 Common Report Var(align_loops,0) Optimization UInteger
 Align the start of loops.
 
 falign-loops=
-Common RejectNegative Joined UInteger Var(align_loops)
+Common RejectNegative Joined UInteger Var(align_loops) Optimization
 
 fargument-alias
 Common Ignore
--- gcc/testsuite/gcc.dg/pr84100.c.jj   2018-01-30 15:01:59.243703278 +0100
+++ gcc/testsuite/gcc.dg/pr84100.c  2018-01-30 15:00:30.569726685 +0100
@@ -0,0 +1,14 @@
+/* PR c/84100 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void bar (void);
+
+__attribute__((optimize ("align-loops=16", "align-jumps=16",
+"align-labels=16", "align-functions=16")))
+void
+foo (void)
+{  /* { dg-bogus "bad option" } */
+  for (int i = 0; i < 1024; ++i)
+bar ();
+}

Jakub


[PATCH] Fix -traditional-cpp preprocessing ICE (PR preprocessor/69869)

2018-01-30 Thread Jakub Jelinek
Hi!

This restores GCC 3.3 behavior on this testcase, before 3.4 started ICEing
on it when it started to use the common block comment skipping code for
-traditional-cpp and just a simple function for macro block comments has
been added.  Even the macro block comments can be not properly terminated
and we shouldn't crash on that, just diagnose it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-01-30  Jakub Jelinek  

PR preprocessor/69869
* traditional.c (skip_macro_block_comment): Return bool, true if
the macro block comment is unterminated.
(copy_comment): Use return value from skip_macro_block_comment instead
of always false.

* gcc.dg/cpp/trad/pr69869.c: New test.

--- libcpp/traditional.c.jj 2018-01-03 10:42:55.965763452 +0100
+++ libcpp/traditional.c2018-01-30 17:31:00.251952284 +0100
@@ -120,7 +120,7 @@ check_output_buffer (cpp_reader *pfile,
 
 /* Skip a C-style block comment in a macro as a result of -CC.
Buffer->cur points to the initial asterisk of the comment.  */
-static void
+static bool
 skip_macro_block_comment (cpp_reader *pfile)
 {
   const uchar *cur = pfile->buffer->cur;
@@ -131,10 +131,15 @@ skip_macro_block_comment (cpp_reader *pf
 
   /* People like decorating comments with '*', so check for '/'
  instead for efficiency.  */
-  while(! (*cur++ == '/' && cur[-2] == '*') )
-;
+  while (! (*cur++ == '/' && cur[-2] == '*'))
+if (cur[-1] == '\n')
+  {
+   pfile->buffer->cur = cur - 1;
+   return true;
+  }
 
   pfile->buffer->cur = cur;
+  return false;
 }
 
 /* CUR points to the asterisk introducing a comment in the current
@@ -158,7 +163,7 @@ copy_comment (cpp_reader *pfile, const u
 
   buffer->cur = cur;
   if (pfile->context->prev)
-unterminated = false, skip_macro_block_comment (pfile);
+unterminated = skip_macro_block_comment (pfile);
   else
 unterminated = _cpp_skip_block_comment (pfile);
 
--- gcc/testsuite/gcc.dg/cpp/trad/pr69869.c.jj  2018-01-30 17:41:41.309544998 
+0100
+++ gcc/testsuite/gcc.dg/cpp/trad/pr69869.c 2018-01-30 17:45:06.783501149 
+0100
@@ -0,0 +1,8 @@
+/* PR preprocessor/69869 */
+/* { dg-do preprocess } */
+/* { dg-options "-traditional-cpp" } */
+
+#define C(a,b)a/**/b
+C (foo/,**/)
+C (foo/,*)
+/* { dg-error "-:unterminated comment" "" {target "*-*-*"} .-1 } */

Jakub


[PATCH] Fix debug info after the fortran descriptor changes (PR debug/84131)

2018-01-30 Thread Jakub Jelinek
Hi!

DW_AT_data_location needs to be the data pointer, which is at offset 0,
but we were emitting a bogus +8 in that case (+4 for 32-bit targets).

With this patch, the change on -g -dA is:
-   .uleb128 0x4# DW_AT_data_location
+   .uleb128 0x2# DW_AT_data_location
.byte   0x97# DW_OP_push_object_address
-   .byte   0x23# DW_OP_plus_uconst
-   .uleb128 0x8
.byte   0x6 # DW_OP_deref
and gdb can debug the array in there again.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-01-30  Jakub Jelinek  

PR debug/84131
* trans-array.c (gfc_get_descriptor_offsets_for_info): Set *data_off
to DATA_FIELD's offset rather than OFFSET_FIELD's offset.

--- gcc/fortran/trans-array.c.jj2018-01-26 12:43:25.164922494 +0100
+++ gcc/fortran/trans-array.c   2018-01-30 19:34:01.844232363 +0100
@@ -511,7 +511,7 @@ gfc_get_descriptor_offsets_for_info (con
   tree type;
 
   type = TYPE_MAIN_VARIANT (desc_type);
-  field = gfc_advance_chain (TYPE_FIELDS (type), OFFSET_FIELD);
+  field = gfc_advance_chain (TYPE_FIELDS (type), DATA_FIELD);
   *data_off = byte_position (field);
   field = gfc_advance_chain (TYPE_FIELDS (type), DTYPE_FIELD);
   *dtype_off = byte_position (field);

Jakub


Re: [PATCH] Fix debug info after the fortran descriptor changes (PR debug/84131)

2018-01-30 Thread Steve Kargl
On Tue, Jan 30, 2018 at 11:53:40PM +0100, Jakub Jelinek wrote:
> 
> DW_AT_data_location needs to be the data pointer, which is at offset 0,
> but we were emitting a bogus +8 in that case (+4 for 32-bit targets).
> 
> With this patch, the change on -g -dA is:
> -   .uleb128 0x4# DW_AT_data_location
> +   .uleb128 0x2# DW_AT_data_location
> .byte   0x97# DW_OP_push_object_address
> -   .byte   0x23# DW_OP_plus_uconst
> -   .uleb128 0x8
> .byte   0x6 # DW_OP_deref
> and gdb can debug the array in there again.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 

Yes.  Thanks for looking at this!

-- 
Steve


[PATCH] correct -Wrestrict handling of arrays of arrays (PR 84095)

2018-01-30 Thread Martin Sebor

Testing GCC 8 with recent Linux kernel sources has uncovered
a bug in the handling of arrays of arrays by the -Wrestrict
checker where it fails to take references to different array
elements into consideration, issuing false positives.

The attached patch corrects this mistake.

In addition, to make warnings involving excessive offset bounds
more meaningful (less confusing), I've made a cosmetic change
to constrain them to the bounds of the accessed object.  I've
done this in response to multiple comments indicating that
the warnings are hard to interpret.  This change is meant to
be applied on top of the patch for bug 83698 (submitted mainly
to improve the readability of the offsets):

  https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01488.html

Martin
PR middle-end/84095 - false-positive -Wrestrict warnings for memcpy within array

gcc/ChangeLog:

	PR middle-end/84095
	* gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref): Handle
	inner references.

gcc/testsuite/ChangeLog:

	PR middle-end/84095
	* c-c++-common/Warray-bounds-3.c: Adjust text of expected warnings.
	* c-c++-common/Wrestrict.c: Same.
	* gcc.dg/Wrestrict-6.c: Same.
	* gcc.dg/Wrestrict-8.c: New test.

diff --git a/gcc/gimple-ssa-warn-restrict.c b/gcc/gimple-ssa-warn-restrict.c
index 528eb5b..8bdf928 100644
--- a/gcc/gimple-ssa-warn-restrict.c
+++ b/gcc/gimple-ssa-warn-restrict.c
@@ -311,19 +311,40 @@ builtin_memref::builtin_memref (tree expr, tree size)
 
   if (TREE_CODE (expr) == ADDR_EXPR)
 {
-  poly_int64 off;
   tree op = TREE_OPERAND (expr, 0);
 
-  /* Determine the base object or pointer of the reference
-	 and its constant offset from the beginning of the base.  */
-  base = get_addr_base_and_unit_offset (op, &off);
+  /* Determine the base object or pointer of the reference and
+	 the constant bit offset from the beginning of the base.
+	 If the offset has a non-constant component, it will be in
+	 VAR_OFF.  MODE, SIGN, REVERSE, and VOL are write only and
+	 unused here.  */
+  poly_int64 bitsize, bitpos;
+  tree var_off;
+  machine_mode mode;
+  int sign, reverse, vol;
+  base = get_inner_reference (op, &bitsize, &bitpos, &var_off,
+  &mode, &sign, &reverse, &vol);
+
+  poly_int64 bytepos = exact_div (bitpos, BITS_PER_UNIT);
 
   HOST_WIDE_INT const_off;
-  if (base && off.is_constant (&const_off))
+  if (base && bytepos.is_constant (&const_off))
 	{
 	  offrange[0] += const_off;
 	  offrange[1] += const_off;
 
+	  if (var_off)
+	{
+	  if (TREE_CODE (var_off) == INTEGER_CST)
+		{
+		  offset_int cstoff = wi::to_offset (var_off);
+		  offrange[0] += cstoff;
+		  offrange[1] += cstoff;
+		}
+	  else
+		offrange[1] += maxobjsize;
+	}
+
 	  /* Stash the reference for offset validation.  */
 	  ref = op;
 
@@ -375,12 +396,6 @@ builtin_memref::builtin_memref (tree expr, tree size)
 	}
   }
 
-  if (DECL_P (base) && TREE_CODE (TREE_TYPE (base)) == ARRAY_TYPE)
-{
-  if (offrange[0] < 0 && offrange[1] > 0)
-	offrange[0] = 0;
-}
-
   if (size)
 {
   tree range[2];
@@ -396,6 +411,29 @@ builtin_memref::builtin_memref (tree expr, tree size)
 }
   else
 sizrange[1] = maxobjsize;
+
+  tree basetype = TREE_TYPE (base);
+  if (DECL_P (base) && TREE_CODE (basetype) == ARRAY_TYPE)
+{
+  /* If the offset could be in range of the referenced object
+	 constrain its bounds so neither exceeds those of the obhect.  */
+  if (offrange[0] < 0 && offrange[1] > 0)
+	offrange[0] = 0;
+
+  offset_int maxoff = maxobjsize;
+  if (ref && array_at_struct_end_p (ref))
+	;   /* Use the maximum possible offset for last member arrays.  */
+  else if (tree basesize = TYPE_SIZE_UNIT (basetype))
+	maxoff = wi::to_offset (basesize);
+
+  if (offrange[0] >= 0)
+	{
+	  if (offrange[1] < 0)
+	offrange[1] = offrange[0] <= maxoff ? maxoff : maxobjsize;
+	  else if (offrange[0] <= maxoff && offrange[1] > maxoff)
+	offrange[1] = maxoff;
+	}
+}
 }
 
 /* Return error_mark_node if the signed offset exceeds the bounds
diff --git a/gcc/testsuite/c-c++-common/Warray-bounds-3.c b/gcc/testsuite/c-c++-common/Warray-bounds-3.c
index 3445b95..2ee8146 100644
--- a/gcc/testsuite/c-c++-common/Warray-bounds-3.c
+++ b/gcc/testsuite/c-c++-common/Warray-bounds-3.c
@@ -61,7 +61,7 @@ void test_memcpy_bounds (char *d, const char *s, size_t n)
  they appear as large positive in the source.  It would be nice
  if they retained their type but unfortunately that's not how
  it works so be prepared for both in case it even gets fixed.  */
-  T (char, 1, a + UR (3, SIZE_MAX - 1), s, n);   /* { dg-warning "offset \\\[3, -2] is out of the bounds \\\[0, 1] of object" "memcpy" } */
+  T (char, 1, a + UR (3, SIZE_MAX - 1), s, n);   /* { dg-warning "offset \\\[3, -?\[0-9\]+] is out of the bounds \\\[0, 1] of object" "memcpy" } */
 
   /* Verify that invalid offsets into an array of unknown size are
  detected.  */
@@ -226,7 +226,7 @@ T (

Re: [PR tree-optimization/84047] missing -Warray-bounds on an out-of-bounds index

2018-01-30 Thread Martin Sebor

On 01/30/2018 03:11 PM, Aldy Hernandez wrote:

Hi!

[Note: Jakub has mentioned that missing -Warray-bounds regressions
should be punted to GCC 9.  I think this particular one is easy
pickings, but if this and/or the rest of the -Warray-bounds regressions
should be marked as GCC 9 material, please let me know so we can adjust
all relevant PRs.]


Let me first say that I have no concern with adopting this
patch for GCC 8 (and older) and deferring more comprehensive
solutions for the remaining regressions until GCC 9.

With that said, as we discussed last week in IRC, the patch
I submitted for one of these regressions, bug 83776, also
happens to fix one of the two test cases submitted for this
bug:

  https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02144.html

I put the patch together because it's a P2 bug and has a Target
Milestone set to 6.5.  I took that to mean that it's expected
to be fixed in 8.1 in addition to 7.x and 6.x, but in light of
Jakub's reservations and Ady's question above I'm not sure my
understanding is correct, or if it is, that there necessarily
is consensus to fix them in GCC 8.

I also spent some time over the weekend extending my patch to
handle the additional test case from comment #2 on bug 84047.
That one is also due to the same underlying root cause.  I'm
close to done with it but I had to set it aside to handle
a bug in my own code that was reported yesterday.

I mention this because I too would find it helpful if it were
made clear whether these regressions are expected to be fixed
in GCC 8, so we can focus on the right bugs and avoid
duplicating effort.  If we want to fix all the reported
-Warray-bounds regressions in GCC 8 then I'll finish my patch
in progress and post it as a followup.

Martin



This is a -Warray-bounds regression that happens because the IL now has
an MEM_REF instead on ARRAY_REF.

Previously we had an ARRAY_REF we could diagnose:

  D.2720_5 = "12345678"[1073741824];

But now this is represented as:

  _1 = MEM[(const char *)"12345678" + 1073741824B];

I think we can just allow check_array_bounds() to handle MEM_REF's and
everything should just work.

The attached patch fixes both regressions mentioned in the PR.

Tested on x86-64 Linux.

OK?




Re: [wwwdocs] GCC-7.3 changes: Add note about LEON3FT erratum workaround

2018-01-30 Thread Eric Botcazou
> I would like to commit the following comment about LEON3-FT errata now
> available in GCC-7.3.

Fine with me, thanks.

-- 
Eric Botcazou


Re: Fix LRA subreg calculation for big-endian targets

2018-01-30 Thread Eric Botcazou
> OK.  Makes me wonder how many big endian LRA targets are getting significant
> use.

Debian still has an active SPARC64 port now based on GCC 7 with LRA.

-- 
Eric Botcazou


New Finnish PO file for 'cpplib' (version 8.1-b20180128)

2018-01-30 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Finnish team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/fi.po

(This file, 'cpplib-8.1-b20180128.fi.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Contents of PO file 'cpplib-8.1-b20180128.fi.po'

2018-01-30 Thread Translation Project Robot


cpplib-8.1-b20180128.fi.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



Re: [PATCH] Mark falign-*= as Optimization (PR c/84100)

2018-01-30 Thread Joseph Myers
On Tue, 30 Jan 2018, Jakub Jelinek wrote:

> Hi!
> 
> While -falign-{functions,jumps,labels,loops} are marked Optimization,
> -falign-{functions,jumps,labels,loops}= are not.  They use the same Var(),
> for that it makes no difference, but it means we reject them in optimize
> attribute starting with r237174.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Fix -traditional-cpp preprocessing ICE (PR preprocessor/69869)

2018-01-30 Thread Joseph Myers
On Tue, 30 Jan 2018, Jakub Jelinek wrote:

> Hi!
> 
> This restores GCC 3.3 behavior on this testcase, before 3.4 started ICEing
> on it when it started to use the common block comment skipping code for
> -traditional-cpp and just a simple function for macro block comments has
> been added.  Even the macro block comments can be not properly terminated
> and we shouldn't crash on that, just diagnose it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK with the comment on skip_macro_block_comment updated to document the 
semantics of the return value.

-- 
Joseph S. Myers
jos...@codesourcery.com


Go patch committed: Function_type and Backend_function_type are not identical

2018-01-30 Thread Ian Lance Taylor
This patch by Cherry Zhang fixes the Go frontend so that Function_type
and Backend_function_type are not identical, since they have different
backend representations.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 257171)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-bbce8a9af264b25c5f70bafb2ce95d4fed158d68
+a347356d0f432cafb69f0cc5833d27663736a042
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 257069)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -4442,6 +4442,9 @@ Function_type::is_identical(const Functi
Cmp_tags cmp_tags, bool errors_are_identical,
std::string* reason) const
 {
+  if (this->is_backend_function_type() != t->is_backend_function_type())
+return false;
+
   if (!ignore_receiver)
 {
   const Typed_identifier* r1 = this->receiver();
Index: gcc/go/gofrontend/types.h
===
--- gcc/go/gofrontend/types.h   (revision 257069)
+++ gcc/go/gofrontend/types.h   (working copy)
@@ -2074,6 +2074,11 @@ class Function_type : public Type
   Btype*
   get_backend_fntype(Gogo*);
 
+  // Return whether this is a Backend_function_type.
+  virtual bool
+  is_backend_function_type() const
+  { return false; }
+
  protected:
   int
   do_traverse(Traverse*);
@@ -2167,6 +2172,12 @@ class Backend_function_type : public Fun
   : Function_type(receiver, parameters, results, location)
   { }
 
+  // Return whether this is a Backend_function_type. This overrides
+  // Function_type::is_backend_function_type.
+  bool
+  is_backend_function_type() const
+  { return true; }
+
  protected:
   Btype*
   do_get_backend(Gogo* gogo)


[PATCH] libgcc: xtensa: fix build with -mtext-section-literals

2018-01-30 Thread Max Filippov
libgcc/
2018-01-31  Max Filippov  

* config/xtensa/ieee754-df.S (__adddf3_aux): Add
.literal_position directive.
* config/xtensa/ieee754-sf.S (__addsf3_aux): Likewise.
---
 libgcc/config/xtensa/ieee754-df.S | 1 +
 libgcc/config/xtensa/ieee754-sf.S | 1 +
 2 files changed, 2 insertions(+)

diff --git a/libgcc/config/xtensa/ieee754-df.S 
b/libgcc/config/xtensa/ieee754-df.S
index 2662a6600751..a997c1b42632 100644
--- a/libgcc/config/xtensa/ieee754-df.S
+++ b/libgcc/config/xtensa/ieee754-df.S
@@ -55,6 +55,7 @@ __negdf2:
 
 #ifdef L_addsubdf3
 
+   .literal_position
/* Addition */
 __adddf3_aux:

diff --git a/libgcc/config/xtensa/ieee754-sf.S 
b/libgcc/config/xtensa/ieee754-sf.S
index d48b230a7588..695c67b0fc8f 100644
--- a/libgcc/config/xtensa/ieee754-sf.S
+++ b/libgcc/config/xtensa/ieee754-sf.S
@@ -55,6 +55,7 @@ __negsf2:
 
 #ifdef L_addsubsf3
 
+   .literal_position
/* Addition */
 __addsf3_aux:
 
-- 
2.1.4



Re: [PATCH][PR target/84064] Fix assertion with -fstack-clash-protection

2018-01-30 Thread Jeff Law
On 01/30/2018 12:23 AM, Uros Bizjak wrote:
> On Tue, Jan 30, 2018 at 5:48 AM, Jeff Law  wrote:
>>
>>
>>
>> stack-clash-protection will sometimes force a prologue to use pushes to
>> save registers.  We try to limit how often that happens as it constrains
>> options for generating efficient prologue sequences for common cases.
>>
>> We check the value of TO_ALLOCATE in ix86_compute_frame_layout and if
>> it's greater than the probing interval, then we force the prologue to
>> use pushes to save integer registers.  That is conservatively correct
>> and allows freedom in the most common cases.  This is good.
>>
>> In expand_prologue we assert that the integer registers were saved via a
>> push anytime we *might* generate a probe, even if the size of the
>> allocated frame is too small to require explicit probing.  Naturally,
>> this conflicts with the code in ix86_compute_frame_layout that tries to
>> avoid forcing pushes instead of moves.
>>
>> This patch moves the assertion inside expand_prologue down to the points
>> where it actually needs to be true.  Specifically when we're generating
>> probes in a loop for -fstack-check or -fstack-clash-protection.
>>
>> [ Probing via calls to chkstk_ms is not affected by this change. ]
>>
>> Sorry to have to change this stuff for the 3rd time!
> 
> Now you see how many paths stack frame formation code has ;)
Never any doubt about that...  When it became clear that we were going
to need to mitigate stack-clash with new prologue sequences I nearly
cried knowing how painful this would be, particularly across multiple
architectures.

The good news is that with it being turned on by default in F28 we're
finding some of the dusty corners...

Jeff


[PATCH][RFA][PR target/84128] Fix restore of scratch register used in probing loops

2018-01-30 Thread Jeff Law
Whee, fun, this appears to have been broken for a very long time,
possibly since the introduction of -fstack-check in 2010.  Thankfully it
only affects 32 bit and only in relatively constrained circumstances.

-fstack-check and -fstack-clash both potentially create a loop for stack
probing.  In that case they both need a scratch register to hold the
loop upper bound.

The code to allocate a scratch register first starts with the
caller-saved registers as they're zero cost.  Then it'll use any callee
saved register that is actually saved.  If neither is available (say
because all the caller-saved regs are used for parameter passing and
there are no callee saved register used), then the allocation routine
will push %eax on the stack and the deallocation routine will pop it off
to restore its value.

Of course there is a *stack allocation* between those two points.  So
the pop ends up restoring garbage into the register.

Obviously the restore routine needs to use reg+d addressing to get to
the stack slot and the allocated space needs to be deallocated by the
epilogue.  But sadly there's enough assertions sprinkled around that
prevent that from working as-is.

So what this patch does is continue to use the push to allocate the
register.  And it uses reg+d to restore the register after the probing
loops.  The "trick" is that the space allocated by the push becomes part
of the normal stack frame after we restore the scratch register's value.
 Ie, if we push a 4 byte register, then we reduce the size of the main
allocation request by 4 bytes.  And everything just works.

Clearly this code had not be exercised.  So I hacked up things so that
we always generated a probing loop and always assumed that we needed to
save/restore a scratch register and enabled stack clash protections by
default.  I bootstrapped that and compared testsuite runs against a run
which just had stack clash protection on by default.  That did turn up
an issue, but it was with my testing hacks, not with this patch :-)


I (of course) also did the usual bootstrap and regression tests, using
x86_64 and i686.  Hopefully this is the last iteration on the x86/x86_64
stack clash bits :-)

The one concern I have is do we need to tell the CFI machinery that
%eax's value was restored to its entry value?

OK for the trunk?  Or do we need some magic CFI bit to describe the
restore of %eax?

Thanks,
Jeff
PR target/84128
* config/i386/i386.c (release_scratch_register_on_entry): Add new
OFFSET argument.  Use it to restore the scratch register rathern
than a pop insn.
(ix86_adjust_stack_and_probe_stack_clash): Un-constify SIZE.
If we have to save a temporary register, decrement SIZE appropriately.
Pass SIZE as the offset to find the saved register into
release_scratch_register_on_entry.
(ix86_adjust_stack_and_probe): Likewise.
(ix86_emit_probe_stack_range): Likewise.


PR target/84128
* gcc.target/i386/pr84128.c: New test.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index fef34a1..93ce79c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12567,22 +12567,25 @@ get_scratch_register_on_entry (struct scratch_reg *sr)
 }
 }
 
-/* Release a scratch register obtained from the preceding function.  */
+/* Release a scratch register obtained from the preceding function.
+
+   Note there will always be some kind of stack adjustment between
+   allocation and releasing the scratch register.  So we can't just
+   pop the scratch register off the stack if we were forced to save it
+   (the stack pointer itself has a different value).
+
+   Instead we're passed the offset into the stack where the value will
+   be found and the space becomes part of the local frame that is
+   deallocated by the epilogue.  */
 
 static void
-release_scratch_register_on_entry (struct scratch_reg *sr)
+release_scratch_register_on_entry (struct scratch_reg *sr, HOST_WIDE_INT 
offset)
 {
   if (sr->saved)
 {
-  struct machine_function *m = cfun->machine;
-  rtx x, insn = emit_insn (gen_pop (sr->reg));
-
-  /* The RTX_FRAME_RELATED_P mechanism doesn't know about pop.  */
-  RTX_FRAME_RELATED_P (insn) = 1;
-  x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (UNITS_PER_WORD));
-  x = gen_rtx_SET (stack_pointer_rtx, x);
-  add_reg_note (insn, REG_FRAME_RELATED_EXPR, x);
-  m->fs.sp_offset -= UNITS_PER_WORD;
+  rtx x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (offset));
+  x = gen_rtx_SET (sr->reg, gen_rtx_MEM (word_mode, x));
+  emit_insn (x);
 }
 }
 
@@ -12597,7 +12600,7 @@ release_scratch_register_on_entry (struct scratch_reg 
*sr)
pushed on the stack.  */
 
 static void
-ix86_adjust_stack_and_probe_stack_clash (const HOST_WIDE_INT size,
+ix86_adjust_stack_and_probe_stack_clash (HOST_WIDE_INT size,
 const bool int_registers_saved)
 {
   struct machine_fun

Go patch committed: Use VIEW_CONVERT_EXPR where needed

2018-01-30 Thread Ian Lance Taylor
This patch to the Go frontend's GCC interface uses VIEW_CONVERT_EXPR
where needed.  The code was using fold_convert_loc in
constructor_expression and temporary_variable, which almost always
worked.  However, it failed in https://golang.org/issue/23606.  This
patch fixes the problem.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian


2018-01-30  Ian Lance Taylor  

* go-gcc.cc (Gcc_backend::convert_tree): New private method.
(Gcc_backend::constructor_expression): Call it.
(Gcc_backend::assignment_statement): Likewise.
(Gcc_backend::temporary_variable): Likewise.
Index: gcc/go/go-gcc.cc
===
--- gcc/go/go-gcc.cc(revision 256593)
+++ gcc/go/go-gcc.cc(working copy)
@@ -541,6 +541,9 @@ class Gcc_backend : public Backend
   tree
   non_zero_size_type(tree);
 
+  tree
+  convert_tree(tree, tree, Location);
+
 private:
   void
   define_builtin(built_in_function bcode, const char* name, const char* 
libname,
@@ -1785,8 +1788,7 @@ Gcc_backend::constructor_expression(Btyp
   constructor_elt empty = {NULL, NULL};
   constructor_elt* elt = init->quick_push(empty);
   elt->index = field;
-  elt->value = fold_convert_loc(location.gcc_location(), TREE_TYPE(field),
-val);
+  elt->value = this->convert_tree(TREE_TYPE(field), val, location);
   if (!TREE_CONSTANT(elt->value))
is_constant = false;
 }
@@ -2055,29 +2057,7 @@ Gcc_backend::assignment_statement(Bfunct
 return this->compound_statement(this->expression_statement(bfn, lhs),
this->expression_statement(bfn, rhs));
 
-  // Sometimes the same unnamed Go type can be created multiple times
-  // and thus have multiple tree representations.  Make sure this does
-  // not confuse the middle-end.
-  if (TREE_TYPE(lhs_tree) != TREE_TYPE(rhs_tree))
-{
-  tree lhs_type_tree = TREE_TYPE(lhs_tree);
-  gcc_assert(TREE_CODE(lhs_type_tree) == TREE_CODE(TREE_TYPE(rhs_tree)));
-  if (POINTER_TYPE_P(lhs_type_tree)
- || INTEGRAL_TYPE_P(lhs_type_tree)
- || SCALAR_FLOAT_TYPE_P(lhs_type_tree)
- || COMPLEX_FLOAT_TYPE_P(lhs_type_tree))
-   rhs_tree = fold_convert_loc(location.gcc_location(), lhs_type_tree,
-   rhs_tree);
-  else if (TREE_CODE(lhs_type_tree) == RECORD_TYPE
-  || TREE_CODE(lhs_type_tree) == ARRAY_TYPE)
-   {
- gcc_assert(int_size_in_bytes(lhs_type_tree)
-== int_size_in_bytes(TREE_TYPE(rhs_tree)));
- rhs_tree = fold_build1_loc(location.gcc_location(),
-VIEW_CONVERT_EXPR,
-lhs_type_tree, rhs_tree);
-   }
-}
+  rhs_tree = this->convert_tree(TREE_TYPE(lhs_tree), rhs_tree, location);
 
   return this->make_statement(fold_build2_loc(location.gcc_location(),
   MODIFY_EXPR,
@@ -2507,6 +2487,43 @@ Gcc_backend::non_zero_size_type(tree typ
   gcc_unreachable();
 }
 
+// Convert EXPR_TREE to TYPE_TREE.  Sometimes the same unnamed Go type
+// can be created multiple times and thus have multiple tree
+// representations.  Make sure this does not confuse the middle-end.
+
+tree
+Gcc_backend::convert_tree(tree type_tree, tree expr_tree, Location location)
+{
+  if (type_tree == TREE_TYPE(expr_tree))
+return expr_tree;
+
+  if (type_tree == error_mark_node
+  || expr_tree == error_mark_node
+  || TREE_TYPE(expr_tree) == error_mark_node)
+return error_mark_node;
+
+  gcc_assert(TREE_CODE(type_tree) == TREE_CODE(TREE_TYPE(expr_tree)));
+  if (POINTER_TYPE_P(type_tree)
+  || INTEGRAL_TYPE_P(type_tree)
+  || SCALAR_FLOAT_TYPE_P(type_tree)
+  || COMPLEX_FLOAT_TYPE_P(type_tree))
+return fold_convert_loc(location.gcc_location(), type_tree, expr_tree);
+  else if (TREE_CODE(type_tree) == RECORD_TYPE
+  || TREE_CODE(type_tree) == ARRAY_TYPE)
+{
+  gcc_assert(int_size_in_bytes(type_tree)
+== int_size_in_bytes(TREE_TYPE(expr_tree)));
+  if (TYPE_MAIN_VARIANT(type_tree)
+ == TYPE_MAIN_VARIANT(TREE_TYPE(expr_tree)))
+   return fold_build1_loc(location.gcc_location(), NOP_EXPR,
+  type_tree, expr_tree);
+  return fold_build1_loc(location.gcc_location(), VIEW_CONVERT_EXPR,
+type_tree, expr_tree);
+}
+
+  gcc_unreachable();
+}
+
 // Make a global variable.
 
 Bvariable*
@@ -2717,8 +2734,7 @@ Gcc_backend::temporary_variable(Bfunctio
 }
 
   if (this->type_size(btype) != 0 && init_tree != NULL_TREE)
-DECL_INITIAL(var) = fold_convert_loc(location.gcc_location(), type_tree,
- init_tree);
+DECL_INITIAL(var) = this->convert_tree(type_tree, init_tree, location);
 
   if (is_address_taken)
 TREE_ADDRESSABLE(var) = 1;