Re: [PATCH] More value_range API cleanup

2018-11-13 Thread Aldy Hernandez




The tricky part starts in the prologue for

   if (vr0->undefined_p ())
 {
   vr0->deep_copy (vr1);
   return;
 }

but yes, we probably can factor out a bit more common code
here.  I'll try to follow up with more minor cleanups this
week (noticed a few details myself).


Like this?  (untested)


I would inline value_range_base::union_helper into 
value_range_base::union_, and remove all the undefined/varying/etc stuff 
from value_range::union_.


It should work because, upon return from value_range_base::union_, in the 
this->undefined_p case the base class will copy everything but the 
equivalences.  Then the derived union_ only has to nuke the equivalences 
if this->undefined or this->varying, and the equivalences' IOR just works.


For instance, in the case where other->undefined_p, there shouldn't be 
anything in the equivalences so the IOR won't copy anything to this as 
expected.  Similarly for this->varying_p.


In the case of other->varying, this will already have been set to varying, 
so neither this nor other should have anything in their equivalence fields, 
and the IOR won't do anything.


I think I covered all of them...the bitmap math should just work.  What 
do you think?
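The case analysis above can be sanity-checked with a small standalone model.  This is illustrative only: a std::set stands in for the equivalence bitmap and the kind lattice is simplified; it is not the GCC value_range code.

```cpp
#include <cassert>
#include <set>

// Toy model of the equivalence handling sketched above.  UNDEFINED and
// VARYING ranges carry no equivalences per the API contract, so IOR-ing
// (set union here) naturally does the right thing in those cases.
enum toy_kind { TOY_UNDEFINED, TOY_RANGE, TOY_VARYING };

struct toy_range
{
  toy_kind kind = TOY_UNDEFINED;
  std::set<int> equiv;   // stands in for the equivalence bitmap

  void union_ (const toy_range &other)
  {
    // "Base class" part: a simplified kind lattice (the actual range
    // endpoints are irrelevant to the equivalence question).
    if (kind == TOY_UNDEFINED)
      kind = other.kind;
    else if (other.kind == TOY_VARYING)
      kind = TOY_VARYING;

    // "Derived" part: nuke equivalences for undefined/varying, else IOR.
    if (kind == TOY_UNDEFINED || kind == TOY_VARYING)
      {
        equiv.clear ();
        return;
      }
    equiv.insert (other.equiv.begin (), other.equiv.end ());
  }
};
```

Note the model hides one real-world wrinkle: in GCC the equivalences live in a shared bitmap, so the this->undefined case needs a deep copy of other's bitmap rather than a plain IOR into an empty one.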


Finally, as I've hinted before, I think we need to be careful that any 
time we change state to VARYING / UNDEFINED from a base method, the 
derived class is in a sane state (there are no equivalences set, per the 
API contract).  This was not originally enforced in VRP, and I wouldn't 
be surprised if there are dragons if we enforce honesty.  I suppose, 
since we have an API, we could enforce this lazily: any time equiv() is 
called, clear the equivalences or return NULL if it's varying or 
undefined?  Just a thought.
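The lazy-enforcement idea can be sketched as a toy accessor.  Again illustrative only: equiv() here is a stand-in for the real API, and the names are made up, not GCC's.

```cpp
#include <cassert>
#include <set>

// Toy sketch of lazily enforcing the "no equivalences when varying or
// undefined" contract in the accessor itself.
struct toy_vr
{
  enum kind_t { UNDEFINED, RANGE, VARYING };
  kind_t kind = RANGE;
  std::set<int> equiv_;

  // Stale equivalences left behind by a base-method state change are
  // never observable: the accessor clears them and returns null.
  const std::set<int> *equiv ()
  {
    if (kind == VARYING || kind == UNDEFINED)
      {
        equiv_.clear ();
        return nullptr;
      }
    return &equiv_;
  }
};
```

The appeal of this shape is that a base method can flip the state without knowing about the derived class's fields; the contract is re-established at the first read.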


Thanks for more cleanups in this area.
Aldy


Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-13 Thread Richard Biener
On Mon, Nov 12, 2018 at 4:16 AM Andi Kleen  wrote:
>
> On Sun, Nov 11, 2018 at 10:06:21AM +0100, Richard Biener wrote:
> > That is, usually debuggers look for a location list of a variable
> > and find, say, %rax.  But for ptwrite the debugger needs to
> > examine all active location lists for, say, %rax and figure out
> > that it contains the value for variable 'a'?
>
> In dwarf output you end up with a list of
>
> start-IP...stop-IP ...  variable locations
>
> Both the original load/store and PTWRITE are in the same scope,
> and the debugger just looks it up based on the IP,
> so it all works without any extra modifications.

Yes, that's how I thought it would work.

> I even had an earlier version of this that instrumented
> assembler output of the compiler with PTWRITE in a separate script,
> and it worked fine too.

Apart from eventually messing up branch range restrictions I guess ;)

> >
> > When there isn't any such relation between the ptwrite stored
> > value and any variable the ptwrite is useless, right?
>
> A programmer might still be able to make use of it
> based on the context or the order.

OK.

> e.g. if you don't instrument everything, but only specific
> variables, or you only instrument arguments and returns or
> similar then it could be still useful just based on the IP->symbol
> resolution. If you instrument too many things yes it will be
> hard to use without debug info resolution.

Did you gather any statistics on how many of the ptwrite instructions
generated by your patch are not covered by any location range & expr?
I assume ptwrite writes from register input only, so you should
probably avoid instrumenting writes of constants (that would require
an extra register)?

How does the .text size behave say for cc1 when you enable
the various granularities of instrumentation?  How many
ptwrite instructions are there per 100 regular instructions?

> > I hope you don't mind if this eventually slips to GCC 10 given
> > as you say there is no HW available right now.  (still waiting
> > for a CPU with CET ...)
>
> :-/
>
> Actually there is.  Gemini Lake Atom Hardware with Goldmont Plus
> is shipping for some time and you can buy them.

Ah, interesting.

Can we get an updated patch based on my review?

I still think we should eventually move the pass later and somehow
avoid instrumenting places where we'll not have any meaningful locations
in the debug info - if only to reduce the required trace bandwidth.

Thanks,
Richard.

> -Andi


Re: [PATCH] More value_range API cleanup

2018-11-13 Thread Richard Biener
On Tue, 13 Nov 2018, Aldy Hernandez wrote:

> 
> > > The tricky part starts in the prologue for
> > > 
> > >if (vr0->undefined_p ())
> > >  {
> > >vr0->deep_copy (vr1);
> > >return;
> > >  }
> > > 
> > > but yes, we probably can factor out a bit more common code
> > > here.  I'll try to follow up with more minor cleanups this
> > > week (noticed a few details myself).
> > 
> > Like this?  (untested)
> 
> I would inline value_range_base::union_helper into value_range_base::union_,
> and remove all the undefined/varying/etc stuff from value_range::union_.
> 
> It should work because upon return from value_range_base::union_, in the
> this->undefined_p case, the base class will copy everything but the
> equivalences.  Then the derived union_ only has to nuke the equivalences if
> this->undefined or this->varying, and the equivalences' IOR just works.
> 
> For instance, in the case where other->undefined_p, there shouldn't be
> anything in the equivalences so the IOR won't copy anything to this as
> expected.  Similarly for this->varying_p.
> 
> In the case of other->varying, this will already have been set to varying, so
> neither this nor other should have anything in their equivalence fields, and
> the IOR won't do anything.
> 
> I think I covered all of them...the bitmap math should just work.  What do you
> think?

I think the only case that will not work is when this->undefined
(when we need the deep copy), because we'll not get the bitmap from
other in that case.  So I've settled on the version below (just
special-casing that very case).

> Finally, as I've hinted before, I think we need to be careful that any time we
> change state to VARYING / UNDEFINED from a base method, the derived class
> is in a sane state (there are no equivalences set, per the API contract).  This
> was not originally enforced in VRP, and I wouldn't be surprised if there are
> dragons if we enforce honesty.  I suppose, since we have an API, we could
> enforce this lazily: any time equiv() is called, clear the equivalences or
> return NULL if it's varying or undefined?  Just a thought.

I have updated ->update () to adjust equiv when we update to VARYING
or UNDEFINED.
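The shape of that change is presumably something like the following untested sketch (not the committed patch; the exact signature of ->update () and the base setter are assumptions):

```cpp
/* Untested sketch: after updating the underlying range, drop the
   equivalences whenever the new state is VARYING or UNDEFINED, per
   the API contract discussed above.  */
void
value_range::update (value_range_kind kind, tree min, tree max)
{
  value_range_base::set (kind, min, max);
  if (varying_p () || undefined_p ())
    equiv_clear ();
}
```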

Richard.

Index: gcc/tree-ssanames.c
===
--- gcc/tree-ssanames.c (revision 266056)
+++ gcc/tree-ssanames.c (working copy)
@@ -401,7 +401,7 @@ set_range_info (tree name, enum value_ra
 /* Store range information for NAME from a value_range.  */
 
 void
-set_range_info (tree name, const value_range &vr)
+set_range_info (tree name, const value_range_base &vr)
 {
   wide_int min = wi::to_wide (vr.min ());
   wide_int max = wi::to_wide (vr.max ());
@@ -434,7 +434,7 @@ get_range_info (const_tree name, wide_in
in a value_range VR.  Returns the value_range_kind.  */
 
 enum value_range_kind
-get_range_info (const_tree name, value_range &vr)
+get_range_info (const_tree name, value_range_base &vr)
 {
   tree min, max;
   wide_int wmin, wmax;
Index: gcc/tree-ssanames.h
===
--- gcc/tree-ssanames.h (revision 266056)
+++ gcc/tree-ssanames.h (working copy)
@@ -69,11 +69,11 @@ struct GTY ((variable_size)) range_info_
 /* Sets the value range to SSA.  */
 extern void set_range_info (tree, enum value_range_kind, const wide_int_ref &,
const wide_int_ref &);
-extern void set_range_info (tree, const value_range &);
+extern void set_range_info (tree, const value_range_base &);
 /* Gets the value range from SSA.  */
 extern enum value_range_kind get_range_info (const_tree, wide_int *,
 wide_int *);
-extern enum value_range_kind get_range_info (const_tree, value_range &);
+extern enum value_range_kind get_range_info (const_tree, value_range_base &);
 extern void set_nonzero_bits (tree, const wide_int_ref &);
 extern wide_int get_nonzero_bits (const_tree);
 extern bool ssa_name_has_boolean_range (tree);
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 266056)
+++ gcc/tree-vrp.c  (working copy)
@@ -6084,125 +6084,73 @@ value_range::intersect (const value_rang
 }
 }
 
-/* Meet operation for value ranges.  Given two value ranges VR0 and
-   VR1, store in VR0 a range that contains both VR0 and VR1.  This
-   may not be the smallest possible such range.  */
-
-void
-value_range_base::union_ (const value_range_base *other)
-{
-  if (other->undefined_p ())
-{
-  /* this already has the resulting range.  */
-  return;
-}
+/* Helper for meet operation for value ranges.  Given two value ranges VR0 and
+   VR1, return a range that contains both VR0 and VR1.  This may not be the
+   smallest possible such range.  */
+
+value_range_base
+value_range_base::union_helper (const value_range_base *vr0,
+   const value_range_base *vr1)
+{
+  /* VR0 h

Re: cleanups and unification of value_range dumping code

2018-11-13 Thread Richard Biener
On Mon, Nov 12, 2018 at 10:50 AM Aldy Hernandez  wrote:
>
> I have rebased my value_range dumping patch after your value_range_base
> changes.
>
> I know you are not a fan of the gimple-pretty-print.c chunk, but I still
> think having one set of dumping code is preferable to catering to
> possible GC corruption while debugging.  If you're strongly opposed (as,
> I'm putting my foot down), I can remove it as well as the relevant
> pretty_printer stuff.

I'd say we do not want to change the gimple-pretty-print.c stuff also because
we'll miss the leading #.  I'd rather see a simple wide-int-range class
wrapping the interesting bits up.  I guess I'll come up with one then ;)

> The patch looks bigger than it is because I moved all the dump routines
> into one place.
>
> OK?
>
> p.s. After your changes, perhaps get_range_info(const_tree, value_range
> &) should take a value_range_base instead?

Yes, I missed that and am now testing this change.

Btw, the patch needs updating again (sorry).  If you leave out the
gimple-pretty-print.c stuff there's no requirement to use the pretty-printer
API, right?

Richard.


Re: [PR81878]: fix --disable-bootstrap --enable-languages=ada, and cross-back gnattools build

2018-11-13 Thread Richard Biener
On Mon, Nov 12, 2018 at 1:00 PM Alexandre Oliva  wrote:
>
> gnattools build machinery uses just-build xgcc and xg++ as $(CC) and
> $(CXX) in native builds.  However, if C and C++ languages are not
> enabled, it won't find them.  So, enable C and C++ if Ada is enabled.
> Most of the time, this is probably no big deal: C is always enabled
> anyway, and C++ is already enabled for bootstraps.
>
> We need not enable those for cross builds, however.  At first I just
> took the logic from gnattools/configure, but found it to be lacking:
> it would use the just-built tools even in cross-back settings, whose
> tools just built for the host would not run on the build machine.  So
> I've narrowed down the test to rely on autoconf-detected cross-ness
> (build->host only), but also to ensure that host matches build, and
> that target matches host.
>
> I've considered sourcing ada/config-lang.in from within
> gnattools/configure, and testing lang_requires as set by it, so as to
> avoid a duplication of tests that ought to remain in sync, but decided
> it would be too fragile, as ada/config-lang.in does not expect srcdir
> to refer to gnattools.
>
>
> Please let me know if there are objections to this change in the next
> few days, e.g., if enabling C and C++ for an Ada-only build is too
> onerous.  It is certainly possible to rework gnattools build machinery
> so that it uses CC and CXX as detected by the top-level configure if we
> can't find xgcc and xg++ in ../gcc.

I really wonder why we do not _always_ do this for consistency, given we
already require a host Ada compiler.  It really makes things messier
than required...

>  At least in cross builds, we
> already require build-time Ada tools to have the same version as that
> we're cross-building, so we might as well use preexisting gcc and g++
> under the same requirements.
>
>
> for  gcc/ada/gcc-interface/ChangeLog
>
> PR ada/81878
> * config-lang.in (lang_requires): Set to "c c++" when
> gnattools wants it.
>
> for  gnattools/ChangeLog
>
> PR ada/81878
> * configure.ac (default_gnattools_target): Do not mistake
> just-built host tools as native in cross-back toolchains.
> * configure: Rebuilt.
> ---
>  gcc/ada/gcc-interface/config-lang.in |9 +
>  gnattools/configure  |   32 ++--
>  gnattools/configure.ac   |   30 +-
>  3 files changed, 52 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/ada/gcc-interface/config-lang.in 
> b/gcc/ada/gcc-interface/config-lang.in
> index 5dc77df282ce..8eacf7bb870e 100644
> --- a/gcc/ada/gcc-interface/config-lang.in
> +++ b/gcc/ada/gcc-interface/config-lang.in
> @@ -34,6 +34,15 @@ gtfiles="\$(srcdir)/ada/gcc-interface/ada-tree.h 
> \$(srcdir)/ada/gcc-interface/gi
>
>  outputs="ada/gcc-interface/Makefile ada/Makefile"
>
> +# gnattools native builds use both $(CC) and $(CXX), see PR81878.
> +# This is not too onerous: C is always enabled anyway, and C++ is
> +# always enabled for bootstrapping.  Use here the same logic used in
> +# gnattools/configure to decide whether to use -native or -cross tools
> +# for the build.
> +if test "x$cross_compiling/$build/$host" = "xno/$host/$target" ; then
> +  lang_requires="c c++"
> +fi
> +
>  target_libs="target-libada"
>  lang_dirs="gnattools"
>
> diff --git a/gnattools/configure b/gnattools/configure
> index ccb512e39b6b..c2d755b723a9 100755
> --- a/gnattools/configure
> +++ b/gnattools/configure
> @@ -584,6 +584,7 @@ PACKAGE_URL=
>  ac_unique_file="Makefile.in"
>  ac_subst_vars='LTLIBOBJS
>  LIBOBJS
> +default_gnattools_target
>  warn_cflags
>  OBJEXT
>  EXEEXT
> @@ -595,7 +596,6 @@ CC
>  ADA_CFLAGS
>  EXTRA_GNATTOOLS
>  TOOLS_TARGET_PAIRS
> -default_gnattools_target
>  LN_S
>  target_noncanonical
>  host_noncanonical
> @@ -2050,15 +2050,6 @@ $as_echo "no, using $LN_S" >&6; }
>  fi
>
>
> -# Determine what to build for 'gnattools'
> -if test $build = $target ; then
> -  # Note that build=target is almost certainly the wrong test; FIXME
> -  default_gnattools_target="gnattools-native"
> -else
> -  default_gnattools_target="gnattools-cross"
> -fi
> -
> -
>  # Target-specific stuff (defaults)
>  TOOLS_TARGET_PAIRS=
>
> @@ -2134,6 +2125,8 @@ esac
>  # From user or toplevel makefile.
>
>
> +# This is testing the CC passed from the toplevel Makefile, not the
> +# one we will select below.
>  ac_ext=c
>  ac_cpp='$CPP $CPPFLAGS'
>  ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
> @@ -2929,6 +2922,25 @@ if test "x$GCC" = "xyes"; then
>  fi
>
>
> +# Determine what to build for 'gnattools'.  Test after the above,
> +# because testing for CC sets the final value of cross_compiling, even
> +# if we end up using a different CC.  We want to build
> +# gnattools-native when: (a) this is a native build, i.e.,
> +# cross_compiling=no, otherwise we know we cannot run binaries
> +# produced by the toolchain used for the build, not even the binaries
> +#

[PATCH] Fix debug stmt handling in optimize_recip_sqrt (PR tree-optimization/87977)

2018-11-13 Thread Jakub Jelinek
Hi!

During analysis, we correctly ignore debug stmt uses, but if we don't
release the ssa name we stopped using, the debug stmt uses are left in the
IL.  The reset_debug_uses call is needed because the code modifies the
division stmt in place.  Perhaps it would be better not to do that, i.e.
create a new division and gsi_replace it; then just the release_ssa_name
would be enough and the generic code could add e.g. the 1.0 / _1 replacement
for the debug stmt?  Though, in this particular case the sqrt call is
optimized away, so it wouldn't make a difference.
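For reference, the gsi_replace variant being discussed would look roughly like this.  This is an untested sketch against the pass's local names (stmt, sqr_ssa_name, a, x, def_gsi), not a tested patch:

```cpp
/* Untested sketch: instead of editing STMT in place, build a fresh
   division and replace STMT wholesale.  The old lhs X then only needs
   release_ssa_name, and debug stmts get a chance to salvage a value.  */
gimple *new_div = gimple_build_assign (sqr_ssa_name, RDIV_EXPR,
                                       gimple_assign_rhs1 (stmt), a);
gsi_replace (def_gsi, new_div, true);
release_ssa_name (x);
```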

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, or
should I do the gimple_build_assign + gsi_replace change?

2018-11-13  Jakub Jelinek  

PR tree-optimization/87977
* tree-ssa-math-opts.c (optimize_recip_sqrt): Call reset_debug_uses
on stmt before changing the lhs if !has_other_use.  Formatting fix.
Call release_ssa_name (x) if !has_other_use and !delete_div.

* gcc.dg/recip_sqrt_mult_1.c: Add -fcompare-debug to dg-options.
* gcc.dg/recip_sqrt_mult_2.c: Likewise.
* gcc.dg/recip_sqrt_mult_3.c: Likewise.
* gcc.dg/recip_sqrt_mult_4.c: Likewise.
* gcc.dg/recip_sqrt_mult_5.c: Likewise.

--- gcc/tree-ssa-math-opts.c.jj 2018-10-23 10:13:22.64096 +0200
+++ gcc/tree-ssa-math-opts.c2018-11-12 16:55:45.468734060 +0100
@@ -652,6 +652,8 @@ optimize_recip_sqrt (gimple_stmt_iterato
  print_gimple_stmt (dump_file, stmt, 0, TDF_NONE);
  fprintf (dump_file, "with new division\n");
}
+  if (!has_other_use)
+   reset_debug_uses (stmt);
   gimple_assign_set_lhs (stmt, sqr_ssa_name);
   gimple_assign_set_rhs2 (stmt, a);
   fold_stmt_inplace (def_gsi);
@@ -704,7 +706,7 @@ optimize_recip_sqrt (gimple_stmt_iterato
 
   gimple *new_stmt
= gimple_build_assign (x, MULT_EXPR,
-   orig_sqrt_ssa_name, sqr_ssa_name);
+  orig_sqrt_ssa_name, sqr_ssa_name);
   gsi_insert_after (def_gsi, new_stmt, GSI_NEW_STMT);
   update_stmt (stmt);
 }
@@ -715,6 +717,8 @@ optimize_recip_sqrt (gimple_stmt_iterato
   gsi_remove (&gsi2, true);
   release_defs (stmt);
 }
+  else
+release_ssa_name (x);
 }
 
 /* Look for floating-point divisions among DEF's uses, and try to
--- gcc/testsuite/gcc.dg/recip_sqrt_mult_1.c.jj 2018-09-11 18:12:24.876207127 
+0200
+++ gcc/testsuite/gcc.dg/recip_sqrt_mult_1.c2018-11-12 16:57:53.301634244 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-recip" } */
+/* { dg-options "-Ofast -fdump-tree-recip -fcompare-debug" } */
 
 double res, res2, tmp;
 void
--- gcc/testsuite/gcc.dg/recip_sqrt_mult_2.c.jj 2018-09-11 18:12:24.876207127 
+0200
+++ gcc/testsuite/gcc.dg/recip_sqrt_mult_2.c2018-11-12 16:58:01.250503686 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-optimized" } */
+/* { dg-options "-Ofast -fdump-tree-optimized -fcompare-debug" } */
 
 float
 foo (float a)
--- gcc/testsuite/gcc.dg/recip_sqrt_mult_3.c.jj 2018-09-11 18:12:24.876207127 
+0200
+++ gcc/testsuite/gcc.dg/recip_sqrt_mult_3.c2018-11-12 16:58:12.302322133 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-optimized" } */
+/* { dg-options "-Ofast -fdump-tree-optimized -fcompare-debug" } */
 
 double
 foo (double a)
--- gcc/testsuite/gcc.dg/recip_sqrt_mult_4.c.jj 2018-09-11 18:12:24.877207110 
+0200
+++ gcc/testsuite/gcc.dg/recip_sqrt_mult_4.c2018-11-12 16:58:35.748936997 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-recip" } */
+/* { dg-options "-Ofast -fdump-tree-recip -fcompare-debug" } */
 
 /* The main path doesn't have any multiplications.
Avoid introducing them in the recip pass.  */
--- gcc/testsuite/gcc.dg/recip_sqrt_mult_5.c.jj 2018-09-11 18:12:24.875207143 
+0200
+++ gcc/testsuite/gcc.dg/recip_sqrt_mult_5.c2018-11-12 16:58:41.950835127 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-recip" } */
+/* { dg-options "-Ofast -fdump-tree-recip -fcompare-debug" } */
 
 /* We want to do the recip_sqrt transformations here there is already
a multiplication on the main path.  */

Jakub


Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-13 Thread Richard Biener
On Mon, Nov 12, 2018 at 7:20 PM Kyrill Tkachov
 wrote:
>
>
> On 12/11/18 14:10, Richard Biener wrote:
> > On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov
> >  wrote:
> >> On 09/11/18 12:18, Richard Biener wrote:
> >>> On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
> >>>  wrote:
>  Hi all,
> 
>  In this testcase the codegen for VLA SVE is worse than it could be due 
>  to unrolling:
> 
>  fully_peel_me:
>    mov     x1, 5
>    ptrue   p1.d, all
>    whilelo p0.d, xzr, x1
>    ld1d    z0.d, p0/z, [x0]
>    fadd    z0.d, z0.d, z0.d
>    st1d    z0.d, p0, [x0]
>    cntd    x2
>    addvl   x3, x0, #1
>    whilelo p0.d, x2, x1
>    beq     .L1
>    ld1d    z0.d, p0/z, [x0, #1, mul vl]
>    fadd    z0.d, z0.d, z0.d
>    st1d    z0.d, p0, [x3]
>    cntw    x2
>    incb    x0, all, mul #2
>    whilelo p0.d, x2, x1
>    beq     .L1
>    ld1d    z0.d, p0/z, [x0]
>    fadd    z0.d, z0.d, z0.d
>    st1d    z0.d, p0, [x0]
>  .L1:
>    ret
> 
>  In this case, due to the vector-length-agnostic nature of SVE the 
>  compiler doesn't know the loop iteration count.
>  For such loops we don't want to unroll if we don't end up eliminating 
>  branches as this just bloats code size
>  and hurts icache performance.
> 
>  This patch introduces a new unroll-known-loop-iterations-only param that 
>  disables cunroll when the loop iteration
>  count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for 
>  SVE VLA code, but it does help some
>  Advanced SIMD cases as well where loops with an unknown iteration count 
>  are not unrolled when it doesn't eliminate
>  the branches.
> 
>  So for the above testcase we generate now:
>  fully_peel_me:
>    mov     x2, 5
>    mov     x3, x2
>    mov     x1, 0
>    whilelo p0.d, xzr, x2
>    ptrue   p1.d, all
>  .L2:
>    ld1d    z0.d, p0/z, [x0, x1, lsl 3]
>    fadd    z0.d, z0.d, z0.d
>    st1d    z0.d, p0, [x0, x1, lsl 3]
>    incd    x1
>    whilelo p0.d, x1, x3
>    bne     .L2
>    ret
> 
>  Not perfect still, but it's preferable to the original code.
>  The new param is enabled by default on aarch64 but disabled for other 
>  targets, leaving their behaviour unchanged
>  (until other target people experiment with it and set it, if 
>  appropriate).
> 
>  Bootstrapped and tested on aarch64-none-linux-gnu.
>  Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in 
>  performance.
> 
>  Ok for trunk?
> >>> Hum.  Why introduce a new --param and not simply key on
> >>> flag_peel_loops instead?  That is
> >>> enabled by default at -O3 and with FDO but you of course can control
> >>> that in your targets
> >>> post-option-processing hook.
> >> You mean like this?
> >> It's certainly a simpler patch, but I was just a bit hesitant of making 
> >> this change for all targets :)
> >> But I suppose it's a reasonable change.
> > No, that change is backward.  What I said is that peeling is already
> > conditional on
> > flag_peel_loops and that is enabled by -O3.  So you want to disable
> > flag_peel_loops for
> > SVE instead in the target.
>
> Sorry, I got confused by the similarly named functions.
> I'm talking about try_unroll_loop_completely when run as part of 
> canonicalize_induction_variables i.e. the "ivcanon" pass
> (sorry about blaming cunroll here). This doesn't get called through the 
> try_unroll_loops_completely path.

Well, peeling gets disabled.  From your patch I see you want to
disable "unrolling" when the number of loop iterations is not
constant.  That is called peeling, where we need to emit the loop
exit test N times.

Did you check your testcases with -fno-peel-loops?

> try_unroll_loop_completely doesn't get disabled with -fno-peel-loops or 
> -fno-unroll-loops.
> Maybe disabling peeling inside try_unroll_loop_completely itself when 
> !flag_peel_loops is viable?
>
> Thanks,
> Kyrill
>
> >>> It might also make sense to have more fine-grained control for this
> >>> and allow a target
> >>> to say whether it wants to peel a specific loop or not when the
> >>> middle-end thinks that
> >>> would be profitable.
> >> Can be worth looking at as a follow-up. Do you envisage the target 
> >> analysing
> >> the gimple statements of the loop to figure out its cost?
> > Kind-of.  Sth like
> >
> >bool targetm.peel_loop (struct loop *);
> >
> > I have no idea whether you can easily detect a SVE vectorized loop though.
> > Maybe there's always a special IV or so (the mask?)
> >
> > Richard.
> >
> >> Thanks,
> >> Kyrill
> >>
> >

[PATCH] Fix simplify_merge_mask (PR rtl-optimization/87918)

2018-11-13 Thread Jakub Jelinek
Hi!

The BINARY_P predicate is true not just for arithmetic binary ops, but
for relational ones too; for the latter, we must not call
simplify_gen_binary, but simplify_gen_relational instead.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-11-13  Jakub Jelinek  

PR rtl-optimization/87918
* simplify-rtx.c (simplify_merge_mask): For COMPARISON_P, use
simplify_gen_relational rather than simplify_gen_binary.

* gcc.target/i386/pr87918.c: New test.

--- gcc/simplify-rtx.c.jj   2018-11-08 18:07:08.716852316 +0100
+++ gcc/simplify-rtx.c  2018-11-12 18:20:22.196580659 +0100
@@ -5647,9 +5647,19 @@ simplify_merge_mask (rtx x, rtx mask, in
   rtx top0 = simplify_merge_mask (XEXP (x, 0), mask, op);
   rtx top1 = simplify_merge_mask (XEXP (x, 1), mask, op);
   if (top0 || top1)
-   return simplify_gen_binary (GET_CODE (x), GET_MODE (x),
-   top0 ? top0 : XEXP (x, 0),
-   top1 ? top1 : XEXP (x, 1));
+   {
+ if (COMPARISON_P (x))
+   return simplify_gen_relational (GET_CODE (x), GET_MODE (x),
+   GET_MODE (XEXP (x, 0)) != VOIDmode
+   ? GET_MODE (XEXP (x, 0))
+   : GET_MODE (XEXP (x, 1)),
+   top0 ? top0 : XEXP (x, 0),
+   top1 ? top1 : XEXP (x, 1));
+ else
+   return simplify_gen_binary (GET_CODE (x), GET_MODE (x),
+   top0 ? top0 : XEXP (x, 0),
+   top1 ? top1 : XEXP (x, 1));
+   }
 }
   if (GET_RTX_CLASS (GET_CODE (x)) == RTX_TERNARY
   && VECTOR_MODE_P (GET_MODE (XEXP (x, 0)))
--- gcc/testsuite/gcc.target/i386/pr87918.c.jj  2018-11-12 18:17:49.075088126 
+0100
+++ gcc/testsuite/gcc.target/i386/pr87918.c 2018-11-12 18:18:36.239316318 
+0100
@@ -0,0 +1,14 @@
+/* PR rtl-optimization/87918 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+#include 
+
+__m128 b, c, d;
+
+void
+foo (float f)
+{
+  c = _mm_set_ss (f);
+  d = _mm_cmple_ss (c, b);
+}

Jakub


Re: [PATCH][lower-subreg] Fix PR87507

2018-11-13 Thread Eric Botcazou
> This has passed bootstrap and regtesting on powerpc64le-linux with no
> regressions.  Is this ok for mainline?
> 
> Peter
> 
> gcc/
>   PR rtl-optimization/87507
>   * lower-subreg.c (simple_move_operator): New function.
>   (simple_move): Strip simple operators.
>   (find_pseudo_copy): Likewise.
>   (resolve_simple_move): Strip simple operators and swap operands.
> 
> gcc/testsuite/
>   PR rtl-optimization/87507
>   * gcc.target/powerpc/pr87507.c: New test.
>   * gcc.target/powerpc/pr68805.c: Update expected results.

OK for mainline modulo the following nits:

> Index: gcc/lower-subreg.c
> ===
> --- gcc/lower-subreg.c(revision 265971)
> +++ gcc/lower-subreg.c(working copy)
> @@ -320,6 +320,24 @@ simple_move_operand (rtx x)
>return true;
>  }
> 
> +/* If X is an operator that can be treated as a simple move that we
> +   can split, then return the operand that is operated on.  */
> +
> +static rtx
> +simple_move_operator (rtx x)
> +{
> +  /* A word sized rotate of a register pair is equivalent to swapping
> + the registers in the register pair.  */
> +  if (GET_CODE (x) == ROTATE
> +  && GET_MODE (x) == twice_word_mode
> +  && simple_move_operand (XEXP (x, 0))
> +  && CONST_INT_P (XEXP (x, 1))
> +  && INTVAL (XEXP (x, 1)) == BITS_PER_WORD)
> +return XEXP (x, 0);;
> +
> +  return NULL_RTX;
> +}
> +

Superfluous semi-colon.  Given that the function returns an operand, its name 
is IMO misleading, so maybe [get_]operand_for_simple_move_operator.
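The register-pair equivalence the helper's comment relies on -- a rotate by BITS_PER_WORD on a double-word value swaps the two word-sized halves -- is easy to check with a standalone demo.  This assumes 32-bit words and a 64-bit double-word for illustration; it models the arithmetic, not the RTL.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit "register pair" left by one 32-bit word.  */
static uint64_t
rotate_by_word (uint64_t x)
{
  return (x << 32) | (x >> 32);
}

/* Swap the two 32-bit halves explicitly.  */
static uint64_t
swap_words (uint64_t x)
{
  uint64_t lo = x & 0xffffffffu;
  uint64_t hi = x >> 32;
  return (lo << 32) | hi;
}
```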

> @@ -876,6 +901,33 @@ resolve_simple_move (rtx set, rtx_insn *
> 
>real_dest = NULL_RTX;
> 
> +  if ((src_op = simple_move_operator (src)) != NULL_RTX)
> +{
> +  if (resolve_reg_p (dest))
> + {
> +   /* DEST is a CONCATN, so swap its operands and strip
> +  SRC's operator.  */
> +   rtx concatn = copy_rtx (dest);
> +   rtx op0 = XVECEXP (concatn, 0, 0);
> +   rtx op1 = XVECEXP (concatn, 0, 1);
> +   XVECEXP (concatn, 0, 0) = op1;
> +   XVECEXP (concatn, 0, 1) = op0;
> +   dest = concatn;
> +   src = src_op;
> + }
> +  else if (resolve_reg_p (src_op))
> + {
> +   /* SRC is an operation on a CONCATN, so strip the operator and
> +  swap the CONCATN's operands.  */
> +   rtx concatn = copy_rtx (src_op);
> +   rtx op0 = XVECEXP (concatn, 0, 0);
> +   rtx op1 = XVECEXP (concatn, 0, 1);
> +   XVECEXP (concatn, 0, 0) = op1;
> +   XVECEXP (concatn, 0, 1) = op0;
> +   src = concatn;
> + }
> +}
> +

Can we factor out the duplicate manipulation into a function here, for example 
resolve_operand_for_simple_move_operator?
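The factored-out helper would presumably be along these lines -- an untested sketch only, with the name taken from the suggestion above:

```cpp
/* Untested sketch: return a copy of CONCATN (a CONCATN rtx) with its
   two operands swapped, as needed when resolving a simple move whose
   source is a double-word rotate.  */
static rtx
resolve_operand_for_simple_move_operator (rtx concatn)
{
  rtx copy = copy_rtx (concatn);
  rtx op0 = XVECEXP (copy, 0, 0);
  rtx op1 = XVECEXP (copy, 0, 1);
  XVECEXP (copy, 0, 0) = op1;
  XVECEXP (copy, 0, 1) = op0;
  return copy;
}
```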

-- 
Eric Botcazou


Re: [PATCH] Fix simplify_merge_mask (PR rtl-optimization/87918)

2018-11-13 Thread Eric Botcazou
> 2018-11-13  Jakub Jelinek  
> 
>   PR rtl-optimization/87918
>   * simplify-rtx.c (simplify_merge_mask): For COMPARISON_P, use
>   simplify_gen_relational rather than simplify_gen_binary.
> 
>   * gcc.target/i386/pr87918.c: New test.

OK, thanks.


-- 
Eric Botcazou


Re: [PATCH] Fix debug stmt handling in optimize_recip_sqrt (PR tree-optimization/87977)

2018-11-13 Thread Richard Biener
On Tue, 13 Nov 2018, Jakub Jelinek wrote:

> Hi!
> 
> During analysis, we correctly ignore debug stmt uses, but if we don't
> release the ssa name we stopped using, the debug stmt uses are left in the
> IL.  The reset_debug_uses call is needed because the code modifies the
> division stmt in place.  Perhaps it would be better not to do that, i.e.
> create a new division and gsi_replace it; then just the release_ssa_name
> would be enough and the generic code could add e.g. the 1.0 / _1 replacement
> for the debug stmt?

Yeah, that's how we usually deal with this.  I hope the re-use of the
LHS is with the same value?

>  Though, in this particular case the sqrt call is
> optimized away, so it wouldn't make a difference.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, or
> should I do the gimple_build_assign + gsi_replace change?

I think that would be cleaner.

Richard.

> 2018-11-13  Jakub Jelinek  
> 
>   PR tree-optimization/87977
>   * tree-ssa-math-opts.c (optimize_recip_sqrt): Call reset_debug_uses
>   on stmt before changing the lhs if !has_other_use.  Formatting fix.
>   Call release_ssa_name (x) if !has_other_use and !delete_div.
> 
>   * gcc.dg/recip_sqrt_mult_1.c: Add -fcompare-debug to dg-options.
>   * gcc.dg/recip_sqrt_mult_2.c: Likewise.
>   * gcc.dg/recip_sqrt_mult_3.c: Likewise.
>   * gcc.dg/recip_sqrt_mult_4.c: Likewise.
>   * gcc.dg/recip_sqrt_mult_5.c: Likewise.
> 
> --- gcc/tree-ssa-math-opts.c.jj   2018-10-23 10:13:22.64096 +0200
> +++ gcc/tree-ssa-math-opts.c  2018-11-12 16:55:45.468734060 +0100
> @@ -652,6 +652,8 @@ optimize_recip_sqrt (gimple_stmt_iterato
> print_gimple_stmt (dump_file, stmt, 0, TDF_NONE);
> fprintf (dump_file, "with new division\n");
>   }
> +  if (!has_other_use)
> + reset_debug_uses (stmt);
>gimple_assign_set_lhs (stmt, sqr_ssa_name);
>gimple_assign_set_rhs2 (stmt, a);
>fold_stmt_inplace (def_gsi);
> @@ -704,7 +706,7 @@ optimize_recip_sqrt (gimple_stmt_iterato
>  
>gimple *new_stmt
>   = gimple_build_assign (x, MULT_EXPR,
> - orig_sqrt_ssa_name, sqr_ssa_name);
> +orig_sqrt_ssa_name, sqr_ssa_name);
>gsi_insert_after (def_gsi, new_stmt, GSI_NEW_STMT);
>update_stmt (stmt);
>  }
> @@ -715,6 +717,8 @@ optimize_recip_sqrt (gimple_stmt_iterato
>gsi_remove (&gsi2, true);
>release_defs (stmt);
>  }
> +  else
> +release_ssa_name (x);
>  }
>  
>  /* Look for floating-point divisions among DEF's uses, and try to
> --- gcc/testsuite/gcc.dg/recip_sqrt_mult_1.c.jj   2018-09-11 
> 18:12:24.876207127 +0200
> +++ gcc/testsuite/gcc.dg/recip_sqrt_mult_1.c  2018-11-12 16:57:53.301634244 
> +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Ofast -fdump-tree-recip" } */
> +/* { dg-options "-Ofast -fdump-tree-recip -fcompare-debug" } */
>  
>  double res, res2, tmp;
>  void
> --- gcc/testsuite/gcc.dg/recip_sqrt_mult_2.c.jj   2018-09-11 
> 18:12:24.876207127 +0200
> +++ gcc/testsuite/gcc.dg/recip_sqrt_mult_2.c  2018-11-12 16:58:01.250503686 
> +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Ofast -fdump-tree-optimized" } */
> +/* { dg-options "-Ofast -fdump-tree-optimized -fcompare-debug" } */
>  
>  float
>  foo (float a)
> --- gcc/testsuite/gcc.dg/recip_sqrt_mult_3.c.jj	2018-09-11 18:12:24.876207127 +0200
> +++ gcc/testsuite/gcc.dg/recip_sqrt_mult_3.c	2018-11-12 16:58:12.302322133 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Ofast -fdump-tree-optimized" } */
> +/* { dg-options "-Ofast -fdump-tree-optimized -fcompare-debug" } */
>  
>  double
>  foo (double a)
> --- gcc/testsuite/gcc.dg/recip_sqrt_mult_4.c.jj	2018-09-11 18:12:24.877207110 +0200
> +++ gcc/testsuite/gcc.dg/recip_sqrt_mult_4.c	2018-11-12 16:58:35.748936997 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Ofast -fdump-tree-recip" } */
> +/* { dg-options "-Ofast -fdump-tree-recip -fcompare-debug" } */
>  
>  /* The main path doesn't have any multiplications.
> Avoid introducing them in the recip pass.  */
> --- gcc/testsuite/gcc.dg/recip_sqrt_mult_5.c.jj	2018-09-11 18:12:24.875207143 +0200
> +++ gcc/testsuite/gcc.dg/recip_sqrt_mult_5.c	2018-11-12 16:58:41.950835127 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Ofast -fdump-tree-recip" } */
> +/* { dg-options "-Ofast -fdump-tree-recip -fcompare-debug" } */
>  
>  /* We want to do the recip_sqrt transformations here there is already
> a multiplication on the main path.  */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Re: [Ada] Fix wrong code for loops with convoluted control flow

2018-11-13 Thread Eric Botcazou
> Tested on x86_64-suse-linux, applied on the mainline, 8 and 7 branches.
> 
> 
> 2018-11-08  Eric Botcazou  
> 
>   * fe.h (Suppress_Checks): Declare.
>   * gcc-interface/misc.c (gnat_init_gcc_eh): Set -fnon-call-exceptions
>   only if checks are not suppressed and -faggressive-loop-optimizations
>   only if they are.
>   * gcc-interface/trans.c (struct loop_info_d): Remove has_checks and
>   warned_aggressive_loop_optimizations fields.
>   (gigi): Do not clear warn_aggressive_loop_optimizations here.
>   (Raise_Error_to_gnu): Do not set has_checks.
>   (gnat_to_gnu) : Remove support for aggressive
>   loop optimizations.

The following small adjustment is needed on some platforms.  Applied.


* gcc-interface/misc.c (gnat_init_gcc_eh): Set -fnon-call-exceptions
for the runtime on platforms where System.Machine_Overflow is true.

-- 
Eric Botcazou

Index: gcc-interface/misc.c
===
--- gcc-interface/misc.c	(revision 266029)
+++ gcc-interface/misc.c	(working copy)
@@ -405,10 +405,15 @@ gnat_init_gcc_eh (void)
  as permitted in Ada.
  Turn off -faggressive-loop-optimizations because it may optimize away
  out-of-bound array accesses that we want to be able to catch.
- If checks are disabled, we use the same settings as the C++ compiler.  */
+ If checks are disabled, we use the same settings as the C++ compiler,
+ except for the runtime on platforms where S'Machine_Overflow is true
+ because the runtime depends on FP (hardware) checks being properly
+ handled despite being compiled in -gnatp mode.  */
   flag_exceptions = 1;
   flag_delete_dead_exceptions = 1;
-  if (!Suppress_Checks)
+  if (Suppress_Checks)
+flag_non_call_exceptions = Machine_Overflows_On_Target && GNAT_Mode;
+  else
 {
   flag_non_call_exceptions = 1;
   flag_aggressive_loop_optimizations = 0;


[PATCH, csky] Update linux-unwind.h

2018-11-13 Thread 瞿仙淼
Hi,
I have submitted a patch to update linux-unwind for C-SKY.


Index: libgcc/ChangeLog
===
--- libgcc/ChangeLog(revision 266056)
+++ libgcc/ChangeLog(working copy)
@@ -1,3 +1,9 @@
+2018-11-13  Xianmiao Qu  
+
+   * config/csky/linux-unwind.h (_sig_ucontext_t): Remove.
+   (csky_fallback_frame_state): Modify the check of the
+   instructions to adapt to changes in the kernel.
+
 2018-11-09  Stafford Horne  
Richard Henderson  
 
Index: libgcc/config/csky/linux-unwind.h
===
--- libgcc/config/csky/linux-unwind.h   (revision 266056)
+++ libgcc/config/csky/linux-unwind.h   (working copy)
@@ -25,23 +25,34 @@
 
 #ifndef inhibit_libc
 
-/* Do code reading to identify a signal frame, and set the frame
-   state data appropriately.  See unwind-dw2.c for the structs.  */
+/*
+ * Do code reading to identify a signal frame, and set the frame state data
+ * appropriately.  See unwind-dw2.c for the structs.
+ */
 
 #include 
 #include 
 
-/* The third parameter to the signal handler points to something with
-   this structure defined in asm/ucontext.h, but the name clashes with
-   struct ucontext from sys/ucontext.h so this private copy is used.  */
-typedef struct _sig_ucontext {
-  unsigned long uc_flags;
-  struct _sig_ucontext  *uc_link;
-  stack_t   uc_stack;
-  struct sigcontext uc_mcontext;
-  sigset_t  uc_sigmask;
-} _sig_ucontext_t;
+#define TRAP0_V1 0x0008
+#define MOVI_R1_119_V1 (0x6000 + (119 << 4) + 1)
+#define MOVI_R1_127_V1 (0x6000 + (127 << 4) + 1)
+#define ADDI_R1_32_V1 (0x2000 + (31 << 4) + 1)
+#define ADDI_R1_14_V1 (0x2000 + (13 << 4) + 1)
+#define ADDI_R1_12_V1 (0x2000 + (11 << 4) + 1)
 
+#define TRAP0_V2_PART0 0xc000
+#define TRAP0_V2_PART1 0x2020
+#define MOVI_R7_119_V2_PART0 0xea07
+#define MOVI_R7_119_V2_PART1 119
+#define MOVI_R7_173_V2_PART0 0xea07
+#define MOVI_R7_173_V2_PART1 173
+#define MOVI_R7_139_V2_PART0 0xea07
+#define MOVI_R7_139_V2_PART1 139
+
+#define sc_pt_regs(x) (sc->sc_##x)
+#define sc_pt_regs_lr (sc->sc_r15)
+#define sc_pt_regs_tls(x) (sc->sc_exregs[15])
+
 #define MD_FALLBACK_FRAME_STATE_FOR csky_fallback_frame_state
 
 static _Unwind_Reason_Code
@@ -54,78 +65,66 @@ csky_fallback_frame_state (struct _Unwind_Context
   int i;
 
   /* movi r7, __NR_rt_sigreturn; trap 0  */
-  if ((*(pc+0) == 0xea07) && (*(pc+1) == 119)
-  && (*(pc+2) == 0xc000) &&  (*(pc+3) == 0x2020))
-  {
-struct sigframe
+  if ((*(pc + 0) == MOVI_R7_139_V2_PART0)
+  && (*(pc + 1) == MOVI_R7_139_V2_PART1) && (*(pc + 2) == TRAP0_V2_PART0)
+  && (*(pc + 3) == TRAP0_V2_PART1))
 {
-  int sig;
-  int code;
-  struct sigcontext *psc;
-  unsigned long extramask[2]; /* _NSIG_WORDS */
-  struct sigcontext sc;
-} *_rt = context->cfa;
-sc = _rt->psc; // &(_rt->sc);
-  }
-  /* movi r7, __NR_rt_sigreturn; trap 0  */
-  else if ((*(pc+0) == 0xea07) && (*(pc+1) == 173)
-  && (*(pc+2) == 0xc000) &&  (*(pc+3) == 0x2020))
-  {
-struct rt_sigframe
-{
-  int sig;
-  struct siginfo *pinfo;
-  void* puc;
-  siginfo_t info;
-  struct ucontext uc;
-} *_rt = context->cfa;
-sc = &(_rt->uc.uc_mcontext);
-  }
+  struct rt_sigframe
+  {
+   int sig;
+   struct siginfo *pinfo;
+   void *puc;
+   siginfo_t info;
+   ucontext_t uc;
+  } *_rt = context->cfa;
+  sc = &(_rt->uc.uc_mcontext);
+}
   else
 return _URC_END_OF_STACK;
 
-  new_cfa = (_Unwind_Ptr) sc->sc_usp;
+  new_cfa = (_Unwind_Ptr) sc_pt_regs (usp);
   fs->regs.cfa_how = CFA_REG_OFFSET;
   fs->regs.cfa_reg = STACK_POINTER_REGNUM;
   fs->regs.cfa_offset = new_cfa - (_Unwind_Ptr) context->cfa;
 
   fs->regs.reg[0].how = REG_SAVED_OFFSET;
-  fs->regs.reg[0].loc.offset = (_Unwind_Ptr)&(sc->sc_a0) - new_cfa;
+  fs->regs.reg[0].loc.offset = (_Unwind_Ptr) & sc_pt_regs (a0) - new_cfa;
 
   fs->regs.reg[1].how = REG_SAVED_OFFSET;
-  fs->regs.reg[1].loc.offset = (_Unwind_Ptr)&(sc->sc_a1) - new_cfa;
+  fs->regs.reg[1].loc.offset = (_Unwind_Ptr) & sc_pt_regs (a1) - new_cfa;
 
   fs->regs.reg[2].how = REG_SAVED_OFFSET;
-  fs->regs.reg[2].loc.offset = (_Unwind_Ptr)&(sc->sc_a2) - new_cfa;
+  fs->regs.reg[2].loc.offset = (_Unwind_Ptr) & sc_pt_regs (a2) - new_cfa;
 
   fs->regs.reg[3].how = REG_SAVED_OFFSET;
-  fs->regs.reg[3].loc.offset = (_Unwind_Ptr)&(sc->sc_a3) - new_cfa;
+  fs->regs.reg[3].loc.offset = (_Unwind_Ptr) & sc_pt_regs (a3) - new_cfa;
 
   for (i = 4; i < 14; i++)
 {
   fs->regs.reg[i].how = REG_SAVED_OFFSET;
-  fs->regs.reg[i].loc.offset = ((_Unwind_Ptr)&(sc->sc_regs[i - 4])
-   - new_cfa);
+  fs->regs.reg[i].loc.offset =
+   (_Unwind_Ptr) & sc_pt_regs (regs[i - 4]) - new_cfa;
 }
 
-  for (i = 16; i < 32; i++)
+  for (i = 16; i < 31; i++)
 {
   fs->regs.reg[i].how = REG_SAVED_OFFSET;

Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-13 Thread Kyrill Tkachov

Hi Richard,

On 13/11/18 08:24, Richard Biener wrote:

On Mon, Nov 12, 2018 at 7:20 PM Kyrill Tkachov
 wrote:


On 12/11/18 14:10, Richard Biener wrote:

On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov
 wrote:

On 09/11/18 12:18, Richard Biener wrote:

On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
 wrote:

Hi all,

In this testcase the codegen for VLA SVE is worse than it could be due to 
unrolling:

fully_peel_me:
   mov x1, 5
   ptrue   p1.d, all
   whilelo p0.d, xzr, x1
   ld1dz0.d, p0/z, [x0]
   faddz0.d, z0.d, z0.d
   st1dz0.d, p0, [x0]
   cntdx2
   addvl   x3, x0, #1
   whilelo p0.d, x2, x1
   beq .L1
   ld1dz0.d, p0/z, [x0, #1, mul vl]
   faddz0.d, z0.d, z0.d
   st1dz0.d, p0, [x3]
   cntwx2
   incbx0, all, mul #2
   whilelo p0.d, x2, x1
   beq .L1
   ld1dz0.d, p0/z, [x0]
   faddz0.d, z0.d, z0.d
   st1dz0.d, p0, [x0]
.L1:
   ret

In this case, due to the vector-length-agnostic nature of SVE the compiler 
doesn't know the loop iteration count.
For such loops we don't want to unroll if we don't end up eliminating branches 
as this just bloats code size
and hurts icache performance.

This patch introduces a new unroll-known-loop-iterations-only param that 
disables cunroll when the loop iteration
count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA 
code, but it does help some
Advanced SIMD cases as well where loops with an unknown iteration count are not 
unrolled when it doesn't eliminate
the branches.

So for the above testcase we generate now:
fully_peel_me:
   mov x2, 5
   mov x3, x2
   mov x1, 0
   whilelo p0.d, xzr, x2
   ptrue   p1.d, all
.L2:
   ld1dz0.d, p0/z, [x0, x1, lsl 3]
   faddz0.d, z0.d, z0.d
   st1dz0.d, p0, [x0, x1, lsl 3]
   incdx1
   whilelo p0.d, x1, x3
   bne .L2
   ret

Not perfect still, but it's preferable to the original code.
The new param is enabled by default on aarch64 but disabled for other targets, 
leaving their behaviour unchanged
(until other target people experiment with it and set it, if appropriate).

Bootstrapped and tested on aarch64-none-linux-gnu.
Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in 
performance.

Ok for trunk?

Hum.  Why introduce a new --param and not simply key on
flag_peel_loops instead?  That is
enabled by default at -O3 and with FDO but you of course can control
that in your targets
post-option-processing hook.

You mean like this?
It's certainly a simpler patch, but I was just a bit hesitant of making this 
change for all targets :)
But I suppose it's a reasonable change.

No, that change is backward.  What I said is that peeling is already
conditional on
flag_peel_loops and that is enabled by -O3.  So you want to disable
flag_peel_loops for
SVE instead in the target.

Sorry, I got confused by the similarly named functions.
I'm talking about try_unroll_loop_completely when run as part of 
canonicalize_induction_variables i.e. the "ivcanon" pass
(sorry about blaming cunroll here). This doesn't get called through the 
try_unroll_loops_completely path.

Well, peeling gets disabled.  From your patch I see you want to
disable "unrolling" when
the number of loop iteration is not constant.  That is called peeling
where we need to
emit the loop exit test N times.

Did you check your testcases with -fno-peel-loops?


-fno-peel-loops doesn't help in the testcases. The code that does this peeling 
(try_unroll_loop_completely)
can be called through two paths, only one of which is gated on flag_peel_loops.

Thanks,
Kyrill



try_unroll_loop_completely doesn't get disabled with -fno-peel-loops or 
-fno-unroll-loops.
Maybe disabling peeling inside try_unroll_loop_completely itself when 
!flag_peel_loops is viable?

Thanks,
Kyrill


It might also make sense to have more fine-grained control for this
and allow a target
to say whether it wants to peel a specific loop or not when the
middle-end thinks that
would be profitable.

Can be worth looking at as a follow-up. Do you envisage the target analysing
the gimple statements of the loop to figure out its cost?

Kind-of.  Sth like

bool targetm.peel_loop (struct loop *);

I have no idea whether you can easily detect a SVE vectorized loop though.
Maybe there's always a special IV or so (the mask?)

Richard.


Thanks,
Kyrill


2018-11-09  Kyrylo Tkachov  

  * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not unroll
  loop when number of iterations is not known and flag_peel_loops is in
  effect.

2018-11-09  Kyrylo Tkachov  

  * gcc.target/aarch64/sve/unroll-1.c: New test.





Re: Simplify enumerate and array types

2018-11-13 Thread Richard Biener
On Sun, 11 Nov 2018, Jan Hubicka wrote:

> Hi,
> this patch should be the last patch for type simplification (modulo possible
> leftover bits I still notice that need clearing).  It does the following:
>  1) enables the patch to simplify aggregates also for enums.
> While this does not affect C++, in C we want pointers to the incomplete
> and complete variants of an enum to be treated the same.
>  2) turns arrays into arrays of incomplete type. This is used for pointers
> to arrays
>  3) turns arrays into arrays of simplified type. This is used for arrays of
> pointers to aggregates.
> 
> The patch extends fld_type_variant_equal_p and fld_type_variant to live with
> the fact that arrays may have different TREE_TYPE within the TYPE_NEXT_VARIANT
> lists, and also introduces a simplified_type_hash that holds the simplified
> arrays.
> If build_array_type_1 were able to share arrays with the TYPELESS_STORAGE
> flag, this hash would not be necessary, but I am unsure why it is done that
> way in there.
> 
> With this there are circa 9k types in Firefox (out of 80k) with duplicates
> and 20k type duplicates overall.  I manually inspected some of them, and the
> reasons can be tracked down to differences in VAR_DECL flags for vtables, and
> inherently to such types being referred to by TYPE_CONTEXT or as a field.
> I will try to clean up visibilities next stage1, but I think this is
> practically quite non-critical, as it would be very hard to make the number
> of copies explode (there is never more than 3 types in the duplicate chain
> for Firefox, and we used to have more than 4k duplicates of a single type
> before the changes).
> 
> Bootstrapped/regtested x86_64-linux.
> OK?
> 
> Honza
>   * ipa-devirt.c (add_type_duplicate): Do not ICE on incomplete enums.
>   * tree.c (build_array_type_1): Forward declare.
>   (fld_type_variant_equal_p): Add INNER_TYPE parameter.
>   (fld_type_variant): Likewise.
>   (fld_simplified_types): New hash.
>   (fld_incomplete_type_of): Handle array and enumeration types.
>   (fld_simplified_type): Handle simplification of arrays.
>   (free_lang_data): Allocate and free simplified types hash.
> Index: ipa-devirt.c
> ===
> --- ipa-devirt.c  (revision 266011)
> +++ ipa-devirt.c  (working copy)
> @@ -1705,7 +1705,8 @@ add_type_duplicate (odr_type val, tree t
>else if (!COMPLETE_TYPE_P (val->type) && COMPLETE_TYPE_P (type))
>  {
>prevail = true;
> -  build_bases = TYPE_BINFO (type);
> +  if (TREE_CODE (type) == RECORD_TYPE)
> +build_bases = TYPE_BINFO (type);
>  }
>else if (COMPLETE_TYPE_P (val->type) && !COMPLETE_TYPE_P (type))
>  ;
> Index: tree.c
> ===
> --- tree.c(revision 266011)
> +++ tree.c(working copy)
> @@ -265,6 +265,8 @@ static void print_type_hash_statistics (
>  static void print_debug_expr_statistics (void);
>  static void print_value_expr_statistics (void);
>  
> +static tree build_array_type_1 (tree, tree, bool, bool);
> +
>  tree global_trees[TI_MAX];
>  tree integer_types[itk_none];
>  
> @@ -5108,10 +5110,11 @@ fld_simplified_type_name (tree type)
>  
>  /* Do same comparsion as check_qualified_type skipping lang part of type
> and be more permissive about type names: we only care that names are
> -   same (for diagnostics) and that ODR names are the same.  */
> +   same (for diagnostics) and that ODR names are the same.
> +   If INNER_TYPE is non-NULL, be sure that TREE_TYPE match it.  */
>  
>  static bool
> -fld_type_variant_equal_p (tree t, tree v)
> +fld_type_variant_equal_p (tree t, tree v, tree inner_type)
>  {
>if (TYPE_QUALS (t) != TYPE_QUALS (v)
>/* We want to match incomplete variants with complete types.
> @@ -5121,21 +5124,24 @@ fld_type_variant_equal_p (tree t, tree v
> || TYPE_USER_ALIGN (t) != TYPE_USER_ALIGN (v)))
>|| fld_simplified_type_name (t) != fld_simplified_type_name (v)
>|| !attribute_list_equal (TYPE_ATTRIBUTES (t),
> - TYPE_ATTRIBUTES (v)))
> + TYPE_ATTRIBUTES (v))
> +  || (inner_type && TREE_TYPE (v) != inner_type))
>  return false;
> - 
> +
>return true;
>  }
>  
> -/* Find variant of FIRST that match T and create new one if necessary.  */
> +/* Find variant of FIRST that match T and create new one if necessary.
> +   Set TREE_TYPE to INNER_TYPE if non-NULL.  */
>  
>  static tree
> -fld_type_variant (tree first, tree t, struct free_lang_data_d *fld)
> +fld_type_variant (tree first, tree t, struct free_lang_data_d *fld,
> +   tree inner_type = NULL)
>  {
>if (first == TYPE_MAIN_VARIANT (t))
>  return t;
>for (tree v = first; v; v = TYPE_NEXT_VARIANT (v))
> -if (fld_type_variant_equal_p (t, v))
> +if (fld_type_variant_equal_p (t, v, inner_type))
>return v;
>tree v = build_variant_type_copy (first);
>TYPE_READONLY (v) = TYPE_R

Re: [PATCH] Fix debug stmt handling in optimize_recip_sqrt (PR tree-optimization/87977)

2018-11-13 Thread Jakub Jelinek
On Tue, Nov 13, 2018 at 10:04:03AM +0100, Richard Biener wrote:
> On Tue, 13 Nov 2018, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > During analysis, we correctly ignore debug stmt uses, but if we don't
> > release the ssa name we stopped using, the debug stmts uses are left in the
> > IL.  The reset_debug_uses call is needed because the code modifies the
> > division stmt in place.  Perhaps better would be not to do that, i.e.
> > create a new division and gsi_replace it, then just the release_ssa_name
> > would be enough and the generic code could add e.g. the 1.0 / _1 replacement
> > for the debug stmt?
> 
> Yeah, that's how we usually deal with this.  I hope the re-use of the
> LHS is with the same value?

The lhs isn't reused.
  gimple_assign_set_lhs (stmt, sqr_ssa_name);
  gimple_assign_set_rhs2 (stmt, a);
changes both the lhs and rhs2 to other operands.  Without the
reset_debug_uses call, because the stmt is modified in place, we get wrong
debug info (1.0 / a), i.e. with the new rhs2 rather than the old one.

> >  Though, in this particular case the sqrt call is
> > optimized away, so it wouldn't make a difference.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, or
> > should I do the gimple_build_assign + gsi_replace change?
> 
> I think that would be cleaner.

Ok, will do.  Thanks.

Jakub


Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-13 Thread Richard Biener
On Tue, Nov 13, 2018 at 10:15 AM Kyrill Tkachov
 wrote:
>
> Hi Richard,
>
> On 13/11/18 08:24, Richard Biener wrote:
> > On Mon, Nov 12, 2018 at 7:20 PM Kyrill Tkachov
> >  wrote:
> >>
> >> On 12/11/18 14:10, Richard Biener wrote:
> >>> On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov
> >>>  wrote:
>  On 09/11/18 12:18, Richard Biener wrote:
> > On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
> >  wrote:
> >> Hi all,
> >>
> >> In this testcase the codegen for VLA SVE is worse than it could be due 
> >> to unrolling:
> >>
> >> fully_peel_me:
> >>mov x1, 5
> >>ptrue   p1.d, all
> >>whilelo p0.d, xzr, x1
> >>ld1dz0.d, p0/z, [x0]
> >>faddz0.d, z0.d, z0.d
> >>st1dz0.d, p0, [x0]
> >>cntdx2
> >>addvl   x3, x0, #1
> >>whilelo p0.d, x2, x1
> >>beq .L1
> >>ld1dz0.d, p0/z, [x0, #1, mul vl]
> >>faddz0.d, z0.d, z0.d
> >>st1dz0.d, p0, [x3]
> >>cntwx2
> >>incbx0, all, mul #2
> >>whilelo p0.d, x2, x1
> >>beq .L1
> >>ld1dz0.d, p0/z, [x0]
> >>faddz0.d, z0.d, z0.d
> >>st1dz0.d, p0, [x0]
> >> .L1:
> >>ret
> >>
> >> In this case, due to the vector-length-agnostic nature of SVE the 
> >> compiler doesn't know the loop iteration count.
> >> For such loops we don't want to unroll if we don't end up eliminating 
> >> branches as this just bloats code size
> >> and hurts icache performance.
> >>
> >> This patch introduces a new unroll-known-loop-iterations-only param 
> >> that disables cunroll when the loop iteration
> >> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often 
> >> for SVE VLA code, but it does help some
> >> Advanced SIMD cases as well where loops with an unknown iteration 
> >> count are not unrolled when it doesn't eliminate
> >> the branches.
> >>
> >> So for the above testcase we generate now:
> >> fully_peel_me:
> >>mov x2, 5
> >>mov x3, x2
> >>mov x1, 0
> >>whilelo p0.d, xzr, x2
> >>ptrue   p1.d, all
> >> .L2:
> >>ld1dz0.d, p0/z, [x0, x1, lsl 3]
> >>faddz0.d, z0.d, z0.d
> >>st1dz0.d, p0, [x0, x1, lsl 3]
> >>incdx1
> >>whilelo p0.d, x1, x3
> >>bne .L2
> >>ret
> >>
> >> Not perfect still, but it's preferable to the original code.
> >> The new param is enabled by default on aarch64 but disabled for other 
> >> targets, leaving their behaviour unchanged
> >> (until other target people experiment with it and set it, if 
> >> appropriate).
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu.
> >> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences 
> >> in performance.
> >>
> >> Ok for trunk?
> > Hum.  Why introduce a new --param and not simply key on
> > flag_peel_loops instead?  That is
> > enabled by default at -O3 and with FDO but you of course can control
> > that in your targets
> > post-option-processing hook.
>  You mean like this?
>  It's certainly a simpler patch, but I was just a bit hesitant of making 
>  this change for all targets :)
>  But I suppose it's a reasonable change.
> >>> No, that change is backward.  What I said is that peeling is already
> >>> conditional on
> >>> flag_peel_loops and that is enabled by -O3.  So you want to disable
> >>> flag_peel_loops for
> >>> SVE instead in the target.
> >> Sorry, I got confused by the similarly named functions.
> >> I'm talking about try_unroll_loop_completely when run as part of 
> >> canonicalize_induction_variables i.e. the "ivcanon" pass
> >> (sorry about blaming cunroll here). This doesn't get called through the 
> >> try_unroll_loops_completely path.
> > Well, peeling gets disabled.  From your patch I see you want to
> > disable "unrolling" when
> > the number of loop iteration is not constant.  That is called peeling
> > where we need to
> > emit the loop exit test N times.
> >
> > Did you check your testcases with -fno-peel-loops?
>
> -fno-peel-loops doesn't help in the testcases. The code that does this 
> peeling (try_unroll_loop_completely)
> can be called through two paths, only one of which is gated on 
> flag_peel_loops.

I don't see the obvious here, so I either have to sit down with a
non-SVE-specific testcase showing this, or I am misunderstanding the
actual transform that you want to avoid.
allow_peel is false when called from canonicalize_induction_variables.
There's the slight
chance that UL_NO_GRO

[PATCH] Fix aarch64_compare_and_swap* constraints (PR target/87839)

2018-11-13 Thread Jakub Jelinek
Hi!

The following testcase ICEs because the predicate and constraints on one of
the operands of @aarch64_compare_and_swapdi aren't consistent.  The RA which
goes according to constraints
(insn 15 13 16 2 (set (reg:DI 104)
        (const_int 8589934595 [0x200000003])) "pr87839.c":15:3 47 {*movdi_aarch64}
     (expr_list:REG_EQUIV (const_int 8589934595 [0x200000003])
        (nil)))
(insn 16 15 21 2 (parallel [
(set (reg:CC 66 cc)
(unspec_volatile:CC [
(const_int 0 [0])
] UNSPECV_ATOMIC_CMPSW))
(set (reg:DI 101)
(mem/v:DI (reg/f:DI 99) [-1  S8 A64]))
(set (mem/v:DI (reg/f:DI 99) [-1  S8 A64])
(unspec_volatile:DI [
(reg:DI 104)
(reg:DI 103)
(const_int 0 [0])
(const_int 32773 [0x8005]) repeated x2
] UNSPECV_ATOMIC_CMPSW))
(clobber (scratch:SI))
]) "pr87839.c":15:3 3532 {aarch64_compare_and_swapdi}
 (expr_list:REG_UNUSED (reg:DI 101)
(expr_list:REG_UNUSED (reg:CC 66 cc)
(nil
when seeing the n constraint puts the 0x200000003 constant directly into the
atomic instruction, but the predicate requires that it be either a register
or a shifted positive or negative 12-bit constant, and so the insn fails to
split.  The positive shifted constant apparently has the I constraint and the
negative one J, and other uses of aarch64_plus_operand that have some
constraint use rIJ (or r):
config/aarch64/aarch64.md: (match_operand:GPI 2 "aarch64_plus_operand" "r,I,J"))
config/aarch64/aarch64.md: (match_operand:SI 2 "aarch64_plus_operand" "r,I,J"))
config/aarch64/aarch64.md: (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J"))
config/aarch64/aarch64.md: (match_operand:GPI 1 "aarch64_plus_operand" "r"))
config/aarch64/aarch64.md: (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]

I don't have a setup to easily bootstrap/regtest aarch64-linux ATM, could
somebody please include it in their bootstrap/regtest?  Thanks.

2018-11-13  Jakub Jelinek  

PR target/87839
* config/aarch64/atomics.md (@aarch64_compare_and_swap): Use
rIJ constraint for aarch64_plus_operand rather than rn.

* gcc.target/aarch64/pr87839.c: New test.

--- gcc/config/aarch64/atomics.md.jj2018-11-01 12:06:43.469963662 +0100
+++ gcc/config/aarch64/atomics.md   2018-11-13 09:59:35.660185116 +0100
@@ -71,7 +71,7 @@ (define_insn_and_split "@aarch64_compare
 (match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))   ;; memory
(set (match_dup 1)
 (unspec_volatile:GPI
-  [(match_operand:GPI 2 "aarch64_plus_operand" "rn")   ;; expect
+  [(match_operand:GPI 2 "aarch64_plus_operand" "rIJ")  ;; expect
(match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")	;; desired
(match_operand:SI 4 "const_int_operand")		;; is_weak
(match_operand:SI 5 "const_int_operand");; mod_s
--- gcc/testsuite/gcc.target/aarch64/pr87839.c.jj	2018-11-13 10:13:44.353309416 +0100
+++ gcc/testsuite/gcc.target/aarch64/pr87839.c	2018-11-13 10:13:05.496944699 +0100
@@ -0,0 +1,29 @@
+/* PR target/87839 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -w" } */
+
+long long b[64];
+void foo (void);
+int bar (void (*) (void));
+void qux (long long *, long long) __attribute__((noreturn));
+void quux (long long *, long long);
+
+void
+baz (void)
+{
+  __sync_val_compare_and_swap (b, 4294967298LL, 78187493520LL);
+  __sync_bool_compare_and_swap (b + 1, 8589934595LL, 21474836489LL);
+  __sync_fetch_and_xor (b, 60129542145LL);
+  quux (b, 42949672967LL);
+  __sync_xor_and_fetch (b + 22, 60129542145LL);
+  quux (b + 23, 42949672967LL);
+  if (bar (baz))
+__builtin_abort ();
+  foo ();
+  __sync_val_compare_and_swap (b, 4294967298LL, 0);
+  __sync_bool_compare_and_swap (b + 1, 8589934595LL, 78187493520LL);
+  if (__sync_or_and_fetch (b, 21474836489LL) != 21474836489LL)
+qux (b + 22, 60129542145LL);
+  __atomic_fetch_nand (b + 23, 42949672967LL, __ATOMIC_RELAXED);
+  bar (baz);
+}

Jakub


[PATCH] value_range_base leftovers

2018-11-13 Thread Richard Biener


Factors more from the union code and fixes a few leftovers.

Bootstrapped & tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-11-13  Richard Biener  

* tree-ssanames.h (set_range_info): Use value_range_base.
(get_range_info): Likewise.
* tree-ssanames.c (set_range_info): Likewise.
(get_range_info): Likewise.
* tree-vrp.c (value_range_base::union_helper): Split
out common parts of value_range[_base]::union_.
(value_range_base::union_): Update.
(value_range::union_): Likewise.
(determine_value_range_1): Use value_range_base.
(determine_value_range): Likewise.
* tree-vrp.h (value_range_base::union_helper): Move ...
(value_range::union_helper): ... from here.

Index: gcc/tree-ssanames.c
===
--- gcc/tree-ssanames.c (revision 266056)
+++ gcc/tree-ssanames.c (working copy)
@@ -401,7 +401,7 @@ set_range_info (tree name, enum value_ra
 /* Store range information for NAME from a value_range.  */
 
 void
-set_range_info (tree name, const value_range &vr)
+set_range_info (tree name, const value_range_base &vr)
 {
   wide_int min = wi::to_wide (vr.min ());
   wide_int max = wi::to_wide (vr.max ());
@@ -434,7 +434,7 @@ get_range_info (const_tree name, wide_in
in a value_range VR.  Returns the value_range_kind.  */
 
 enum value_range_kind
-get_range_info (const_tree name, value_range &vr)
+get_range_info (const_tree name, value_range_base &vr)
 {
   tree min, max;
   wide_int wmin, wmax;
Index: gcc/tree-ssanames.h
===
--- gcc/tree-ssanames.h (revision 266056)
+++ gcc/tree-ssanames.h (working copy)
@@ -69,11 +69,11 @@ struct GTY ((variable_size)) range_info_
 /* Sets the value range to SSA.  */
 extern void set_range_info (tree, enum value_range_kind, const wide_int_ref &,
const wide_int_ref &);
-extern void set_range_info (tree, const value_range &);
+extern void set_range_info (tree, const value_range_base &);
 /* Gets the value range from SSA.  */
 extern enum value_range_kind get_range_info (const_tree, wide_int *,
 wide_int *);
-extern enum value_range_kind get_range_info (const_tree, value_range &);
+extern enum value_range_kind get_range_info (const_tree, value_range_base &);
 extern void set_nonzero_bits (tree, const wide_int_ref &);
 extern wide_int get_nonzero_bits (const_tree);
 extern bool ssa_name_has_boolean_range (tree);
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 266056)
+++ gcc/tree-vrp.c  (working copy)
@@ -6084,125 +6084,73 @@ value_range::intersect (const value_rang
 }
 }
 
-/* Meet operation for value ranges.  Given two value ranges VR0 and
-   VR1, store in VR0 a range that contains both VR0 and VR1.  This
-   may not be the smallest possible such range.  */
-
-void
-value_range_base::union_ (const value_range_base *other)
-{
-  if (other->undefined_p ())
-{
-  /* this already has the resulting range.  */
-  return;
-}
+/* Helper for meet operation for value ranges.  Given two value ranges VR0 and
+   VR1, return a range that contains both VR0 and VR1.  This may not be the
+   smallest possible such range.  */
+
+value_range_base
+value_range_base::union_helper (const value_range_base *vr0,
+   const value_range_base *vr1)
+{
+  /* VR0 has the resulting range if VR1 is undefined or VR0 is varying.  */
+  if (vr1->undefined_p ()
+  || vr0->varying_p ())
+return *vr0;
+
+  /* VR1 has the resulting range if VR0 is undefined or VR1 is varying.  */
+  if (vr0->undefined_p ()
+  || vr1->varying_p ())
+return *vr1;
 
-  if (this->undefined_p ())
-{
-  *this = *other;
-  return;
-}
+  value_range_kind vr0type = vr0->kind ();
+  tree vr0min = vr0->min ();
+  tree vr0max = vr0->max ();
+  union_ranges (&vr0type, &vr0min, &vr0max,
+   vr1->kind (), vr1->min (), vr1->max ());
 
-  if (this->varying_p ())
-{
-  /* Nothing to do.  VR0 already has the resulting range.  */
-  return;
-}
+  /* Work on a temporary so we can still use vr0 when union returns varying.  */
+  value_range tem;
+  tem.set_and_canonicalize (vr0type, vr0min, vr0max);
 
-  if (other->varying_p ())
+  /* Failed to find an efficient meet.  Before giving up and setting
+ the result to VARYING, see if we can at least derive a useful
+ anti-range.  */
+  if (tem.varying_p ()
+  && range_includes_zero_p (vr0) == 0
+  && range_includes_zero_p (vr1) == 0)
 {
-  this->set_varying ();
-  return;
+  tem.set_nonnull (vr0->type ());
+  return tem;
 }
 
-  value_range saved (*this);
-  value_range_kind vr0type = this->kind ();
-  tree vr0min = this->min ();
-  tree vr0max = this->max ();
-  union_ranges (&vr0type, &vr0m

Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-13 Thread Kyrill Tkachov



On 13/11/18 09:28, Richard Biener wrote:

On Tue, Nov 13, 2018 at 10:15 AM Kyrill Tkachov
 wrote:

Hi Richard,

On 13/11/18 08:24, Richard Biener wrote:

On Mon, Nov 12, 2018 at 7:20 PM Kyrill Tkachov
 wrote:

On 12/11/18 14:10, Richard Biener wrote:

On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov
 wrote:

On 09/11/18 12:18, Richard Biener wrote:

On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
 wrote:

Hi all,

In this testcase the codegen for VLA SVE is worse than it could be due to 
unrolling:

fully_peel_me:
mov x1, 5
ptrue   p1.d, all
whilelo p0.d, xzr, x1
ld1d    z0.d, p0/z, [x0]
fadd    z0.d, z0.d, z0.d
st1d    z0.d, p0, [x0]
cntd    x2
addvl   x3, x0, #1
whilelo p0.d, x2, x1
beq .L1
ld1d    z0.d, p0/z, [x0, #1, mul vl]
fadd    z0.d, z0.d, z0.d
st1d    z0.d, p0, [x3]
cntw    x2
incb    x0, all, mul #2
whilelo p0.d, x2, x1
beq .L1
ld1d    z0.d, p0/z, [x0]
fadd    z0.d, z0.d, z0.d
st1d    z0.d, p0, [x0]
.L1:
ret

In this case, due to the vector-length-agnostic nature of SVE the compiler 
doesn't know the loop iteration count.
For such loops we don't want to unroll if we don't end up eliminating branches 
as this just bloats code size
and hurts icache performance.
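The source for the testcase is not quoted in the thread; here is a plausible minimal reconstruction (the trip count 5 and the self-addition are inferred from the "mov x1, 5" and fadd in the assembly above, so treat the exact shape as an assumption):

```c
#include <assert.h>   /* for the quick check below */

/* Hypothetical reconstruction of the testcase.  With vector-length-agnostic
   SVE the compiler cannot tell how many vector iterations 5 scalar elements
   need, so cunroll peels the loop, emitting an exit test per peeled copy.  */
void
fully_peel_me (double *x)
{
  for (int i = 0; i < 5; i++)
    x[i] = x[i] + x[i];
}
```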

This patch introduces a new unroll-known-loop-iterations-only param that 
disables cunroll when the loop iteration
count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA 
code, but it does help some
Advanced SIMD cases as well where loops with an unknown iteration count are not 
unrolled when it doesn't eliminate
the branches.

So for the above testcase we generate now:
fully_peel_me:
mov x2, 5
mov x3, x2
mov x1, 0
whilelo p0.d, xzr, x2
ptrue   p1.d, all
.L2:
ld1d    z0.d, p0/z, [x0, x1, lsl 3]
fadd    z0.d, z0.d, z0.d
st1d    z0.d, p0, [x0, x1, lsl 3]
incd    x1
whilelo p0.d, x1, x3
bne .L2
ret

Not perfect still, but it's preferable to the original code.
The new param is enabled by default on aarch64 but disabled for other targets, 
leaving their behaviour unchanged
(until other target people experiment with it and set it, if appropriate).

Bootstrapped and tested on aarch64-none-linux-gnu.
Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in 
performance.

Ok for trunk?

Hum.  Why introduce a new --param and not simply key on
flag_peel_loops instead?  That is
enabled by default at -O3 and with FDO but you of course can control
that in your targets
post-option-processing hook.

You mean like this?
It's certainly a simpler patch, but I was just a bit hesitant about making this
change for all targets :)
But I suppose it's a reasonable change.

No, that change is backward.  What I said is that peeling is already
conditional on
flag_peel_loops and that is enabled by -O3.  So you want to disable
flag_peel_loops for
SVE instead in the target.

Sorry, I got confused by the similarly named functions.
I'm talking about try_unroll_loop_completely when run as part of 
canonicalize_induction_variables i.e. the "ivcanon" pass
(sorry about blaming cunroll here). This doesn't get called through the 
try_unroll_loops_completely path.

Well, peeling gets disabled.  From your patch I see you want to
disable "unrolling" when
the number of loop iteration is not constant.  That is called peeling
where we need to
emit the loop exit test N times.
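To make the terminology concrete, here is an illustrative C sketch (not compiler output) of what complete peeling with a repeated exit test produces when only an upper bound on the trip count is known:

```c
#include <assert.h>   /* for the quick check below */

/* Illustrative only: completely peeling
     for (i = 0; i < n; i++) x[i] += x[i];
   when only an upper bound (here 3) on n is known means emitting the
   exit test before every peeled copy -- the code-size cost under
   discussion.  */
static void
peeled (double *x, long n)
{
  long i = 0;
  if (i >= n) return;  x[i] += x[i];  i++;
  if (i >= n) return;  x[i] += x[i];  i++;
  if (i >= n) return;  x[i] += x[i];
}
```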

Did you check your testcases with -fno-peel-loops?

-fno-peel-loops doesn't help in the testcases. The code that does this peeling 
(try_unroll_loop_completely)
can be called through two paths, only one of which is gated on flag_peel_loops.

I don't see the obvious here so I have to either sit down with a
non-SVE specific testcase
showing this, or I am misunderstanding the actual transform that you
want to avoid.
allow_peel is false when called from canonicalize_induction_variables.
There's the slight
chance that UL_NO_GROWTH lets through cases - is your case one of
those?  That is,
does the following help?

Index: gcc/tree-ssa-loop-ivcanon.c
===
--- gcc/tree-ssa-loop-ivcanon.c (revision 266056)
+++ gcc/tree-ssa-loop-ivcanon.c (working copy)
@@ -724,7 +724,7 @@ try_unroll_loop_completely (struct loop
  exit = NULL;

/* See if we can improve our estimate by using recorded loop bounds.  */
-  if ((allow_peel || maxiter == 0 || ul == UL_NO_GROWTH)
+  if ((allow_peel || maxiter == 0)
&& maxiter >= 0
&& (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
  {

IIRC I allowed that case when adding allow_peel simply because it avoided some
testsuite regressions.  Thi

Re: [C++ PATCH] Implement P0722R3, destroying operator delete.

2018-11-13 Thread Jonathan Wakely
On Tue, 13 Nov 2018 at 04:39, Jason Merrill wrote:
> Tested x86_64-pc-linux-gnu, applying to trunk.  Can someone from the
> libstdc++ team clean up my libsupc++ change if it should be formatted
> differently?

Looks fine to me, thanks.


Do not generate stack usage files with -flto

2018-11-13 Thread Eric Botcazou
They are empty.  Tested on x86-64/Linux, applied on the mainline as obvious.


2018-11-13  Eric Botcazou  

* toplev.c (output_stack_usage): Turn test on flag_stack_usage into
test on stack_usage_file.
(lang_dependent_init): Do not open the su file if generating LTO.

-- 
Eric Botcazou

Index: toplev.c
===
--- toplev.c	(revision 266029)
+++ toplev.c	(working copy)
@@ -989,7 +989,7 @@ output_stack_usage (void)
   stack_usage += current_function_dynamic_stack_size;
 }
 
-  if (flag_stack_usage)
+  if (stack_usage_file)
 {
   expanded_location loc
 	= expand_location (DECL_SOURCE_LOCATION (current_function_decl));
@@ -1934,7 +1934,7 @@ lang_dependent_init (const char *name)
   init_asm_output (name);
 
   /* If stack usage information is desired, open the output file.  */
-  if (flag_stack_usage)
+  if (flag_stack_usage && !flag_generate_lto)
 	stack_usage_file = open_auxiliary_file ("su");
 }
 


Re: [PATCH 4/6] [ARC] Add peephole rules to combine store/loads into double store/loads

2018-11-13 Thread Andrew Burgess
* claz...@gmail.com  [2018-10-31 10:33:33 +0200]:

> Thank you for your review. Please find attached a new respin patch with
> your feedback in.
> 
> Please let me know if it is ok,
> Claudiu 

> From 4ff7d8419783eceeffbaf27df017d0a93c3af942 Mon Sep 17 00:00:00 2001
> From: Claudiu Zissulescu 
> Date: Thu, 9 Aug 2018 14:29:05 +0300
> Subject: [PATCH] [ARC] Add peephole rules to combine store/loads into double
>  store/loads
> 
> Simple peephole rules which combine multiple ld/st instructions into
> 64-bit load/store instructions.  They only work for architectures which
> have the double load/store option enabled.
> 
> gcc/
>   Claudiu Zissulescu  
> 
>   * config/arc/arc-protos.h (gen_operands_ldd_std): Add.
>   * config/arc/arc.c (operands_ok_ldd_std): New function.
>   (mem_ok_for_ldd_std): Likewise.
>   (gen_operands_ldd_std): Likewise.
>   * config/arc/arc.md: Add peephole2 rules for std/ldd.

Looks good.

Thanks,
Andrew


> ---
>  gcc/config/arc/arc-protos.h |   1 +
>  gcc/config/arc/arc.c| 161 
>  gcc/config/arc/arc.md   |  69 
>  3 files changed, 231 insertions(+)
> 
> diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
> index 24bea6e1efb..55f8ed4c643 100644
> --- a/gcc/config/arc/arc-protos.h
> +++ b/gcc/config/arc/arc-protos.h
> @@ -46,6 +46,7 @@ extern int arc_return_address_register (unsigned int);
>  extern unsigned int arc_compute_function_type (struct function *);
>  extern bool arc_is_uncached_mem_p (rtx);
>  extern bool arc_lra_p (void);
> +extern bool gen_operands_ldd_std (rtx *operands, bool load, bool commute);
>  #endif /* RTX_CODE */
>  
>  extern unsigned int arc_compute_frame_size (int);
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 18dd0de6af7..daf785dbdb8 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -10803,6 +10803,167 @@ arc_cannot_substitute_mem_equiv_p (rtx)
>return true;
>  }
>  
> +/* Checks whether the operands are valid for use in an LDD/STD
> +   instruction.  Assumes that RT and RT2 are REG.  This is guaranteed
> +   by the patterns.  Assumes that the address in the base register RN
> +   is word aligned.  Pattern guarantees that both memory accesses use
> +   the same base register, the offsets are constants within the range,
> +   and the gap between the offsets is 4.  If reload complete then
> +   check that registers are legal.  */
> +
> +static bool
> +operands_ok_ldd_std (rtx rt, rtx rt2, HOST_WIDE_INT offset)
> +{
> +  unsigned int t, t2;
> +
> +  if (!reload_completed)
> +return true;
> +
> +  if (!(SMALL_INT_RANGE (offset, (GET_MODE_SIZE (DImode) - 1) & (~0x03),
> +			 (offset & (GET_MODE_SIZE (DImode) - 1) & 3
> +			  ? 0 : -(-GET_MODE_SIZE (DImode) | (~0x03)) >> 1))))
> +    return false;
> +
> +  t = REGNO (rt);
> +  t2 = REGNO (rt2);
> +
> +  if ((t2 == PROGRAM_COUNTER_REGNO)
> +      || (t % 2 != 0)	/* First destination register is not even.  */
> +      || (t2 != t + 1))
> +      return false;
> +
> +  return true;
> +}
> +
> +/* Helper for gen_operands_ldd_std.  Returns true iff the memory
> +   operand MEM's address contains an immediate offset from the base
> +   register and has no side effects, in which case it sets BASE and
> +   OFFSET accordingly.  */
> +
> +static bool
> +mem_ok_for_ldd_std (rtx mem, rtx *base, rtx *offset)
> +{
> +  rtx addr;
> +
> +  gcc_assert (base != NULL && offset != NULL);
> +
> +  /* TODO: Handle more general memory operand patterns, such as
> + PRE_DEC and PRE_INC.  */
> +
> +  if (side_effects_p (mem))
> +return false;
> +
> +  /* Can't deal with subregs.  */
> +  if (GET_CODE (mem) == SUBREG)
> +return false;
> +
> +  gcc_assert (MEM_P (mem));
> +
> +  *offset = const0_rtx;
> +
> +  addr = XEXP (mem, 0);
> +
> +  /* If addr isn't valid for DImode, then we can't handle it.  */
> +  if (!arc_legitimate_address_p (DImode, addr,
> + reload_in_progress || reload_completed))
> +return false;
> +
> +  if (REG_P (addr))
> +{
> +  *base = addr;
> +  return true;
> +}
> +  else if (GET_CODE (addr) == PLUS || GET_CODE (addr) == MINUS)
> +{
> +  *base = XEXP (addr, 0);
> +  *offset = XEXP (addr, 1);
> +  return (REG_P (*base) && CONST_INT_P (*offset));
> +}
> +
> +  return false;
> +}
> +
> +/* Called from peephole2 to replace two word-size accesses with a
> +   single LDD/STD instruction.  Returns true iff we can generate a new
> +   instruction sequence.  That is, both accesses use the same base
> +   register and the gap between constant offsets is 4.  OPERANDS are
> +   the operands found by the peephole matcher; OPERANDS[0,1] are
> +   register operands, and OPERANDS[2,3] are the corresponding memory
> +   operands.  LOAD indicates whether the access is load or store.  */
> +
> +bool
> +gen_operands_ldd_std (rtx *operands, bool load, bool commute)
> +{

Re: [PATCH] Fortran include line fixes and -fdec-include support

2018-11-13 Thread Mark Eggleston

Jakub,

I've applied this patch and tried it out.

The following fail to compile:

In free form with or without the optional initial & on the continuation 
line:


subroutine one()
  include &
  &"include_4.inc'
  integer(i4) :: i
end subroutine one

In fixed form:

  subroutine one()
  integer include
  include
 +"include_16.inc'
  write (*,*) include
  end subroutine one

Both give "Error: Unclassifiable statement at (1)" with 1 at the start 
of the include line.


These cases are not covered by the test cases in the patch.

Is there any particular reason that include line continuation is 
isolated in -fdec-include? Is there a specific restriction regarding 
include lines in the Fortran standard such that continuations in free 
form and fixed form can't be used with include lines?


regards,

Mark Eggleston

On 12/11/2018 14:51, Jakub Jelinek wrote:

Hi!

In fortran97.pdf I read:
"Except in a character context, blanks are insignificant and may be used freely 
throughout the program."
and while we handle that in most cases, we don't allow spaces in INCLUDE
lines in fixed form, while e.g. ifort does.

Another thing, which I haven't touched in the PR except covering it with a
testcase, is that we allow an INCLUDE line in fixed form to start even in
columns 1 to 6, while ifort rejects that.  Is, say,
  include 'omp_lib.h'
valid in fixed form?  i in column 6 normally means a continuation line,
though I'm not sure if anything in a valid program can contain nclude
followed by a character literal.  Shall we reject that, or at least warn that
it won't be portable?

The last thing, biggest part of the patch, is that for legacy DEC
compatibility, the DEC manuals document INCLUDE as a statement, not a line,
the
"An INCLUDE line is not a Fortran statement."
and
"An INCLUDE line shall appear on a single source line where a statement may 
appear; it shall be
the only nonblank text on this line other than an optional trailing comment. 
Thus, a statement
label is not allowed."
bullets don't apply, but instead there is:
"The INCLUDE statement takes one of the following forms:"
"An INCLUDE statement can appear anywhere within a scoping unit. The statement
can span more than one source line, but no other statement can appear on the 
same
line. The source line cannot be labeled."

This means there can be (as can be seen in the following testcases)
continuations in both forms, and in fixed form there can be 0 in column 6.

In order not to duplicate all the handling of continuations, comment
skipping etc., the patch just adjusts the include_line routine so that it
signals if the current line is a possible start of a valid INCLUDE statement
when in -fdec-include mode, and if so, whenever it reads a further line it
retries to parse it using
gfc_next_char/gfc_next_char_literal/gfc_gobble_whitespace APIs as an INCLUDE
stmt.  If it is found not to be a valid INCLUDE statement line or set of
lines, it returns 0, if it is valid, it returns 1 together with load_file
like include_line does and clears all the lines containing the INCLUDE
statement.  If the reading stops because we don't have enough lines, -1 is
returned and the caller tries again with more lines.

Tested on x86_64-linux, ok for trunk if it passes full bootstrap/regtest?

In addition to the above mentioned question about include in columns 1-6 in
fixed form, another thing is that we support
   print *, 'abc''def'
   print *, "hij""klm"
which prints abc'def and hij"klm.  Shall we support that for INCLUDE lines
and INCLUDE statements too?

2018-11-12  Jakub Jelinek  
Mark Eggleston  

* lang.opt (fdec-include): New option.
* options.c (set_dec_flags): Set also flag_dec_include.
* scanner.c (include_line): Change return type from bool to int.
In fixed form allow spaces in between include keyword letters.
For -fdec-include, allow in fixed form 0 in column 6.  With
-fdec-include return -1 if the parsed line is not full include
statement and it could be successfully completed on continuation
lines.
(include_stmt): New function.
(load_file): Adjust include_line caller.  If it returns -1, keep
trying include_stmt until it stops returning -1 whenever adding
further line of input.

* gfortran.dg/include_10.f: New test.
* gfortran.dg/include_10.inc: New file.
* gfortran.dg/include_11.f: New test.
* gfortran.dg/include_12.f: New test.
* gfortran.dg/include_13.f90: New test.
* gfortran.dg/gomp/include_1.f: New test.
* gfortran.dg/gomp/include_1.inc: New file.
* gfortran.dg/gomp/include_2.f90: New test.

--- gcc/fortran/lang.opt.jj 2018-07-18 22:57:15.227785894 +0200
+++ gcc/fortran/lang.opt	2018-11-12 09:35:03.185259773 +0100
@@ -440,6 +440,10 @@ fdec
  Fortran Var(flag_dec)
  Enable all DEC language extensions.
  
+fdec-include
+Fortran Var(flag_dec_include)
+Enable legacy parsi

Re: [PATCH 1/3] [ARC] Update EH code.

2018-11-13 Thread Andrew Burgess
* Claudiu Zissulescu  [2018-11-12 13:25:11 +0200]:

> Our ABI says the blink is pushed first on stack followed by an unknown
> number of register saves, and finally by fp.  Hence we cannot use the
> EH_RETURN_ADDRESS macro as the stack is not finalized at that moment.
> The alternative is to use the eh_return pattern and to initialize all
> the bits after register allocation when the stack layout is finalized.
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.c (arc_eh_return_address_location): Repurpose it
>   to fit the eh_return pattern.
>   * config/arc/arc.md (eh_return): Define.
>   (VUNSPEC_ARC_EH_RETURN): Likewise.
>   * config/arc/arc-protos.h (arc_eh_return_address_location): Match
>   new implementation.
>   * config/arc/arc.h (EH_RETURN_HANDLER_RTX): Remove it.

Looks good.

Thanks,
Andrew


> 
> testsuite/
> -xx-xx  Claudiu Zissulescu  
> 
>   * gcc.target/arc/builtin_eh.c: New test.
> ---
>  gcc/config/arc/arc-protos.h   |  2 +-
>  gcc/config/arc/arc.c  | 13 -
>  gcc/config/arc/arc.h  |  2 --
>  gcc/config/arc/arc.md | 15 +++
>  gcc/testsuite/gcc.target/arc/builtin_eh.c | 22 ++
>  5 files changed, 46 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/builtin_eh.c
> 
> diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
> index 6450b6a014e..4f72a06e3dc 100644
> --- a/gcc/config/arc/arc-protos.h
> +++ b/gcc/config/arc/arc-protos.h
> @@ -110,7 +110,7 @@ extern bool arc_legitimize_reload_address (rtx *, 
> machine_mode, int, int);
>  extern void arc_secondary_reload_conv (rtx, rtx, rtx, bool);
>  extern void arc_cpu_cpp_builtins (cpp_reader *);
>  extern bool arc_store_addr_hazard_p (rtx_insn *, rtx_insn *);
> -extern rtx arc_eh_return_address_location (void);
> +extern void arc_eh_return_address_location (rtx);
>  extern bool arc_is_jli_call_p (rtx);
>  extern void arc_file_end (void);
>  extern bool arc_is_secure_call_p (rtx);
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 6802ca66554..a92456b457d 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -3858,10 +3858,13 @@ arc_check_multi (rtx op, bool push_p)
>  /* Return rtx for the location of the return address on the stack,
> suitable for use in __builtin_eh_return.  The new return address
> will be written to this location in order to redirect the return to
> -   the exception handler.  */
> +   the exception handler.  Our ABI says the blink is pushed first on
> +   stack followed by an unknown number of register saves, and finally
> +   by fp.  Hence we cannot use the EH_RETURN_ADDRESS macro as the
> +   stack is not finalized.  */
>  
> -rtx
> -arc_eh_return_address_location (void)
> +void
> +arc_eh_return_address_location (rtx source)
>  {
>rtx mem;
>int offset;
> @@ -3889,8 +3892,8 @@ arc_eh_return_address_location (void)
>   remove this store seems perfectly sensible.  Marking the memory
>   address as volatile obviously has the effect of preventing DSE
>   from removing the store.  */
> -  MEM_VOLATILE_P (mem) = 1;
> -  return mem;
> +  MEM_VOLATILE_P (mem) = true;
> +  emit_move_insn (mem, source);
>  }
>  
>  /* PIC */
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index afd6d7681cf..a0a84900917 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -1355,8 +1355,6 @@ do { \
>  
>  #define EH_RETURN_STACKADJ_RTX   gen_rtx_REG (Pmode, 2)
>  
> -#define EH_RETURN_HANDLER_RTX	arc_eh_return_address_location ()
> -
>  /* Turn off splitting of long stabs.  */
>  #define DBX_CONTIN_LENGTH 0
>  
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index a28c67ac184..a6bac0e8bee 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -163,6 +163,7 @@
>VUNSPEC_ARC_SC
>VUNSPEC_ARC_LL
>VUNSPEC_ARC_BLOCKAGE
> +  VUNSPEC_ARC_EH_RETURN
>])
>  
>  (define_constants
> @@ -6627,6 +6628,20 @@ core_3, archs4x, archs4xd, archs4xd_slow"
>[(set_attr "type" "call_no_delay_slot")
> (set_attr "length" "2")])
>  
> +;; Patterns for exception handling
> +(define_insn_and_split "eh_return"
> +  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")]
> + VUNSPEC_ARC_EH_RETURN)]
> +  ""
> +  "#"
> +  "reload_completed"
> +  [(const_int 0)]
> +  "
> +  {
> +arc_eh_return_address_location (operands[0]);
> +DONE;
> +  }"
> +)
>  ;; include the arc-FPX instructions
>  (include "fpx.md")
>  
> diff --git a/gcc/testsuite/gcc.target/arc/builtin_eh.c 
> b/gcc/testsuite/gcc.target/arc/builtin_eh.c
> new file mode 100644
> index 000..717a54bb084
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/builtin_eh.c
> @@ -0,0 +1,22 @@
> +/* Check if we have the right offset for @bar function.  */
> +/* { dg-options "-O1" } */
> +
> +void bar (void);
> +
> +void
> +foo (int x)
> +{

Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-13 Thread Andrew Stubbs

On 12/11/2018 18:54, Jeff Law wrote:

On 11/12/18 10:52 AM, Andrew Stubbs wrote:

On 12/11/2018 17:20, Segher Boessenkool wrote:

If you don't want useless USEs deleted, use UNSPEC_VOLATILE instead?
Or actually use the register, i.e. as input to an actually needed
instruction.


They're not useless. If we want to do scalar operations in vector
registers (and we often do, on this target), then we need to write a "1"
into the EXEC (vector mask) register.

Presumably you're setting up active lanes or some such.  This may
ultimately be better modeled by ignoring the problem until much later in
the pipeline.

Shortly before assembly output you run a little LCM-like pass to find
optimal points to insert the assignment to the vector register.  It's a
lot like the mode switching stuff or zeroing the upper halves of the AVX
regsiters to avoid partial register stalls.  THe local properties are
different, but these all feel like the same class of problem.


Yes, this is one of the plans. The tricky bit is getting that to work 
right with while_ult.


Of course, the real reason this is still an issue is finding time for it 
when there's lots else to be done.



Unless we want to rewrite all scalar operations in terms of vec_merge
then there's no way to "actually use the register".

I think you need to correctly model it.  If you lie to the compiler
about what's going on, you're going to run into problems.


I might investigate putting the USE inside an UNSPEC_VOLATILE. That
would have the advantage of letting combine run again. This feels like a
future project I'd rather not have block the port submission though.

The gcn_legitimate_combined_insn code isn't really acceptable though.
You need a cleaner solution here.


Now that Segher says the combine issue is a bug, I think I'll remove 
gcn_legitimate_combined_insn altogether -- it's only an optimization 
issue, after all -- and try to find a testcase I can report as a PR.


I don't suppose I can easily reproduce it on another architecture, so 
it'll have to wait until GCN is committed.


It's also possible that the issue has ceased to exist since GCC 7.


If there are two instructions that both have an UNSPEC_VOLATILE, will
combine coalesce them into one in the combined pattern?

I think you can put a different constant on each.


I was thinking of it the other way around. Two instructions put together 
would still only want to "use" EXEC once. Of course, it would only work 
when the required EXEC value is the same, or in the same pseudoreg.


Andrew


Re: [PATCH 2/3] [ARC] Do not emit ZOL in the presence of text jump tables.

2018-11-13 Thread Andrew Burgess
* Claudiu Zissulescu  [2018-11-12 13:25:12 +0200]:

> Avoid emitting an lp instruction when we find jump table data in the text
> section within its ZOL body.
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.c (hwloop_optimize): Bailout when detecting a
>   jump table data in the text section.

This looks good from a code point of view, but there's no explanation
for _why_ this change was made; that might be worth mentioning.

Thanks,
Andrew



> ---
>  gcc/config/arc/arc.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index a92456b457d..9eab4c27284 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -7791,7 +7791,17 @@ hwloop_optimize (hwloop_info loop)
>for (insn = loop->start_label;
> insn && insn != loop->loop_end;
> insn = NEXT_INSN (insn))
> -length += NONDEBUG_INSN_P (insn) ? get_attr_length (insn) : 0;
> +{
> +  length += NONDEBUG_INSN_P (insn) ? get_attr_length (insn) : 0;
> +  if (JUMP_TABLES_IN_TEXT_SECTION
> +   && JUMP_TABLE_DATA_P (insn))
> + {
> +   if (dump_file)
> + fprintf (dump_file, ";; loop %d has a jump table\n",
> +  loop->loop_no);
> +   return false;
> + }
> +}
>  
>if (!insn)
>  {
> -- 
> 2.19.1
> 


Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-13 Thread Segher Boessenkool
On Tue, Nov 13, 2018 at 10:23:12AM +, Andrew Stubbs wrote:
> Now that Segher says the combine issue is a bug,

Well, first show what really happens; if it really deletes a USE, that
is a bug yes.  rtl.def says:

/* Indicate something is used in a way that we don't want to explain.
   For example, subroutine calls will use the register
   in which the static chain is passed.

   USE can not appear as an operand of other rtx except for PARALLEL.
   USE is not deletable, as it indicates that the operand
   is used in some unknown way.  */
DEF_RTL_EXPR(USE, "use", "e", RTX_EXTRA)


> I don't suppose I can easily reproduce it on another architecture, so 
> it'll have to wait until GCN is committed.

Just show the -fdump-rtl-combine-all output?

> It's also possible that the issue has ceased to exist since GCC 7.

Yes, please mention the version in the PR, if it's not trunk :-)


Segher


Re: [PATCH] Fortran include line fixes and -fdec-include support

2018-11-13 Thread Jakub Jelinek
On Tue, Nov 13, 2018 at 10:06:39AM +, Mark Eggleston wrote:
> I've applied this patch and tried it out.
> 
> The following fail to compile:

That is IMHO correct, it fails to compile with ifort as well:
https://fortran.godbolt.org/z/Aav6dv

The problem is in mixing quotes, " vs. '
If I fix that, it works (changed the testcases to use the include file
I had in the patch):

! { dg-do compile }
! { dg-options "-fdec" }
subroutine quux
  implicit none
  include &
  &'include_10.inc'
  i = 1
end subroutine quux
subroutine quuz
  implicit none
  include &
  &"include_10.inc"
  i = 1
end subroutine quuz

c { dg-do compile }
c { dg-options "-fdec-include" }
  subroutine quuz
  implicit none
  integer include
  include
 +"include_10.inc"
  i = 1
  include
 + = 2
  write (*,*) include
  end subroutine quuz
  subroutine corge
  implicit none
  include
 +'include_10.inc'
  i = 1
  end subroutine corge

both compile.  I can include these snippets into the testcases.

> Is there any particular reason that include line continuation is isolated in
> -fdec-include? Is there a specific restriction regarding include lines in
> the Fortran standard such that continuations in free form and fixed form
> can't be used with include lines?

My reading of the Fortran standard says it is invalid, but I am not a good
Fortran language lawyer, so I'm open to be convinced otherwise.

The two bullets that I think say it are:
"An INCLUDE line is not a Fortran statement."
and
"An INCLUDE line shall appear on a single source line where a statement may 
appear; it shall be
the only nonblank text on this line other than an optional trailing comment. 
Thus, a statement
label is not allowed."

If there are continuations, it is not a single source line anymore, I don't
see anywhere a word that all the continuations together with the initial
source line are considered a single source line for the purposes of the rest
of the standard.

And in the DEC manual, it explicitly says that INCLUDE is a statement and
that it can be continued.

Jakub


Re: [PATCH 3/3] [ARC] Add support for profiling in glibc.

2018-11-13 Thread Andrew Burgess
* Claudiu Zissulescu  [2018-11-12 13:25:13 +0200]:

> Use PROFILE_HOOK to add mcount library calls in each toolchain.
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.h (FUNCTION_PROFILER): Redefine to empty.
>   * config/arc/elf.h (PROFILE_HOOK): Define.
>   * config/arc/linux.h (PROFILE_HOOK): Likewise.

Looks good.

Thanks,
Andrew

> ---
>  gcc/config/arc/arc.h   | 12 +++-
>  gcc/config/arc/elf.h   |  9 +
>  gcc/config/arc/linux.h | 10 ++
>  3 files changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index a0a84900917..f75c273691c 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -775,15 +775,9 @@ extern int arc_initial_elimination_offset(int from, int 
> to);
>  #define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET)\
>(OFFSET) = arc_initial_elimination_offset ((FROM), (TO))
>  
> -/* Output assembler code to FILE to increment profiler label # LABELNO
> -   for profiling a function entry.  */
> -#define FUNCTION_PROFILER(FILE, LABELNO) \
> -  do {   \
> -  if (flag_pic)  \
> -fprintf (FILE, "\tbl\t__mcount@plt\n");  \
> -  else   \
> -fprintf (FILE, "\tbl\t__mcount\n");  \
> -  } while (0)
> +/* All the work done in PROFILE_HOOK, but still required.  */
> +#undef FUNCTION_PROFILER
> +#define FUNCTION_PROFILER(STREAM, LABELNO) do { } while (0)
>  
>  #define NO_PROFILE_COUNTERS  1
>  
> diff --git a/gcc/config/arc/elf.h b/gcc/config/arc/elf.h
> index 3472fd2e418..3aabcf8c9e6 100644
> --- a/gcc/config/arc/elf.h
> +++ b/gcc/config/arc/elf.h
> @@ -78,3 +78,12 @@ along with GCC; see the file COPYING3.  If not see
>  #undef LINK_GCC_C_SEQUENCE_SPEC
>  #define LINK_GCC_C_SEQUENCE_SPEC \
>"--start-group %G %{!specs=*:%{!nolibc:-lc -lnosys}} --end-group"
> +
> +/* Emit rtl for profiling.  Output assembler code to FILE
> +   to call "_mcount" for profiling a function entry.  */
> +#define PROFILE_HOOK(LABEL)  \
> +  {  \
> +rtx fun; \
> +fun = gen_rtx_SYMBOL_REF (Pmode, "__mcount");\
> +emit_library_call (fun, LCT_NORMAL, VOIDmode);   \
> +  }
> diff --git a/gcc/config/arc/linux.h b/gcc/config/arc/linux.h
> index 62ebe4de0fc..993f445d2a0 100644
> --- a/gcc/config/arc/linux.h
> +++ b/gcc/config/arc/linux.h
> @@ -123,3 +123,13 @@ along with GCC; see the file COPYING3.  If not see
>   : "=r" (_beg)   \
>   : "0" (_beg), "r" (_end), "r" (_xtr), "r" (_scno)); \
>  }
> +
> +/* Emit rtl for profiling.  Output assembler code to FILE
> +   to call "_mcount" for profiling a function entry.  */
> +#define PROFILE_HOOK(LABEL)  \
> +  {  \
> +   rtx fun, rt;  \
> +   rt = get_hard_reg_initial_val (Pmode, RETURN_ADDR_REGNUM);\
> +   fun = gen_rtx_SYMBOL_REF (Pmode, "_mcount");  \
> +   emit_library_call (fun, LCT_NORMAL, VOIDmode, rt, Pmode); \
> +  }
> -- 
> 2.19.1
> 


Re: [PATCH 2/6] [RS6000] rs6000_output_indirect_call

2018-11-13 Thread Segher Boessenkool
On Tue, Nov 13, 2018 at 10:44:24AM +1030, Alan Modra wrote:
> On Mon, Nov 12, 2018 at 01:44:08PM -0600, Bill Schmidt wrote:
> > On 11/6/18 11:37 PM, Alan Modra wrote:
> > > +  fun, "l" + sibcall);
> > 
> > It's not at all clear to me what {"l" + sibcall} is doing here.
> 
> It's an ancient C programmer's trick, from the days when most
> compilers didn't optimize too well.  I think I found it first in the
> nethack sources.  :-)
> 
> > Whatever it is, it's clever enough that it warrants a comment... :-)
> > Does adding "l" to false result in the null string?  Is that
> > standard?
> 
> "l" results in a "const char*" pointing at 0x6c,0 bytes in memory
> (assuming ascii).  Adding "true" to that implicitly converts "true" to
> 1 and thus a "const char*" pointing at a NUL byte.  All completely
> standard, even in that new fangled C++ thingy.
> 
> A comment is as much needed as count++; needs /* add one to count. */.
> If it bothers people I'll write:  sibcall ? "" : "l".

That is nicer, esp. if you can propagate it a bit, simplify the separate
cases.

> Hah, even the latest gcc doesn't optimize the conditional expression
> down to "l" + sibcall.  Check out the code generated for
> 
> const char *
> f1 (bool x)
> {
>   return "a" + x;
> }
> 
> const char *
> f2 (bool x)
> {
>   return x ? "" : "b";
> }

It also doesn't do the opposite: it doesn't optimise

const char *f(_Bool x) { fputs("x" + x, stdout); }

(if x is true this does nothing).
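For readers unfamiliar with the idiom, a standalone sketch in plain C (the function name is illustrative, nothing here is GCC source): a string literal decays to a pointer to its first character, so adding a bool that promotes to 0 or 1 selects between the whole literal and its terminating NUL.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* "l" is the two-byte array {'l', '\0'}; "l" + 0 points at the 'l',
   while "l" + 1 points at the NUL, i.e. an empty string.  */
static const char *
branch_suffix (bool sibcall)
{
  return "l" + sibcall;
}
```

Either spelling is standard C; as noted above, current GCC compiles the two forms differently rather than folding one into the other.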

> Segher, would you like me to repost the series with accumulated fixes,
> I mean before you review the rest of the series?

Yes please.


Segher


Re: [PATCH 1/4] [aarch64/arm] Updating the cost table for xgene1.

2018-11-13 Thread Richard Earnshaw (lists)
On 12/11/2018 19:14, Christoph Muellner wrote:
> *** gcc/ChangeLog ***
> 
> 2018-xx-xx  Christoph Muellner  
> 
>   * config/arm/aarch-cost-tables.h (xgene1_extra_costs): Update the cost 
> table
>   for Xgene1.

OK.

R.

> ---
>  gcc/config/arm/aarch-cost-tables.h | 88 +++---
>  1 file changed, 44 insertions(+), 44 deletions(-)
> 
> diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h
> index 0bd93ba..2a28347 100644
> --- a/gcc/config/arm/aarch-cost-tables.h
> +++ b/gcc/config/arm/aarch-cost-tables.h
> @@ -440,26 +440,26 @@ const struct cpu_cost_table xgene1_extra_costs =
>{
>  0, /* arith.  */
>  0, /* logical.  */
> -0, /* shift.  */
> +COSTS_N_INSNS (1), /* shift.  */
>  COSTS_N_INSNS (1), /* shift_reg.  */
> -COSTS_N_INSNS (1), /* arith_shift.  */
> -COSTS_N_INSNS (1), /* arith_shift_reg.  */
> -COSTS_N_INSNS (1), /* log_shift.  */
> -COSTS_N_INSNS (1), /* log_shift_reg.  */
> -COSTS_N_INSNS (1), /* extend.  */
> -0, /* extend_arithm.  */
> -COSTS_N_INSNS (1), /* bfi.  */
> -COSTS_N_INSNS (1), /* bfx.  */
> +COSTS_N_INSNS (2), /* arith_shift.  */
> +COSTS_N_INSNS (2), /* arith_shift_reg.  */
> +COSTS_N_INSNS (2), /* log_shift.  */
> +COSTS_N_INSNS (2), /* log_shift_reg.  */
> +0, /* extend.  */
> +COSTS_N_INSNS (1), /* extend_arithm.  */
> +0, /* bfi.  */
> +0, /* bfx.  */
>  0, /* clz.  */
> -COSTS_N_INSNS (1), /* rev.  */
> +0, /* rev.  */
>  0, /* non_exec.  */
>  true   /* non_exec_costs_exec.  */
>},
>{
>  /* MULT SImode */
>  {
> -  COSTS_N_INSNS (4),   /* simple.  */
> -  COSTS_N_INSNS (4),   /* flag_setting.  */
> +  COSTS_N_INSNS (3),   /* simple.  */
> +  COSTS_N_INSNS (3),   /* flag_setting.  */
>COSTS_N_INSNS (4),   /* extend.  */
>COSTS_N_INSNS (4),   /* add.  */
>COSTS_N_INSNS (4),   /* extend_add.  */
> @@ -467,8 +467,8 @@ const struct cpu_cost_table xgene1_extra_costs =
>  },
>  /* MULT DImode */
>  {
> -  COSTS_N_INSNS (5),   /* simple.  */
> -  0,   /* flag_setting (N/A).  */
> +  COSTS_N_INSNS (4),   /* simple.  */
> +  COSTS_N_INSNS (4),   /* flag_setting (N/A).  */
>COSTS_N_INSNS (5),   /* extend.  */
>COSTS_N_INSNS (5),   /* add.  */
>COSTS_N_INSNS (5),   /* extend_add.  */
> @@ -477,55 +477,55 @@ const struct cpu_cost_table xgene1_extra_costs =
>},
>/* LD/ST */
>{
> -COSTS_N_INSNS (5), /* load.  */
> -COSTS_N_INSNS (6), /* load_sign_extend.  */
> -COSTS_N_INSNS (5), /* ldrd.  */
> +COSTS_N_INSNS (4), /* load.  */
> +COSTS_N_INSNS (5), /* load_sign_extend.  */
> +COSTS_N_INSNS (4), /* ldrd.  */
>  COSTS_N_INSNS (5), /* ldm_1st.  */
>  1, /* ldm_regs_per_insn_1st.  */
>  1, /* ldm_regs_per_insn_subsequent.  */
> -COSTS_N_INSNS (10),/* loadf.  */
> -COSTS_N_INSNS (10),/* loadd.  */
> -COSTS_N_INSNS (5), /* load_unaligned.  */
> +COSTS_N_INSNS (9), /* loadf.  */
> +COSTS_N_INSNS (9), /* loadd.  */
> +0, /* load_unaligned.  */
>  0, /* store.  */
>  0, /* strd.  */
>  0, /* stm_1st.  */
>  1, /* stm_regs_per_insn_1st.  */
>  1, /* stm_regs_per_insn_subsequent.  */
> -0, /* storef.  */
> -0, /* stored.  */
> +COSTS_N_INSNS (3), /* storef.  */
> +COSTS_N_INSNS (3), /* stored.  */
>  0, /* store_unaligned.  */
> -COSTS_N_INSNS (1), /* loadv.  */
> -COSTS_N_INSNS (1)  /* storev.  */
> +COSTS_N_INSNS (9), /* loadv.  */
> +COSTS_N_INSNS (3)  /* storev.  */
>},
>{
>  /* FP SFmode */
>  {
> -  COSTS_N_INSNS (23),  /* div.  */
> -  COSTS_N_INSNS (5),   /* mult.  */
> -  COSTS_N_INSNS (5),   /* mult_addsub. */
> -  COSTS_N_INSNS (5),   /* fma.  */
> -  COSTS_N_INSNS (5),   /* addsub.  */
> -  COSTS_N_INSNS (2),   /* fpconst. */
> -  COSTS_N_INSNS (3),   /* neg.  */
> -  COSTS_N_INSNS (2),   /* compare.  */
> -  COSTS_N_INSNS (6),   /* widen.  */
> -  COSTS_N_INSNS (6),   /* narrow.  */
> +  COSTS_N_INSNS (22),  /* div.  */
> +  COSTS_N_INSNS (4),   /* mult.  */
> +  COSTS_N_INSNS (4),   /* mult_addsub. */
> +  COSTS_N_

Re: [PATCH 2/4] [aarch64] Update xgene1_addrcost_table.

2018-11-13 Thread Richard Earnshaw (lists)
On 12/11/2018 19:14, Christoph Muellner wrote:
> *** gcc/ChangeLog ***
> 
> 2018-xx-xx  Christoph Muellner  
> 
>   * config/aarch64/aarch64.c (xgene1_addrcost_table): Correct the
>   post modify costs.

OK.

R.

> ---
>  gcc/config/aarch64/aarch64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 815f824..a6bc1fb 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -254,7 +254,7 @@ static const struct cpu_addrcost_table xgene1_addrcost_table =
>1, /* ti  */
>  },
>1, /* pre_modify  */
> -  0, /* post_modify  */
> +  1, /* post_modify  */
>0, /* register_offset  */
>1, /* register_sextend  */
>1, /* register_zextend  */
> 



Re: [PATCH 3/4] [aarch64] Add xgene1 prefetch tunings.

2018-11-13 Thread Richard Earnshaw (lists)
On 12/11/2018 19:14, Christoph Muellner wrote:
> *** gcc/ChangeLog ***
> 
> 2018-xx-xx  Christoph Muellner  
> 
>   * config/aarch64/aarch64.c (xgene1_tunings): Add Xgene1 specific
>   prefetch tunings.

OK.

R.

> ---
>  gcc/config/aarch64/aarch64.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a6bc1fb..903f4e2 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -662,6 +662,17 @@ static const cpu_prefetch_tune tsv110_prefetch_tune =
>-1/* default_opt_level  */
>  };
>  
> +static const cpu_prefetch_tune xgene1_prefetch_tune =
> +{
> +  8, /* num_slots  */
> +  32,/* l1_cache_size  */
> +  64,/* l1_cache_line_size  */
> +  256,   /* l2_cache_size  */
> +  true, /* prefetch_dynamic_strides */
> +  -1,   /* minimum_stride */
> +  -1 /* default_opt_level  */
> +};
> +
>  static const struct tune_params generic_tunings =
>  {
>&cortexa57_extra_costs,
> @@ -943,7 +954,7 @@ static const struct tune_params xgene1_tunings =
>0, /* max_case_values.  */
>tune_params::AUTOPREFETCHER_OFF,   /* autoprefetcher_model.  */
>(AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags.  */
> -  &generic_prefetch_tune
> +  &xgene1_prefetch_tune
>  };
>  
>  static const struct tune_params qdf24xx_tunings =
> 



Re: [PATCH 4/4] [aarch64] Update xgene1 tuning struct.

2018-11-13 Thread Richard Earnshaw (lists)
On 12/11/2018 19:14, Christoph Muellner wrote:
> *** gcc/ChangeLog ***
> 
> 2018-xx-xx  Christoph Muellner  
> 
>   * config/aarch64/aarch64.c (xgene1_tunings): Optimize Xgene1 tunings for
>   GCC 9.

OK.

R.

> ---
>  gcc/config/aarch64/aarch64.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 903f4e2..f7f88a9 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -944,14 +944,14 @@ static const struct tune_params xgene1_tunings =
>4, /* issue_rate  */
>AARCH64_FUSE_NOTHING, /* fusible_ops  */
>"16",  /* function_align.  */
> -  "8",   /* jump_align.  */
> +  "16",  /* jump_align.  */
>"16",  /* loop_align.  */
>2, /* int_reassoc_width.  */
>4, /* fp_reassoc_width.  */
>1, /* vec_reassoc_width.  */
>2, /* min_div_recip_mul_sf.  */
>2, /* min_div_recip_mul_df.  */
> -  0, /* max_case_values.  */
> +  17,/* max_case_values.  */
>tune_params::AUTOPREFETCHER_OFF,   /* autoprefetcher_model.  */
>(AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags.  */
>&xgene1_prefetch_tune
> 



Re: [RS6000] Ignore "c", "l" and "h" for register preference

2018-11-13 Thread Segher Boessenkool
On Tue, Nov 13, 2018 at 01:35:42PM +1030, Alan Modra wrote:
> This catches a few places where move insn patterns don't slightly
> disparage CTR, LR and VRSAVE regs.  Also fixes the doc for the rs6000
> h constraint, and removes an r->cl alternative covered by r->h.
> 
> Segher okayed a patch adding "*" like this patch a long time ago.
> Somehow I never committed it.  This one does a few more things as
> well, but I think it's sufficiently obvious to commit as such.
> Bootstrapped etc. powerpc64le-linux and committed rev 266044.

Thanks.

I'd like to move to "cl" instead of "h" though :-)

vrsave is not an allocatable register (it's a fixed register).  We do
not have to handle it in the general mov patterns at all; it just
needs a separate pattern of its own.

>  (define_insn "*movcc_internal1"
>[(set (match_operand:CC 0 "nonimmediate_operand"
> - "=y,x,?y,y,r,r,r,r,r,*c*l,r,m")
> + "=y,x,?y,y,r,r,r,r, r,*c*l,r,m")
>   (match_operand:CC 1 "general_operand"
> - " y,r, r,O,x,y,r,I,h,   r,m,r"))]
> + " y,r, r,O,x,y,r,I,*h,   r,m,r"))]

Here you see that vrsave as dest already is not supported (and doesn't
have to be, it is never generated).

Ideally "h" does not exist anymore, and "c" and "l" are mentioned nowhere
in the normal mov patterns.


Segher


Re: [RS6000] Don't pass -many to the assembler

2018-11-13 Thread Segher Boessenkool
On Tue, Nov 13, 2018 at 12:02:55PM +1030, Alan Modra wrote:
> On Mon, Nov 12, 2018 at 04:34:34PM -0800, Mike Stump wrote:
> > On Nov 12, 2018, at 3:13 PM, Alan Modra  wrote:
> > > 
> > > For people developing new code, it's the right way to go, and
> > > especially so for people working on gcc itself.  For people just
> > > wanting stuff to compile, not so much.  I fully expect a chorus of
> > > *MORON* or worse to come from the likes of the linux kernel rabble.
> > 
> > So, if you just want to hear people whine...
> 
> I'm happy to hear other points of view.  Ignore my hyperbole.
> 
> > On darwin, we (darwin, as a platform decision) like all instructions 
> > available from the assembler.
> 
> OK, fair enough.  Another option is to just disable -many when gcc is
> in development, like we enable checking.

That is a good plan for GCC 9 at least.


Segher


Re: [RS6000] Don't put large integer constants in TOC for -mcmodel=medium

2018-11-13 Thread Segher Boessenkool
On Tue, Nov 13, 2018 at 01:53:20PM +1030, Alan Modra wrote:
> For -mcmodel=medium we can use toc-relative addressing to access
> constants placed in read-only data, which is better since they can be
> merged when in .rodata.cst8.
> 
> Bootstrapped etc. powerpc64le-linux.  OK?

Okay, thanks!


Segher


>   * config/rs6000/linux64.h (ASM_OUTPUT_SPECIAL_POOL_ENTRY_P): Exclude
>   integer constants when -mcmodel=medium.


Re: [RS6000] Remove unnecessary rtx_equal_p

2018-11-13 Thread Segher Boessenkool
On Tue, Nov 13, 2018 at 02:16:09PM +1030, Alan Modra wrote:
> REGs are unique.  This patch recognizes that fact, speeding up rs6000
> gcc infinitesimally.  Bootstrapped etc. powerpc64le-linux.  OK?

Of course, fine for trunk.  Thanks!


Segher


>   * gcc/config/rs6000/rs6000.c (rs6000_legitimate_address_p): Replace
>   rtx_equal_p call for known REGs with pointer comparison.
>   (rs6000_secondary_reload_memory): Likewise.
>   (rs6000_secondary_reload_inner): Likewise.


[PATCH 1/6] [RS6000] rs6000_call_template for external call insn assembly output

2018-11-13 Thread Alan Modra
Version 2.

This is a first step in tidying rs6000 call patterns, in preparation
to support inline plt calls.

* config/rs6000/rs6000-protos.h (rs6000_call_template): Declare.
(rs6000_sibcall_template): Declare.
(macho_call_template): Rename from output_call.
* config/rs6000/rs6000.c (rs6000_call_template_1): New function.
(rs6000_call_template, rs6000_sibcall_template): Likewise.
(macho_call_template): Rename from output_call.
* config/rs6000/rs6000.md (tls_gd_aix, tls_gd_sysv),
(tls_gd_call_aix, tls_gd_call_sysv, tls_ld_aix, tls_ld_sysv),
(tls_ld_call_aix, tls_ld_call_sysv, call_nonlocal_sysv),
(call_nonlocal_sysv_secure, call_value_nonlocal_sysv),
(call_value_nonlocal_sysv_secure, call_nonlocal_aix),
(call_value_nonlocal_aix): Use rs6000_call_template and update
occurrences of output_call to macho_call_template.
(sibcall_nonlocal_sysv, sibcall_value_nonlocal_sysv, sibcall_aix),
(sibcall_value_aix): Use rs6000_sibcall_template.

diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index fb69019c47c..303ba7b91c3 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -111,6 +111,8 @@ extern int ccr_bit (rtx, int);
 extern void rs6000_output_function_entry (FILE *, const char *);
 extern void print_operand (FILE *, rtx, int);
 extern void print_operand_address (FILE *, rtx);
+extern const char *rs6000_call_template (rtx *, unsigned int, const char *);
+extern const char *rs6000_sibcall_template (rtx *, unsigned int, const char *);
 extern enum rtx_code rs6000_reverse_condition (machine_mode,
   enum rtx_code);
 extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx);
@@ -228,7 +230,7 @@ extern void (*rs6000_target_modify_macros_ptr) (bool, HOST_WIDE_INT,
 extern void rs6000_d_target_versions (void);
 
 #if TARGET_MACHO
-char *output_call (rtx_insn *, rtx *, int, int);
+char *macho_call_template (rtx_insn *, rtx *, int, int);
 #endif
 
 #ifdef NO_DOLLAR_IN_LABEL
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 516e69724cc..6e84f5053c2 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21372,6 +21372,50 @@ rs6000_assemble_integer (rtx x, unsigned int size, int aligned_p)
   return default_assemble_integer (x, size, aligned_p);
 }
 
+/* Return a template string for assembly to emit when making an
+   external call.  FUNOP is the call mem argument operand number,
+   ARG is either NULL or a @TLSGD or @TLSLD __tls_get_addr argument
+   specifier.  */
+
+static const char *
+rs6000_call_template_1 (rtx *operands ATTRIBUTE_UNUSED, unsigned int funop,
+   bool sibcall, const char *arg)
+{
+  /* -Wformat-overflow workaround, without which gcc thinks that %u
+  might produce 10 digits.  */
+  gcc_assert (funop <= MAX_RECOG_OPERANDS);
+
+  /* The magic 32768 offset here corresponds to the offset of
+ r30 in .got2, as given by LCTOC1.  See sysv4.h:toc_section.  */
+  char z[11];
+  sprintf (z, "%%z%u%s", funop,
+  (DEFAULT_ABI == ABI_V4 && TARGET_SECURE_PLT && flag_pic == 2
+   ? "+32768" : ""));
+
+  static char str[32];  /* 4 spare */
+  if (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)
+sprintf (str, "b%s %s%s%s", sibcall ? "" : "l", z, arg,
+sibcall ? "" : "\n\tnop");
+  else if (DEFAULT_ABI == ABI_V4)
+sprintf (str, "b%s %s%s%s", sibcall ? "" : "l", z, arg,
+flag_pic ? "@plt" : "");
+  else
+gcc_unreachable ();
+  return str;
+}
+
+const char *
+rs6000_call_template (rtx *operands, unsigned int funop, const char *arg)
+{
+  return rs6000_call_template_1 (operands, funop, false, arg);
+}
+
+const char *
+rs6000_sibcall_template (rtx *operands, unsigned int funop, const char *arg)
+{
+  return rs6000_call_template_1 (operands, funop, true, arg);
+}
+
 #if defined (HAVE_GAS_HIDDEN) && !TARGET_MACHO
 /* Emit an assembler directive to set symbol visibility for DECL to
VISIBILITY_TYPE.  */
@@ -32810,8 +32854,8 @@ get_prev_label (tree function_name)
CALL_DEST is the routine we are calling.  */
 
 char *
-output_call (rtx_insn *insn, rtx *operands, int dest_operand_number,
-int cookie_operand_number)
+macho_call_template (rtx_insn *insn, rtx *operands, int dest_operand_number,
+int cookie_operand_number)
 {
   static char buf[256];
   if (darwin_emit_branch_islands
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 65f5fa6e66b..db9cfe92c72 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9453,10 +9453,11 @@ (define_insn_and_split "tls_gd_aix"
   "HAVE_AS_TLS && (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)"
 {
   if (TARGET_CMODEL != CMODEL_SMALL)
-return "addis %0,%1,%2@got@tlsgd@ha\;addi %0,%0,%2@got@tlsgd@l\;"
-  "bl %z3\;nop";
+output_

[PATCH 2/6] [RS6000] rs6000_indirect_call_template

2018-11-13 Thread Alan Modra
Version 2.

Like the last patch for external calls, now handle most assembly code
for indirect calls in one place.  The patch also merges some insns,
correcting some !rs6000_speculate_indirect_jumps cases branching to
LR, which don't require a speculation barrier.

* config/rs6000/rs6000-protos.h (rs6000_indirect_call_template),
(rs6000_indirect_sibcall_template): Declare.
* config/rs6000/rs6000.c (rs6000_indirect_call_template_1),
(rs6000_indirect_call_template, rs6000_indirect_sibcall_template):
New functions.
* config/rs6000/rs6000.md (call_indirect_nonlocal_sysv),
(call_value_indirect_nonlocal_sysv, sibcall_nonlocal_sysv),
(call_indirect_aix, call_value_indirect_aix): Use
rs6000_indirect_call_template and rs6000_indirect_sibcall_template.
call_indirect_elfv2, call_value_indirect_elfv2): Likewise, and
handle both speculation and non-speculation cases.
(call_indirect_aix_nospec, call_value_indirect_aix_nospec): Delete.
(call_indirect_elfv2_nospec, call_value_indirect_elfv2_nospec): Delete.

diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 303ba7b91c3..967f65e2d94 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -113,6 +113,8 @@ extern void print_operand (FILE *, rtx, int);
 extern void print_operand_address (FILE *, rtx);
 extern const char *rs6000_call_template (rtx *, unsigned int, const char *);
 extern const char *rs6000_sibcall_template (rtx *, unsigned int, const char *);
+extern const char *rs6000_indirect_call_template (rtx *, unsigned int);
+extern const char *rs6000_indirect_sibcall_template (rtx *, unsigned int);
 extern enum rtx_code rs6000_reverse_condition (machine_mode,
   enum rtx_code);
 extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6e84f5053c2..cd1ab95166e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21416,6 +21416,83 @@ rs6000_sibcall_template (rtx *operands, unsigned int funop, const char *arg)
   return rs6000_call_template_1 (operands, funop, true, arg);
 }
 
+/* As above, for indirect calls.  */
+
+static const char *
+rs6000_indirect_call_template_1 (rtx *operands, unsigned int funop,
+bool sibcall)
+{
+  /* -Wformat-overflow workaround, without which gcc thinks that %u
+  might produce 10 digits.  */
+  gcc_assert (funop <= MAX_RECOG_OPERANDS);
+
+  static char str[144];
+  const char *ptrload = TARGET_64BIT ? "d" : "wz";
+
+  /* We don't need the extra code to stop indirect call speculation if
+ calling via LR.  */
+  bool speculate = (TARGET_MACHO
+   || rs6000_speculate_indirect_jumps
+   || (REG_P (operands[funop])
+   && REGNO (operands[funop]) == LR_REGNO));
+
+  if (DEFAULT_ABI == ABI_AIX)
+{
+  if (speculate)
+   sprintf (str,
+"l%s 2,%%%u\n\t"
+"b%%T%ul\n\t"
+"l%s 2,%%%u(1)",
+ptrload, funop + 2, funop, ptrload, funop + 3);
+  else
+   sprintf (str,
+"crset 2\n\t"
+"l%s 2,%%%u\n\t"
+"beq%%T%ul-\n\t"
+"l%s 2,%%%u(1)",
+ptrload, funop + 2, funop, ptrload, funop + 3);
+}
+  else if (DEFAULT_ABI == ABI_ELFv2)
+{
+  if (speculate)
+   sprintf (str,
+"b%%T%ul\n\t"
+"l%s 2,%%%u(1)",
+funop, ptrload, funop + 2);
+  else
+   sprintf (str,
+"crset 2\n\t"
+"beq%%T%ul-\n\t"
+"l%s 2,%%%u(1)",
+funop, ptrload, funop + 2);
+}
+  else
+{
+  if (speculate)
+   sprintf (str,
+"b%%T%u%s",
+funop, sibcall ? "" : "l");
+  else
+   sprintf (str,
+"crset 2\n\t"
+"beq%%T%u%s-%s",
+funop, sibcall ? "" : "l", sibcall ? "\n\tb $" : "");
+}
+  return str;
+}
+
+const char *
+rs6000_indirect_call_template (rtx *operands, unsigned int funop)
+{
+  return rs6000_indirect_call_template_1 (operands, funop, false);
+}
+
+const char *
+rs6000_indirect_sibcall_template (rtx *operands, unsigned int funop)
+{
+  return rs6000_indirect_call_template_1 (operands, funop, true);
+}
+
 #if defined (HAVE_GAS_HIDDEN) && !TARGET_MACHO
 /* Emit an assembler directive to set symbol visibility for DECL to
VISIBILITY_TYPE.  */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index db9cfe92c72..fe904b1966b 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -10539,11 +10539,7 @@ (define_insn "*call_indirect_nonlocal_sysv"
   else if (INTVAL (operands[2]) & CALL_V4_CLEAR_FP_ARGS)
 output_asm_insn ("creqv 6,6,6", operands);
 
-  if (

[PATCH 3/6] [RS6000] Replace TLSmode with P, and correct tls call mems

2018-11-13 Thread Alan Modra
Version 2.

There is really no need to define a TLSmode mode iterator that is
identical (since !TARGET_64BIT == TARGET_32BIT) to the much used P
mode iterator.  It's nonsense to think we might ever want to support
32-bit TLS on 64-bit or vice versa!  The patch also fixes a minor
error in the call mems.  All other direct calls use (call (mem:SI ..)).

* config/rs6000/rs6000.md (TLSmode): Delete mode iterator.  Replace
with P throughout except for call mems which should use SI.
(tls_abi_suffix, tls_sysv_suffix, tls_insn_suffix): Delete mode
iterators.  Replace with bits, mode and ptrload respectively.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index fe904b1966b..793a0a9d840 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9436,19 +9436,13 @@ (define_peephole2
 
 ;; TLS support.
 
-;; Mode attributes for different ABIs.
-(define_mode_iterator TLSmode [(SI "! TARGET_64BIT") (DI "TARGET_64BIT")])
-(define_mode_attr tls_abi_suffix [(SI "32") (DI "64")])
-(define_mode_attr tls_sysv_suffix [(SI "si") (DI "di")])
-(define_mode_attr tls_insn_suffix [(SI "wz") (DI "d")])
-
-(define_insn_and_split "tls_gd_aix"
-  [(set (match_operand:TLSmode 0 "gpc_reg_operand" "=b")
-(call (mem:TLSmode (match_operand:TLSmode 3 "symbol_ref_operand" "s"))
+(define_insn_and_split "tls_gd_aix"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=b")
+(call (mem:SI (match_operand:P 3 "symbol_ref_operand" "s"))
  (match_operand 4 "" "g")))
-   (unspec:TLSmode [(match_operand:TLSmode 1 "gpc_reg_operand" "b")
-   (match_operand:TLSmode 2 "rs6000_tls_symbol_ref" "")]
-  UNSPEC_TLSGD)
+   (unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
+ (match_operand:P 2 "rs6000_tls_symbol_ref" "")]
+UNSPEC_TLSGD)
(clobber (reg:SI LR_REGNO))]
   "HAVE_AS_TLS && (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)"
 {
@@ -9461,28 +9455,28 @@ (define_insn_and_split "tls_gd_aix"
 }
   "&& TARGET_TLS_MARKERS"
   [(set (match_dup 0)
-   (unspec:TLSmode [(match_dup 1)
-(match_dup 2)]
-   UNSPEC_TLSGD))
+   (unspec:P [(match_dup 1)
+  (match_dup 2)]
+ UNSPEC_TLSGD))
(parallel [(set (match_dup 0)
-  (call (mem:TLSmode (match_dup 3))
-(match_dup 4)))
- (unspec:TLSmode [(match_dup 2)] UNSPEC_TLSGD)
+  (call (mem:SI (match_dup 3))
+(match_dup 4)))
+ (unspec:P [(match_dup 2)] UNSPEC_TLSGD)
  (clobber (reg:SI LR_REGNO))])]
   ""
   [(set_attr "type" "two")
(set (attr "length")
  (if_then_else (ne (symbol_ref "TARGET_CMODEL") (symbol_ref "CMODEL_SMALL"))
-  (const_int 16)
-  (const_int 12)))])
+  (const_int 16)
+  (const_int 12)))])
 
-(define_insn_and_split "tls_gd_sysv"
-  [(set (match_operand:TLSmode 0 "gpc_reg_operand" "=b")
-(call (mem:TLSmode (match_operand:TLSmode 3 "symbol_ref_operand" "s"))
+(define_insn_and_split "tls_gd_sysv"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=b")
+(call (mem:SI (match_operand:P 3 "symbol_ref_operand" "s"))
  (match_operand 4 "" "g")))
-   (unspec:TLSmode [(match_operand:TLSmode 1 "gpc_reg_operand" "b")
-   (match_operand:TLSmode 2 "rs6000_tls_symbol_ref" "")]
-  UNSPEC_TLSGD)
+   (unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
+ (match_operand:P 2 "rs6000_tls_symbol_ref" "")]
+UNSPEC_TLSGD)
(clobber (reg:SI LR_REGNO))]
   "HAVE_AS_TLS && DEFAULT_ABI == ABI_V4"
 {
@@ -9491,64 +9485,64 @@ (define_insn_and_split "tls_gd_sysv"
 }
   "&& TARGET_TLS_MARKERS"
   [(set (match_dup 0)
-   (unspec:TLSmode [(match_dup 1)
-(match_dup 2)]
-   UNSPEC_TLSGD))
+   (unspec:P [(match_dup 1)
+  (match_dup 2)]
+ UNSPEC_TLSGD))
(parallel [(set (match_dup 0)
-  (call (mem:TLSmode (match_dup 3))
-(match_dup 4)))
- (unspec:TLSmode [(match_dup 2)] UNSPEC_TLSGD)
+  (call (mem:SI (match_dup 3))
+(match_dup 4)))
+ (unspec:P [(match_dup 2)] UNSPEC_TLSGD)
  (clobber (reg:SI LR_REGNO))])]
   ""
   [(set_attr "type" "two")
(set_attr "length" "8")])
 
-(define_insn_and_split "*tls_gd"
-  [(set (match_operand:TLSmode 0 "gpc_reg_operand" "=b")
-   (unspec:TLSmode [(match_operand:TLSmode 1 "gpc_reg_operand" "b")
-(match_operand:TLSmode 2 "rs6000_tls_symbol_ref" "")]
-   UNSPEC_TLSGD))]
+(define_insn_and_split "*tls_gd"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=b")
+   (unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
+  (match_operand:P 2 "rs6000

[PATCH 4/6] [RS6000] Remove constraints on call rounded_stack_size_rtx arg

2018-11-13 Thread Alan Modra
Version 2.  (Same as before, here for completeness.)

This call arg is unused on rs6000.

* config/rs6000/darwin.md (call_indirect_nonlocal_darwin64),
(call_nonlocal_darwin64, call_value_indirect_nonlocal_darwin64),
(call_value_nonlocal_darwin64): Remove constraints from second call
arg, the rounded_stack_size_rtx arg.
* config/rs6000/rs6000.md (tls_gd_aix, tls_gd_sysv, tls_gd_call_aix),
(tls_gd_call_sysv, tls_ld_aix, tls_ld_sysv, tls_ld_call_aix),
(tls_ld_call_sysv, call_local32, call_local64, call_value_local32),
(call_value_local64, call_indirect_nonlocal_sysv),
(call_nonlocal_sysv, call_nonlocal_sysv_secure),
(call_value_indirect_nonlocal_sysv, call_value_nonlocal_sysv),
(call_value_nonlocal_sysv_secure, call_local_aix),
(call_value_local_aix, call_nonlocal_aix, call_value_nonlocal_aix),
(call_indirect_aix, call_value_indirect_aix, call_indirect_elfv2),
(call_value_indirect_elfv2, sibcall_local32, sibcall_local64),
(sibcall_value_local32, sibcall_value_local64, sibcall_aix),
(sibcall_value_aix): Likewise.

diff --git a/gcc/config/rs6000/darwin.md b/gcc/config/rs6000/darwin.md
index 2d6d1ca57dd..a1c07702d6f 100644
--- a/gcc/config/rs6000/darwin.md
+++ b/gcc/config/rs6000/darwin.md
@@ -302,7 +302,7 @@ (define_insn "macho_correct_pic_di"
 
 (define_insn "*call_indirect_nonlocal_darwin64"
   [(call (mem:SI (match_operand:DI 0 "register_operand" "c,*l,c,*l"))
-(match_operand 1 "" "g,g,g,g"))
+(match_operand 1))
(use (match_operand:SI 2 "immediate_operand" "O,O,n,n"))
(clobber (reg:SI LR_REGNO))]
   "DEFAULT_ABI == ABI_DARWIN && TARGET_64BIT"
@@ -314,7 +314,7 @@ (define_insn "*call_indirect_nonlocal_darwin64"
 
 (define_insn "*call_nonlocal_darwin64"
   [(call (mem:SI (match_operand:DI 0 "symbol_ref_operand" "s,s"))
-(match_operand 1 "" "g,g"))
+(match_operand 1))
(use (match_operand:SI 2 "immediate_operand" "O,n"))
(clobber (reg:SI LR_REGNO))]
   "(DEFAULT_ABI == ABI_DARWIN)
@@ -332,7 +332,7 @@ (define_insn "*call_nonlocal_darwin64"
 (define_insn "*call_value_indirect_nonlocal_darwin64"
   [(set (match_operand 0 "" "")
(call (mem:SI (match_operand:DI 1 "register_operand" "c,*l,c,*l"))
- (match_operand 2 "" "g,g,g,g")))
+ (match_operand 2)))
(use (match_operand:SI 3 "immediate_operand" "O,O,n,n"))
(clobber (reg:SI LR_REGNO))]
   "DEFAULT_ABI == ABI_DARWIN"
@@ -345,7 +345,7 @@ (define_insn "*call_value_indirect_nonlocal_darwin64"
 (define_insn "*call_value_nonlocal_darwin64"
   [(set (match_operand 0 "" "")
(call (mem:SI (match_operand:DI 1 "symbol_ref_operand" "s,s"))
- (match_operand 2 "" "g,g")))
+ (match_operand 2)))
(use (match_operand:SI 3 "immediate_operand" "O,n"))
(clobber (reg:SI LR_REGNO))]
   "(DEFAULT_ABI == ABI_DARWIN)
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 793a0a9d840..c261c8bb9c1 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9439,7 +9439,7 @@ (define_peephole2
 (define_insn_and_split "tls_gd_aix"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 3 "symbol_ref_operand" "s"))
- (match_operand 4 "" "g")))
+ (match_operand 4)))
(unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
  (match_operand:P 2 "rs6000_tls_symbol_ref" "")]
 UNSPEC_TLSGD)
@@ -9473,7 +9473,7 @@ (define_insn_and_split "tls_gd_aix"
 (define_insn_and_split "tls_gd_sysv"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 3 "symbol_ref_operand" "s"))
- (match_operand 4 "" "g")))
+ (match_operand 4)))
(unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
  (match_operand:P 2 "rs6000_tls_symbol_ref" "")]
 UNSPEC_TLSGD)
@@ -9540,7 +9540,7 @@ (define_insn "*tls_gd_low"
 (define_insn "*tls_gd_call_aix"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 1 "symbol_ref_operand" "s"))
- (match_operand 2 "" "g")))
+ (match_operand 2)))
(unspec:P [(match_operand:P 3 "rs6000_tls_symbol_ref" "")]
 UNSPEC_TLSGD)
(clobber (reg:SI LR_REGNO))]
@@ -9555,7 +9555,7 @@ (define_insn "*tls_gd_call_aix"
 (define_insn "*tls_gd_call_sysv"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 1 "symbol_ref_operand" "s"))
- (match_operand 2 "" "g")))
+ (match_operand 2)))
(unspec:P [(match_operand:P 3 "rs6000_tls_symbol_ref" "")]
 UNSPEC_TLSGD)
(clobber (reg:SI LR_REGNO))]
@@ -9568,7 +9568,7 @@ (define_insn "*tls_gd_call_sysv"
 (define_insn_and_split "tls_ld_aix"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 2 "symbol_ref_operand" "s"))
- 

[PATCH 5/6] [RS6000] Use standard call patterns for __tls_get_addr calls

2018-11-13 Thread Alan Modra
Version 2.

The current code handling __tls_get_addr calls for powerpc*-linux
generates a call then overwrites the call insn with a special
tls_{gd,ld}_{aix,sysv} pattern.  It's done that way to support
!TARGET_TLS_MARKERS, where the arg setup insns need to be emitted
immediately before the branch and link.  When TARGET_TLS_MARKERS, the
arg setup insns are split from the actual call, but we then have a
non-standard call pattern that needs to be carried through to output.

This patch changes that scheme, to instead use the standard call
patterns for __tls_get_addr calls, except for the now rare
!TARGET_TLS_MARKERS case.  Doing it this way should be better for
maintenance as the !TARGET_TLS_MARKERS code can eventually disappear.
It also makes it possible to support longcalls (and in following
patches, inline plt calls) for __tls_get_addr without introducing yet
more special call patterns.

__tls_get_addr calls do however need to be different to standard
calls, because when TARGET_TLS_MARKERS the calls are decorated with an
argument specifier, eg. "bl __tls_get_addr(thread_var@tlsgd)" that
causes a reloc to be emitted by the assembler tying the call to its
arg setup insns.  I chose to smuggle the arg in the currently unused
stack size rtl.

I've also introduced rs6000_call_sysv to generate rtl for sysv calls,
as rs6000_call_aix does for aix and elfv2 calls.  This allows
rs6000_longcall_ref to be local to rs6000.c since the calls in the
expanders never did anything for darwin.

* config/rs6000/predicates.md (unspec_tls): New.
* config/rs6000/rs6000-protos.h (rs6000_call_template),
(rs6000_sibcall_template): Update prototype.
(rs6000_longcall_ref): Delete.
(rs6000_call_sysv): Declare.
* config/rs6000/rs6000.c (edit_tls_call_insn): New function.
(global_tlsarg): New variable.
(rs6000_legitimize_tls_address): Rewrite __tls_get_addr call
handling.
(print_operand): Extract UNSPEC_TLSGD address operand.
(rs6000_call_template, rs6000_sibcall_template): Remove arg
parameter, extract from second call operand instead.
(rs6000_longcall_ref): Make static, localize vars.
(rs6000_call_aix): Rename parameter to reflect new usage.  Take
tlsarg from global_tlsarg.  Don't create unused rtl or nop insns.
(rs6000_sibcall_aix): Rename parameter to reflect new usage.  Take
tlsarg from global_tlsarg.
(rs6000_call_sysv): New function.
* config/rs6000/rs6000.md: Adjust rs6000_call_template and
rs6000_sibcall_template throughout.
(tls_gd_aix, tls_gd_sysv, tls_gd_call_aix, tls_gd_call_sysv): Delete.
(tls_ld_aix, tls_ld_sysv, tls_ld_call_aix, tls_ld_call_sysv): Delete.
(tls_gdld_aix, tls_gdld_sysv): New insns, replacing above.
(tls_gd): Swap operand order.  Simplify mode selection.
(tls_gd_high, tls_gd_low): Swap operand order.
(tls_ld): Remove const_int 0 vector element from UNSPEC_TLSLD.
Simplify mode selection.
(tls_ld_high, tls_ld_low): Similarly adjust UNSPEC_TLSLD.
(call, call_value): Don't assert for second call operand.
Use rs6000_call_sysv.

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index b80c278d742..7e45d2f0371 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1039,6 +1039,13 @@ (define_predicate "rs6000_tls_symbol_ref"
   (and (match_code "symbol_ref")
(match_test "RS6000_SYMBOL_REF_TLS_P (op)")))
 
+;; Return 1 for the UNSPEC used in TLS call operands
+(define_predicate "unspec_tls"
+  (match_code "unspec")
+{
+  return XINT (op, 1) == UNSPEC_TLSGD || XINT (op, 1) == UNSPEC_TLSLD;
+})
+
 ;; Return 1 if the operand, used inside a MEM, is a valid first argument
 ;; to CALL.  This is a SYMBOL_REF, a pseudo-register, LR or CTR.
 (define_predicate "call_operand"
diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 967f65e2d94..3fd89dc20db 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -111,8 +111,8 @@ extern int ccr_bit (rtx, int);
 extern void rs6000_output_function_entry (FILE *, const char *);
 extern void print_operand (FILE *, rtx, int);
 extern void print_operand_address (FILE *, rtx);
-extern const char *rs6000_call_template (rtx *, unsigned int, const char *);
-extern const char *rs6000_sibcall_template (rtx *, unsigned int, const char *);
+extern const char *rs6000_call_template (rtx *, unsigned int);
+extern const char *rs6000_sibcall_template (rtx *, unsigned int);
 extern const char *rs6000_indirect_call_template (rtx *, unsigned int);
 extern const char *rs6000_indirect_sibcall_template (rtx *, unsigned int);
 extern enum rtx_code rs6000_reverse_condition (machine_mode,
@@ -136,7 +136,6 @@ extern void rs6000_expand_atomic_op (enum rtx_code, rtx, 
rtx, rtx, rtx, rtx);
 extern void rs6000_emit_swdiv (rtx, rtx, rtx, bool);
 e

[PATCH 6/6] [RS6000] inline plt call sequences

2018-11-13 Thread Alan Modra
Version 2.

Finally, the point of the previous patches in this series, support for
inline PLT calls, keyed off -fno-plt.  This emits code using new
relocations that tie all insns in the sequence together, so that the
linker can edit the sequence back to a direct call should the call
target turn out to be local.  An example of ELFv2 code to call puts is
as follows:

 .reloc .,R_PPC64_PLTSEQ,puts
std 2,24(1)
 .reloc .,R_PPC64_PLT16_HA,puts
addis 12,2,0
 .reloc .,R_PPC64_PLT16_LO_DS,puts
ld 12,0(12)
 .reloc .,R_PPC64_PLTSEQ,puts
mtctr 12
 .reloc .,R_PPC64_PLTCALL,puts
bctrl
ld 2,24(1)

"addis 12,2,puts@plt@ha" and "ld 12,puts@plt@l(12)" are also supported
by the assembler.  gcc instead uses the explicit R_PPC64_PLT16_HA and
R_PPC64_PLT16_LO_DS relocs because when the call is to __tls_get_addr
an extra reloc is emitted at every place where one is shown above, to
specify the __tls_get_addr arg.  The linker expects the extra reloc to
come first.  .reloc enforces that ordering.

The patch also changes code emitted for longcalls if the assembler
supports the new marker relocs, so that these too can be edited.  One
side effect of longcalls using PLT16 relocs is that they can now be
resolved lazily by ld.so.

I don't support lazy inline PLT calls for ELFv1, because ELFv1 would
need barriers to reliably load both the function address and toc
pointer from the PLT.  ELFv1 -fno-plt uses the longcall sequence
instead, which isn't edited by GNU ld.

* config.in (HAVE_AS_PLTSEQ): Add.
* config/rs6000/predicates.md (indirect_call_operand): New.
* config/rs6000/rs6000-protos.h (rs6000_pltseq_template),
(rs6000_sibcall_sysv): Declare.
* config/rs6000/rs6000.c (init_cumulative_args): Set cookie
CALL_LONG for -fno-plt.
(print_operand ): Handle UNSPEC_PLTSEQ.
(rs6000_indirect_call_template_1): Emit .reloc directives for
UNSPEC_PLTSEQ calls.
(rs6000_pltseq_template): New function.
(rs6000_longcall_ref): Add arg parameter.  Use PLT16 insns if
relocs supported by assembler.  Move SYMBOL_REF test to callers.
(rs6000_call_aix): Adjust rs6000_longcall_ref call.  Package
insns in UNSPEC_PLTSEQ, preserving original func_desc.
(rs6000_call_sysv): Likewise.
(rs6000_sibcall_sysv): New function.
* config/rs6000/rs6000.h (HAVE_AS_PLTSEQ): Provide default.
* config/rs6000/rs6000.md (UNSPEC_PLTSEQ, UNSPEC_PLT16_HA,
UNSPEC_PLT16_LO): New.
(pltseq_tocsave, pltseq_plt16_ha, pltseq_plt16_lo, pltseq_mtctr): New.
(call_indirect_nonlocal_sysv): Don't differentiate zero from non-zero
cookie in constraints.  Test explicitly for flags in length attr.
Handle unspec operand 1.
(call_value_indirect_nonlocal_sysv): Likewise.
(call_indirect_aix, call_value_indirect_aix): Handle unspec operand 1.
(call_indirect_elfv2, call_value_indirect_elfv2): Likewise.
(sibcall, sibcall_value): Use rs6000_sibcall_sysv.
(sibcall_indirect_nonlocal_sysv): New pattern.
(sibcall_value_indirect_nonlocal_sysv): Likewise.
(sibcall_nonlocal_sysv, sibcall_value_nonlocal_sysv): Remove indirect
call alternatives.
* configure.ac: Check for gas plt sequence marker support.
* configure: Regenerate.

diff --git a/gcc/config.in b/gcc/config.in
index 67a1e6cfc4c..86ff5e8636b 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -577,6 +577,12 @@
 #endif
 
 
+/* Define if your assembler supports R_PPC*_PLTSEQ relocations. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_PLTSEQ
+#endif
+
+
 /* Define if your assembler supports .ref */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_REF
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 7e45d2f0371..1af01935b5e 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1055,6 +1055,24 @@ (define_predicate "call_operand"
  || REGNO (op) >= FIRST_PSEUDO_REGISTER")
  (match_code "symbol_ref")))
 
+;; Return 1 if the operand, used inside a MEM, is a valid first argument
+;; to an indirect CALL.  This is LR, CTR, or a PLTSEQ unspec using CTR.
+(define_predicate "indirect_call_operand"
+  (match_code "reg,unspec")
+{
+  if (REG_P (op))
+return (REGNO (op) == LR_REGNO
+   || REGNO (op) == CTR_REGNO);
+  if (GET_CODE (op) == UNSPEC)
+{
+  if (XINT (op, 1) != UNSPEC_PLTSEQ)
+   return false;
+  op = XVECEXP (op, 0, 0);
+  return REG_P (op) && REGNO (op) == CTR_REGNO;
+}
+  return false;
+})
+
 ;; Return 1 if the operand is a SYMBOL_REF for a function known to be in
 ;; this file.
 (define_predicate "current_file_function_operand"
diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 3fd89dc20db..35209d4525d 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.

RE: [PATCH 0/3] [ARC] Glibc required patches

2018-11-13 Thread Claudiu Zissulescu
Thank you for quick review. All the patches are pushed with the suggested mods.
Claudiu

From: Claudiu Zissulescu [claz...@gmail.com]
Sent: Monday, November 12, 2018 12:25 PM
To: gcc-patches@gcc.gnu.org
Cc: francois.bed...@synopsys.com; andrew.burg...@embecosm.com; 
claudiu.zissule...@synopsys.com
Subject: [PATCH 0/3] [ARC] Glibc required patches

Hi Andrew,

The attached three patches are required to reduce/enable glibc
builds. Although not all of them are glibc-related, they were found while
porting this library to ARC.

OK to apply?
Claudiu

Claudiu Zissulescu (3):
  [ARC] Update EH code.
  [ARC] Do not emit ZOL in the presence of text jump tables.
  [ARC] Add support for profiling in glibc.

 gcc/config/arc/arc-protos.h   |  2 +-
 gcc/config/arc/arc.c  | 25 +--
 gcc/config/arc/arc.h  | 14 +++--
 gcc/config/arc/arc.md | 15 ++
 gcc/config/arc/elf.h  |  9 
 gcc/config/arc/linux.h| 10 +
 gcc/testsuite/gcc.target/arc/builtin_eh.c | 22 
 7 files changed, 79 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/builtin_eh.c

--
2.19.1



[PATCH] Fix PR87962

2018-11-13 Thread Richard Biener


The following better detects invalid nested cycles, in particular
those that are part of an outer reduction.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2018-11-13  Richard Biener  

PR tree-optimization/87962
* tree-vect-loop.c (vect_is_simple_reduction): More reliably
detect outer reduction for disqualifying in-loop uses.

* gcc.dg/pr87962.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 266061)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2807,11 +2807,11 @@ vect_is_simple_reduction (loop_vec_info
   gphi *phi = as_a <gphi *> (phi_info->stmt);
   struct loop *loop = (gimple_bb (phi))->loop_father;
   struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
+  bool nested_in_vect_loop = flow_loop_nested_p (vect_loop, loop);
   gimple *phi_use_stmt = NULL;
   enum tree_code orig_code, code;
   tree op1, op2, op3 = NULL_TREE, op4 = NULL_TREE;
   tree type;
-  int nloop_uses;
   tree name;
   imm_use_iterator imm_iter;
   use_operand_p use_p;
@@ -2827,7 +2827,7 @@ vect_is_simple_reduction (loop_vec_info
  can be constant.  See PR60382.  */
   if (has_zero_uses (phi_name))
 return NULL;
-  nloop_uses = 0;
+  unsigned nphi_def_loop_uses = 0;
   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, phi_name)
 {
   gimple *use_stmt = USE_STMT (use_p);
@@ -2843,20 +2843,7 @@ vect_is_simple_reduction (loop_vec_info
   return NULL;
 }
 
-  /* For inner loop reductions in nested vectorization there are no
- constraints on the number of uses in the inner loop.  */
-  if (loop == vect_loop->inner)
-   continue;
-
-  nloop_uses++;
-  if (nloop_uses > 1)
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"reduction value used in loop.\n");
-  return NULL;
-}
-
+  nphi_def_loop_uses++;
   phi_use_stmt = use_stmt;
 }
 
@@ -2894,26 +2881,32 @@ vect_is_simple_reduction (loop_vec_info
   return NULL;
 }
 
-  nloop_uses = 0;
+  unsigned nlatch_def_loop_uses = 0;
   auto_vec<gphi *> lcphis;
+  bool inner_loop_of_double_reduc = false;
   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
 {
   gimple *use_stmt = USE_STMT (use_p);
   if (is_gimple_debug (use_stmt))
continue;
   if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
-   nloop_uses++;
+   nlatch_def_loop_uses++;
   else
-   /* We can have more than one loop-closed PHI.  */
-   lcphis.safe_push (as_a <gphi *> (use_stmt));
+   {
+ /* We can have more than one loop-closed PHI.  */
+ lcphis.safe_push (as_a <gphi *> (use_stmt));
+ if (nested_in_vect_loop
+ && (STMT_VINFO_DEF_TYPE (loop_info->lookup_stmt (use_stmt))
+ == vect_double_reduction_def))
+   inner_loop_of_double_reduc = true;
+   }
 }
 
   /* If this isn't a nested cycle or if the nested cycle reduction value
  is used ouside of the inner loop we cannot handle uses of the reduction
  value.  */
-  bool nested_in_vect_loop = flow_loop_nested_p (vect_loop, loop);
-  if ((!nested_in_vect_loop || !lcphis.is_empty ())
-  && nloop_uses > 1)
+  if ((!nested_in_vect_loop || inner_loop_of_double_reduc)
+  && (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1))
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
Index: gcc/testsuite/gcc.dg/pr87962.c
===
--- gcc/testsuite/gcc.dg/pr87962.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/pr87962.c  (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-additional-options "-march=bdver2" { target { x86_64-*-* i?86-*-* } } 
} */
+
+int a, b;
+
+int c()
+{
+  long d, e;
+  while (a) {
+  a++;
+  b = 0;
+  for (; b++ - 2; d = d >> 1)
+   e += d;
+  }
+  return e;
+}


[PATCH] Fix PR87967

2018-11-13 Thread Richard Biener


A simple omission...

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2018-11-13  Richard Biener  

PR tree-optimization/87967
* tree-vect-loop.c (vect_transform_loop): Also copy PHIs
for constants for the scalar loop.

* g++.dg/opt/pr87967.C: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 266061)
+++ gcc/tree-vect-loop.c(working copy)
@@ -8264,7 +8257,7 @@ vect_transform_loop (loop_vec_info loop_
   e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
   if (! single_pred_p (e->dest))
{
- split_loop_exit_edge (e);
+ split_loop_exit_edge (e, true);
  if (dump_enabled_p ())
dump_printf (MSG_NOTE, "split exit edge of scalar loop\n");
}
Index: gcc/testsuite/g++.dg/opt/pr87967.C
===
--- gcc/testsuite/g++.dg/opt/pr87967.C  (nonexistent)
+++ gcc/testsuite/g++.dg/opt/pr87967.C  (working copy)
@@ -0,0 +1,50 @@
+// { dg-do compile }
+// { dg-options "-O3" }
+
+void h();
+template  struct k { using d = b; };
+template  class> using e = k;
+template  class f>
+using g = typename e::d;
+struct l {
+  template  using ab = typename i::j;
+};
+struct n : l {
+  using j = g;
+};
+class o {
+public:
+  long r();
+};
+char m;
+char s() {
+  if (m)
+return '0';
+  return 'A';
+}
+class t {
+public:
+  typedef char *ad;
+  ad m_fn2();
+};
+void fn3() {
+  char *a;
+  t b;
+  bool p = false;
+  while (*a) {
+h();
+o c;
+if (*a)
+  a++;
+if (c.r()) {
+  n::j q;
+  for (t::ad d = b.m_fn2(), e; d != e; d++) {
+char f = *q;
+*d = f + s();
+  }
+  p = true;
+}
+  }
+  if (p)
+throw;
+}


[PATCH] Fix PR87931

2018-11-13 Thread Richard Biener


We need to restrict what we handle as the last operation in a nested
cycle because vectorizable_reduction performs the code-generation
in the end.

Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2018-11-13  Richard Biener  

PR tree-optimization/87931
* tree-vect-loop.c (vect_is_simple_reduction): Restrict
nested cycles we support to latch computations vectorizable_reduction
handles.

* gcc.dg/graphite/pr87931.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 266061)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2983,6 +2976,22 @@ vect_is_simple_reduction (loop_vec_info
 
   if (nested_in_vect_loop && !check_reduction)
 {
+  /* FIXME: Even for non-reductions code generation is funneled
+through vectorizable_reduction for the stmt defining the
+PHI latch value.  So we have to artificially restrict ourselves
+for the supported operations.  */
+  switch (get_gimple_rhs_class (code))
+   {
+   case GIMPLE_BINARY_RHS:
+   case GIMPLE_TERNARY_RHS:
+ break;
+   default:
+ /* Not supported by vectorizable_reduction.  */
+ if (dump_enabled_p ())
+   report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+   "nested cycle: not handled operation: ");
+ return NULL;
+   }
   if (dump_enabled_p ())
report_vect_op (MSG_NOTE, def_stmt, "detected nested cycle: ");
   return def_stmt_info;
Index: gcc/testsuite/gcc.dg/graphite/pr87931.c
===
--- gcc/testsuite/gcc.dg/graphite/pr87931.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr87931.c (working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-tree-copy-prop -fgraphite-identity" } */
+
+#define N 40
+#define M 128
+float in[N+M];
+float coeff[M];
+float fir_out[N];
+
+void fir ()
+{
+  int i,j,k;
+  float diff;
+
+  for (i = 0; i < N; i++) {
+diff = 0;
+for (j = 0; j < M; j++) {
+  diff += in[j+i]*coeff[j];
+}
+fir_out[i] = diff;
+  }
+}


[PATCH] Fix PR87974

2018-11-13 Thread Richard Biener


Do not look at constant or external defs in reduction stmts to
determine the reduction PHI vector type.  Those are promoted/demoted
as required.
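The intent can be modeled with a short self-contained sketch (the enum and struct below are hypothetical stand-ins, not GCC's actual stmt_vec_info machinery): when picking the element precision for the reduction PHI, operands that are constant or external defs are skipped, since they get promoted or demoted to match the remaining operands.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for GCC's def kinds; only the skip logic matters.
enum def_kind { CONSTANT_DEF, EXTERNAL_DEF, INTERNAL_DEF };

struct operand_def
{
  def_kind kind;
  unsigned precision;  // scalar type precision in bits
};

// Pick the widest precision among internal defs only; constant and
// external defs do not constrain the reduction PHI's vector type.
static unsigned
reduction_phi_precision (const std::vector<operand_def> &ops)
{
  unsigned prec = 0;
  for (const operand_def &op : ops)
    {
      if (op.kind == CONSTANT_DEF || op.kind == EXTERNAL_DEF)
        continue;  // promoted/demoted as required
      if (op.precision > prec)
        prec = op.precision;
    }
  return prec;
}
```

With this, a wide constant operand no longer drags the PHI vector type up to its own precision.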

This is another fragile area; I'll poke around a bit, but nevertheless
bootstrap & regtest are queued.

Richard.

2018-11-13  Richard Biener  

PR tree-optimization/87974
* tree-vect-loop.c (vectorizable_reduction): When computing
the vectorized reduction PHI vector type ignore constant
and external defs.

* g++.dg/opt/pr87974.C: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 266061)
+++ gcc/tree-vect-loop.c(working copy)
@@ -6061,13 +6070,17 @@ vectorizable_reduction (stmt_vec_info st
return true;
 
   gassign *reduc_stmt = as_a <gassign *> (reduc_stmt_info->stmt);
+  code = gimple_assign_rhs_code (reduc_stmt);
   for (unsigned k = 1; k < gimple_num_ops (reduc_stmt); ++k)
{
  tree op = gimple_op (reduc_stmt, k);
  if (op == phi_result)
continue;
- if (k == 1
- && gimple_assign_rhs_code (reduc_stmt) == COND_EXPR)
+ if (k == 1 && code == COND_EXPR)
+   continue;
+ bool is_simple_use = vect_is_simple_use (op, loop_vinfo, &dt);
+ gcc_assert (is_simple_use);
+ if (dt == vect_constant_def || dt == vect_external_def)
continue;
  if (!vectype_in
  || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
Index: gcc/testsuite/g++.dg/opt/pr87974.C
===
--- gcc/testsuite/g++.dg/opt/pr87974.C  (nonexistent)
+++ gcc/testsuite/g++.dg/opt/pr87974.C  (working copy)
@@ -0,0 +1,33 @@
+// { dg-do compile }
+// { dg-options "-O3" }
+
+struct h {
+typedef int &c;
+};
+class i {
+struct j {
+   using c = int *;
+};
+using as = j::c;
+};
+template  class k {
+public:
+using as = i::as;
+h::c operator[](long l) {
+   k::as d = 0;
+   return d[l];
+}
+};
+class : public k { } a;
+long c, f;
+void m()
+{
+  for (long b; b <= 6; b++)
+for (long g; g < b; g++) {
+   unsigned long e = g;
+   c = 0;
+   for (; c < b; c++)
+ f = e >>= 1;
+   a[g] = f;
+}
+}


Re: [PR81878]: fix --disable-bootstrap --enable-languages=ada, and cross-back gnattools build

2018-11-13 Thread Alexandre Oliva
On Nov 13, 2018, Richard Biener  wrote:

>> Please let me know if there are objections to this change in the next
>> few days, e.g., if enabling C and C++ for an Ada-only build is too
>> onerous.  It is certainly possible to rework gnattools build machinery
>> so that it uses CC and CXX as detected by the top-level configure if we
>> can't find xgcc and xg++ in ../gcc.

> I really wonder why we not _always_ do this for consistency given we
> already require a host Ada compiler.

Sorry, I can't tell what the 'this' refers to.  Enabling C and C++ for
an Ada-only build?  Reworking gnattools build machinery to use top-level
CC and CXX?  Something else?

FWIW, I see the point of using the just-built gcc/g++ if it's there
and usable: considering the checks for different versions of Ada
compilers, you really want to use the last stage of the bootstrap to
build tools linked with the runtime built with it.  It seems to me you'd
run into a catch-22 without that.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


Re: [PATCH] More value_range API cleanup

2018-11-13 Thread Aldy Hernandez

On 11/13/18 3:07 AM, Richard Biener wrote:

On Tue, 13 Nov 2018, Aldy Hernandez wrote:




The tricky part starts in the prologue for

if (vr0->undefined_p ())
  {
vr0->deep_copy (vr1);
return;
  }

but yes, we probably can factor out a bit more common code
here.  I'll see to followup with more minor cleanups this
week (noticed a few details myself).


Like this?  (untested)


I would inline value_range_base::union_helper into value_range_base::union_,
and remove all the undefined/varying/etc stuff from value_range::union_.

It should work because upon return from value_range_base::union_, in the
this->undefined_p case, the base class will copy everything but the
equivalences.  Then the derived union_ only has to nuke the equivalences if
this->undefined or this->varying, and the equivalences' IOR just works.

For instance, in the case where other->undefined_p, there shouldn't be
anything in the equivalences so the IOR won't copy anything to this as
expected.  Similarly for this->varying_p.

In the case of other->varying, this will already have been set to varying so
neither this nor other should have anything in their equivalence fields, so
the IOR won't do anything.

I think I covered all of them...the bitmap math should just work.  What do you
think?


I think the only case that will not work is the case when this->undefined
(when we need the deep copy), because we'll not get the bitmap from
other in that case.  So I've settled on the thing below (just
special-casing that very case).


Ah, good point.
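The meet semantics under discussion can be modeled with a short self-contained sketch (simplified stand-in types with a std::set in place of the equivalence bitmap; these are not GCC's actual value_range classes, and the bitmap-sharing subtlety noted above does not arise with a plain set):

```cpp
#include <cassert>
#include <set>

// Hypothetical simplified model: a base range with a kind, and a derived
// range that also carries an equivalence set.
enum range_kind { VR_UNDEFINED, VR_RANGE, VR_VARYING };

struct range_base
{
  range_kind kind = VR_UNDEFINED;
  int lo = 0, hi = 0;
  bool undefined_p () const { return kind == VR_UNDEFINED; }
  bool varying_p () const { return kind == VR_VARYING; }
  // Meet: make *this contain both operands (not necessarily minimally).
  void union_ (const range_base *other)
  {
    if (undefined_p ())
      {
        kind = other->kind;
        lo = other->lo;
        hi = other->hi;
        return;
      }
    if (other->undefined_p () || varying_p ())
      return;
    if (other->varying_p ())
      {
        kind = VR_VARYING;
        return;
      }
    lo = lo < other->lo ? lo : other->lo;
    hi = hi > other->hi ? hi : other->hi;
  }
};

struct range_with_equiv : range_base
{
  std::set<int> equiv;  // stand-in for the equivalence bitmap
  void union_ (const range_with_equiv *other)
  {
    range_base::union_ (other);
    // API contract: VARYING / UNDEFINED carry no equivalences.
    if (undefined_p () || varying_p ())
      equiv.clear ();
    else
      // The "IOR": a no-op when other's set is empty, as argued above.
      equiv.insert (other->equiv.begin (), other->equiv.end ());
  }
};
```

The derived union_ only fixes up the equivalences after the base meet; all the undefined/varying case analysis stays in the base class.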




Finally, as I've hinted before, I think we need to be careful that any time we
change state to VARYING / UNDEFINED from a base method, that the derived class
is in a sane state (there are no equivalences set per the API contract).  This
was not originally enforced in VRP, and I wouldn't be surprised if there are
dragons if we enforce honesty.  I suppose, since we have an API, we could
enforce this lazily: any time equiv() is called, clear the equivalences or
return NULL if it's varying or undefined?  Just a thought.


I have updated ->update () to adjust equiv when we update to VARYING
or UNDEFINED.


Excellent idea.  I don't see that part in your patch though?



+/* Helper for meet operation for value ranges.  Given two value ranges VR0 and
+   VR1, return a range that contains both VR0 and VR1.  This may not be the
+   smallest possible such range.  */
+
+value_range_base
+value_range_base::union_helper (const value_range_base *vr0,
+   const value_range_base *vr1)
+{


I know this was my fault, but would you mind removing vr0 from 
union_helper?  Perhaps something like this:


value_range_base::union_helper (const value_range_base *other)

I think it'll be cleaner and more consistent this way.

Thanks.
Aldy


[PATCH][libbacktrace] Handle DW_FORM_GNU_strp_alt

2018-11-13 Thread Tom de Vries
Hi,

The dwz tool attempts to optimize DWARF debugging information contained in ELF
shared libraries and ELF executables for size.

With the dwz -m option, it attempts to optimize by moving DWARF debugging
information entries (DIEs), strings and macro descriptions duplicated in
more than one object into a newly created ELF ET_REL object whose filename is
given as -m option argument.  The debug sections in the executables and
shared libraries specified on the command line are then modified again,
referring to the entities in the newly created object.

After a dwz invocation:
...
$ dwz -m c.debug a.out b.out
...
both a.out and b.out contain a .gnu_debugaltlink section referring to c.debug,
and use "DWZ DWARF multifile extensions" such as DW_FORM_GNU_strp_alt and
DW_FORM_GNU_ref_alt refer to the content of c.debug.

The .gnu_debugaltlink section consists of a filename and the expected buildid.
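That layout (a NUL-terminated filename immediately followed by the raw build-id bytes) can be split with a short self-contained sketch; this is a hypothetical helper, not libbacktrace's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: split the raw contents of a .gnu_debugaltlink
// section into the referenced filename and the expected build-id bytes.
static bool
parse_debugaltlink (const unsigned char *data, size_t size,
                    const char **filename,
                    const unsigned char **buildid, size_t *buildid_size)
{
  const void *nul = std::memchr (data, 0, size);
  if (nul == nullptr)
    return false;  // malformed: filename is not NUL-terminated
  size_t name_len = (const unsigned char *) nul - data;
  *filename = (const char *) data;
  *buildid = data + name_len + 1;
  *buildid_size = size - name_len - 1;
  return true;
}
```

A consumer would then locate the named file and compare its build-id note against the returned bytes before trusting its DWARF.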

This patch adds to libbacktrace:
- finding a file matching the .gnu_debugaltlink filename
- verifying the .gnu_debugaltlink buildid
- reading the dwarf of the .gnu_debugaltlink
- support for DW_FORM_GNU_strp_alt
- a testcase btest_dwz.c, which is btest.c minimized to the point that it only
  requires DW_FORM_GNU_strp_alt.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom

[libbacktrace] Handle DW_FORM_GNU_strp_alt

2018-11-11  Tom de Vries  

* dwarf.c (struct dwarf_data): Add altlink field.
(read_attribute): Add altlink parameter.  Handle DW_FORM_GNU_strp_alt
using altlink.
(find_address_ranges, build_address_map, build_dwarf_data): Add and
handle altlink parameter.
(read_referenced_name, read_function_entry): Add argument to
read_attribute call.
(backtrace_dwarf_add): Add and handle fileline_entry and
fileline_altlink parameters.
* elf.c (elf_open_debugfile_by_debugaltlink): New function.
(elf_add): Add and handle fileline_entry, with_buildid_data and
with_buildid_size parameters.  Handle .gnu_debugaltlink section.
(phdr_callback, backtrace_initialize): Add arguments to elf_add calls.
* internal.h (backtrace_dwarf_add): Add fileline_entry and
fileline_altlink parameters.
* configure.ac (DWZ): Set with AC_CHECK_PROG.
(HAVE_DWZ): Set with AM_CONDITIONAL.
* configure: Regenerate.
* Makefile.am (check_PROGRAMS): Add btest_dwz.
(TESTS): Add btest_dwz_2 and btest_dwz_3.
* Makefile.in: Regenerate.
* btest_dwz.c: New file.

---
 libbacktrace/Makefile.am  |  22 +
 libbacktrace/Makefile.in  |  95 ---
 libbacktrace/btest_dwz.c  | 237 ++
 libbacktrace/configure|  57 ++-
 libbacktrace/configure.ac |   3 +
 libbacktrace/dwarf.c  |  50 +++---
 libbacktrace/elf.c| 120 +--
 libbacktrace/internal.h   |   4 +-
 8 files changed, 548 insertions(+), 40 deletions(-)

diff --git a/libbacktrace/Makefile.am b/libbacktrace/Makefile.am
index 3c1bd49dd7b..2fec9bbb4b6 100644
--- a/libbacktrace/Makefile.am
+++ b/libbacktrace/Makefile.am
@@ -96,6 +96,28 @@ btest_LDADD = libbacktrace.la
 
 check_PROGRAMS += btest
 
+if HAVE_DWZ
+
+btest_dwz_SOURCES = btest_dwz.c testlib.c
+btest_dwz_CFLAGS = $(AM_CFLAGS) -g -O0
+btest_dwz_LDADD = libbacktrace.la
+
+check_PROGRAMS += btest_dwz
+
+TESTS += btest_dwz_2 btest_dwz_3
+
+btest_dwz_2 btest_dwz_3: btest_dwz_23
+
+.PHONY: btest_dwz_23
+
+btest_dwz_23: btest_dwz
+   rm -f btest_dwz.debug
+   cp btest_dwz btest_dwz_2
+   cp btest_dwz btest_dwz_3
+   $(DWZ) -m btest_dwz.debug btest_dwz_2 btest_dwz_3
+
+endif HAVE_DWZ
+
 stest_SOURCES = stest.c
 stest_LDADD = libbacktrace.la
 
diff --git a/libbacktrace/Makefile.in b/libbacktrace/Makefile.in
index 60a9d887dba..b3e9b9a4eec 100644
--- a/libbacktrace/Makefile.in
+++ b/libbacktrace/Makefile.in
@@ -120,12 +120,16 @@ POST_UNINSTALL = :
 build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
-check_PROGRAMS = $(am__EXEEXT_1) $(am__EXEEXT_2) $(am__EXEEXT_3)
-@NATIVE_TRUE@am__append_1 = btest stest ztest edtest
-@HAVE_ZLIB_TRUE@@NATIVE_TRUE@am__append_2 = -lz
-@HAVE_PTHREAD_TRUE@@NATIVE_TRUE@am__append_3 = ttest
-@HAVE_OBJCOPY_DEBUGLINK_TRUE@@NATIVE_TRUE@am__append_4 = dtest
-@HAVE_COMPRESSED_DEBUG_TRUE@@NATIVE_TRUE@am__append_5 = ctestg ctesta
+check_PROGRAMS = $(am__EXEEXT_1) $(am__EXEEXT_2) $(am__EXEEXT_3) \
+   $(am__EXEEXT_4) $(am__EXEEXT_5)
+@NATIVE_TRUE@am__append_1 = btest
+@HAVE_DWZ_TRUE@@NATIVE_TRUE@am__append_2 = btest_dwz
+@HAVE_DWZ_TRUE@@NATIVE_TRUE@am__append_3 = btest_dwz_2 btest_dwz_3
+@NATIVE_TRUE@am__append_4 = stest ztest edtest
+@HAVE_ZLIB_TRUE@@NATIVE_TRUE@am__append_5 = -lz
+@HAVE_PTHREAD_TRUE@@NATIVE_TRUE@am__append_6 = ttest
+@HAVE_OBJCOPY_DEBUGLINK_TRUE@@NATIVE_TRUE@am__append_7 = dtest
+@HAVE_COMPRESSED_DEBUG_TRUE@@NATIVE_TRUE@am__append_8 = ctestg ctesta
 subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4

[PATCH] Fix PR86991

2018-11-13 Thread Richard Biener


This PR shows we have stale reduction groups lying around because
the fixup doesn't work reliably with reduction chains.  Fixed by
delaying the build to after detection is successful.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-11-13  Richard Biener  

PR tree-optimization/86991
* tree-vect-loop.c (vect_is_slp_reduction): Delay reduction
group building until we have successfully detected the SLP
reduction.
(vect_is_simple_reduction): Remove fixup code here.

* gcc.dg/pr86991.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 266071)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2476,6 +2476,7 @@ vect_is_slp_reduction (loop_vec_info loo
   if (loop != vect_loop)
 return false;
 
+  auto_vec<stmt_vec_info> reduc_chain;
   lhs = PHI_RESULT (phi);
   code = gimple_assign_rhs_code (first_stmt);
   while (1)
@@ -2528,17 +2529,9 @@ vect_is_slp_reduction (loop_vec_info loo
 
   /* Insert USE_STMT into reduction chain.  */
   use_stmt_info = loop_info->lookup_stmt (loop_use_stmt);
-  if (current_stmt_info)
-{
- REDUC_GROUP_NEXT_ELEMENT (current_stmt_info) = use_stmt_info;
-  REDUC_GROUP_FIRST_ELEMENT (use_stmt_info)
-= REDUC_GROUP_FIRST_ELEMENT (current_stmt_info);
-}
-  else
-   REDUC_GROUP_FIRST_ELEMENT (use_stmt_info) = use_stmt_info;
+  reduc_chain.safe_push (use_stmt_info);
 
   lhs = gimple_assign_lhs (loop_use_stmt);
-  current_stmt_info = use_stmt_info;
   size++;
}
 
@@ -2548,10 +2541,9 @@ vect_is_slp_reduction (loop_vec_info loo
   /* Swap the operands, if needed, to make the reduction operand be the second
  operand.  */
   lhs = PHI_RESULT (phi);
-  stmt_vec_info next_stmt_info = REDUC_GROUP_FIRST_ELEMENT (current_stmt_info);
-  while (next_stmt_info)
+  for (unsigned i = 0; i < reduc_chain.length (); ++i)
 {
-  gassign *next_stmt = as_a <gassign *> (next_stmt_info->stmt);
+  gassign *next_stmt = as_a <gassign *> (reduc_chain[i]->stmt);
   if (gimple_assign_rhs2 (next_stmt) == lhs)
{
  tree op = gimple_assign_rhs1 (next_stmt);
@@ -2565,7 +2557,6 @@ vect_is_slp_reduction (loop_vec_info loo
  && vect_valid_reduction_input_p (def_stmt_info))
{
  lhs = gimple_assign_lhs (next_stmt);
- next_stmt_info = REDUC_GROUP_NEXT_ELEMENT (next_stmt_info);
  continue;
}
 
@@ -2600,9 +2591,16 @@ vect_is_slp_reduction (loop_vec_info loo
 }
 
   lhs = gimple_assign_lhs (next_stmt);
-  next_stmt_info = REDUC_GROUP_NEXT_ELEMENT (next_stmt_info);
 }
 
+  /* Build up the actual chain.  */
+  for (unsigned i = 0; i < reduc_chain.length () - 1; ++i)
+{
+  REDUC_GROUP_FIRST_ELEMENT (reduc_chain[i]) = reduc_chain[0];
+  REDUC_GROUP_NEXT_ELEMENT (reduc_chain[i]) = reduc_chain[i+1];
+}
+  REDUC_GROUP_NEXT_ELEMENT (reduc_chain.last ()) = NULL;
+
   /* Save the chain for further analysis in SLP detection.  */
   stmt_vec_info first_stmt_info
 = REDUC_GROUP_FIRST_ELEMENT (current_stmt_info);
@@ -3182,16 +3196,6 @@ vect_is_simple_reduction (loop_vec_info
   return def_stmt_info;
 }
 
-  /* Dissolve group eventually half-built by vect_is_slp_reduction.  */
-  stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (def_stmt_info);
-  while (first)
-{
-  stmt_vec_info next = REDUC_GROUP_NEXT_ELEMENT (first);
-  REDUC_GROUP_FIRST_ELEMENT (first) = NULL;
-  REDUC_GROUP_NEXT_ELEMENT (first) = NULL;
-  first = next;
-}
-
   /* Look for the expression computing loop_arg from loop PHI result.  */
   if (check_reduction_path (vect_location, loop, phi, loop_arg, code))
 return def_stmt_info;
Index: gcc/testsuite/gcc.dg/pr86991.c
===
--- gcc/testsuite/gcc.dg/pr86991.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/pr86991.c  (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int b;
+extern unsigned c[];
+unsigned d;
+long e;
+
+void f()
+{
+  unsigned g, h;
+  for (; d; d += 2) {
+  g = 1;
+  for (; g; g += 3) {
+ h = 2;
+ for (; h < 6; h++)
+   c[h] = c[h] - b - ~e;
+  }
+  }
+}


Re: [PATCH] More value_range API cleanup

2018-11-13 Thread Richard Biener
On Tue, 13 Nov 2018, Aldy Hernandez wrote:

> On 11/13/18 3:07 AM, Richard Biener wrote:
> > On Tue, 13 Nov 2018, Aldy Hernandez wrote:
> > 
> > > 
> > > > > The tricky part starts in the prologue for
> > > > > 
> > > > > if (vr0->undefined_p ())
> > > > >   {
> > > > > vr0->deep_copy (vr1);
> > > > > return;
> > > > >   }
> > > > > 
> > > > > but yes, we probably can factor out a bit more common code
> > > > > here.  I'll see to followup with more minor cleanups this
> > > > > week (noticed a few details myself).
> > > > 
> > > > Like this?  (untested)
> > > 
> > > I would inline value_range_base::union_helper into
> > > value_range_base::union_,
> > > and remove all the undefined/varying/etc stuff from value_range::union_.
> > > 
> > > It should work because upon return from value_range_base::union_, in the
> > > this->undefined_p case, the base class will copy everything but the
> > > equivalences.  Then the derived union_ only has to nuke the equivalences
> > > if
> > > this->undefined or this->varying, and the equivalences' IOR just works.
> > > 
> > > For instance, in the case where other->undefined_p, there shouldn't be
> > > anything in the equivalences so the IOR won't copy anything to this as
> > > expected.  Similarly for this->varying_p.
> > > 
> > > In the case of other->varying, this will already have been set to varying so
> > > neither this nor other should have anything in their equivalence fields,
> > > so
> > > the IOR won't do anything.
> > > 
> > > I think I covered all of them...the bitmap math should just work.  What do
> > > you
> > > think?
> > 
> > I think the only case that will not work is the case when this->undefined
> > (when we need the deep copy).  Because we'll not get the bitmap from
> > other in that case.  So I've settled with the thing below (just
> > special-casing that very case)
> 
> Ah, good point.
> 
> > 
> > > Finally, as I've hinted before, I think we need to be careful that any
> > > time we
> > > change state to VARYING / UNDEFINED from a base method, that the derived
> > > class
> > > is in a sane state (there are no equivalences set per the API contract).
> > > This
> > > was not originally enforced in VRP, and I wouldn't be surprised if there
> > > are
> > > dragons if we enforce honesty.  I suppose, since we have an API, we could
> > > enforce this lazily: any time equiv() is called, clear the equivalences or
> > > return NULL if it's varying or undefined?  Just a thought.
> > 
> > I have updated ->update () to adjust equiv when we update to VARYING
> > or UNDEFINED.
> 
> Excellent idea.  I don't see that part in your patch though?

That was part of the previous (or previous previous) change.

> 
> > +/* Helper for meet operation for value ranges.  Given two value ranges VR0
> > and
> > +   VR1, return a range that contains both VR0 and VR1.  This may not be the
> > +   smallest possible such range.  */
> > +
> > +value_range_base
> > +value_range_base::union_helper (const value_range_base *vr0,
> > +   const value_range_base *vr1)
> > +{
> 
> I know this was my fault, but would you mind removing vr0 from union_helper?
> Perhaps something like this:
> 
> value_range_base::union_helper (const value_range_base *other)
> 
> I think it'll be cleaner and more consistent this way.

The method is static now, and given that it doesn't modify VR0
but returns a copy, that is better IMHO.

Richard.
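The equivalence handling discussed in this thread can be modeled in isolation with a small standalone sketch. This is a toy stand-in (std::set instead of GCC's bitmaps, simplified base/derived classes, not the actual value_range types) that shows why the IOR "just works" once VARYING/UNDEFINED clear the equivalences, plus the deep-copy special case Richard points out:

```cpp
#include <algorithm>
#include <cassert>
#include <set>

enum vr_kind { VR_UNDEFINED, VR_RANGE, VR_VARYING };

// Toy stand-in for value_range_base: just a kind plus endpoints.
struct vr_base {
  vr_kind kind = VR_UNDEFINED;
  int min = 0, max = 0;
  bool undefined_p () const { return kind == VR_UNDEFINED; }
  bool varying_p () const { return kind == VR_VARYING; }
  // Simplified union_: smallest enclosing range, with the usual
  // undefined/varying shortcuts.
  void union_ (const vr_base &other)
  {
    if (undefined_p ())
      { kind = other.kind; min = other.min; max = other.max; return; }
    if (other.undefined_p ())
      return;
    if (varying_p () || other.varying_p ())
      { kind = VR_VARYING; return; }
    min = std::min (min, other.min);
    max = std::max (max, other.max);
  }
};

// Toy stand-in for the derived value_range: adds equivalences.
struct vr : vr_base {
  std::set<int> equiv;  // stand-in for the equivalence bitmap
  void union_ (const vr &other)
  {
    bool was_undefined = undefined_p ();
    vr_base::union_ (other);
    // Per the API contract, VARYING/UNDEFINED carry no equivalences.
    if (undefined_p () || varying_p ())
      { equiv.clear (); return; }
    // The deep-copy special case: this was undefined, so take other's
    // equivalences wholesale.
    if (was_undefined)
      { equiv = other.equiv; return; }
    // Otherwise the IOR (set union here) just works.
    equiv.insert (other.equiv.begin (), other.equiv.end ());
  }
};
```

With two proper ranges the equivalences merge; with an undefined LHS the deep copy brings the other side's equivalences along; a varying result ends up with none.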


Re: [PATCH] Fix aarch64_compare_and_swap* constraints (PR target/87839)

2018-11-13 Thread Kyrill Tkachov

Hi Jakub,

On 13/11/18 09:28, Jakub Jelinek wrote:

Hi!

The following testcase ICEs because the predicate and constraints on one of
the operands of @aarch64_compare_and_swapdi aren't consistent.  The RA which
goes according to constraints
(insn 15 13 16 2 (set (reg:DI 104)
(const_int 8589934595 [0x20003])) "pr87839.c":15:3 47 
{*movdi_aarch64}
 (expr_list:REG_EQUIV (const_int 8589934595 [0x20003])
(nil)))
(insn 16 15 21 2 (parallel [
(set (reg:CC 66 cc)
(unspec_volatile:CC [
(const_int 0 [0])
] UNSPECV_ATOMIC_CMPSW))
(set (reg:DI 101)
(mem/v:DI (reg/f:DI 99) [-1  S8 A64]))
(set (mem/v:DI (reg/f:DI 99) [-1  S8 A64])
(unspec_volatile:DI [
(reg:DI 104)
(reg:DI 103)
(const_int 0 [0])
(const_int 32773 [0x8005]) repeated x2
] UNSPECV_ATOMIC_CMPSW))
(clobber (scratch:SI))
]) "pr87839.c":15:3 3532 {aarch64_compare_and_swapdi}
 (expr_list:REG_UNUSED (reg:DI 101)
(expr_list:REG_UNUSED (reg:CC 66 cc)
(nil
when seeing the n constraint puts the 0x20003 constant directly into the
atomic instruction, but the predicate requires that it is either a register
or a shifted positive or negative 12-bit constant, and so it fails to split.
The positive shifted constant apparently has the I constraint and the negative
one J, and other uses of aarch64_plus_operand that have some constraint use
rIJ (or r):
config/aarch64/aarch64.md: (match_operand:GPI 2 "aarch64_plus_operand" "r,I,J"))
config/aarch64/aarch64.md:(match_operand:SI 2 "aarch64_plus_operand" 
"r,I,J"))
config/aarch64/aarch64.md: (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J"))
config/aarch64/aarch64.md: (match_operand:GPI 1 "aarch64_plus_operand" "r"))
config/aarch64/aarch64.md: (match_operand:GPI 1 "aarch64_plus_operand" 
"r,I,J")))]

I don't have a setup to easily bootstrap/regtest aarch64-linux ATM, could
somebody please include it in their bootstrap/regtest? Thanks.

2018-11-13  Jakub Jelinek  

PR target/87839
* config/aarch64/atomics.md (@aarch64_compare_and_swap): Use
rIJ constraint for aarch64_plus_operand rather than rn.

* gcc.target/aarch64/pr87839.c: New test.



This passes bootstrap and regtesting shows no problems on 
aarch64-none-linux-gnu.
The change looks good to me but you'll still need maintainer approval.

Thanks,
Kyrill


--- gcc/config/aarch64/atomics.md.jj2018-11-01 12:06:43.469963662 +0100
+++ gcc/config/aarch64/atomics.md   2018-11-13 09:59:35.660185116 +0100
@@ -71,7 +71,7 @@ (define_insn_and_split "@aarch64_compare
 (match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))   ;; memory
(set (match_dup 1)
 (unspec_volatile:GPI
-  [(match_operand:GPI 2 "aarch64_plus_operand" "rn")   ;; expect
+  [(match_operand:GPI 2 "aarch64_plus_operand" "rIJ")  ;; expect
(match_operand:GPI 3 "aarch64_reg_or_zero" "rZ");; 
desired
(match_operand:SI 4 "const_int_operand");; 
is_weak
(match_operand:SI 5 "const_int_operand");; mod_s
--- gcc/testsuite/gcc.target/aarch64/pr87839.c.jj 2018-11-13 10:13:44.353309416 
+0100
+++ gcc/testsuite/gcc.target/aarch64/pr87839.c  2018-11-13 10:13:05.496944699 
+0100
@@ -0,0 +1,29 @@
+/* PR target/87839 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -w" } */
+
+long long b[64];
+void foo (void);
+int bar (void (*) (void));
+void qux (long long *, long long) __attribute__((noreturn));
+void quux (long long *, long long);
+
+void
+baz (void)
+{
+  __sync_val_compare_and_swap (b, 4294967298LL, 78187493520LL);
+  __sync_bool_compare_and_swap (b + 1, 8589934595LL, 21474836489LL);
+  __sync_fetch_and_xor (b, 60129542145LL);
+  quux (b, 42949672967LL);
+  __sync_xor_and_fetch (b + 22, 60129542145LL);
+  quux (b + 23, 42949672967LL);
+  if (bar (baz))
+__builtin_abort ();
+  foo ();
+  __sync_val_compare_and_swap (b, 4294967298LL, 0);
+  __sync_bool_compare_and_swap (b + 1, 8589934595LL, 78187493520LL);
+  if (__sync_or_and_fetch (b, 21474836489LL) != 21474836489LL)
+qux (b + 22, 60129542145LL);
+  __atomic_fetch_nand (b + 23, 42949672967LL, __ATOMIC_RELAXED);
+  bar (baz);
+}

Jakub




PING^2: [PATCH] apply_subst_iterator: Handle define_split/define_insn_and_split

2018-11-13 Thread H.J. Lu
On Sun, Nov 4, 2018 at 7:24 AM H.J. Lu  wrote:
>
> On Fri, Oct 26, 2018 at 12:44 AM H.J. Lu  wrote:
> >
> > On 10/25/18, Uros Bizjak  wrote:
> > > On Fri, Oct 26, 2018 at 8:48 AM H.J. Lu  wrote:
> > >>
> > >> On 10/25/18, Uros Bizjak  wrote:
> > >> > On Fri, Oct 26, 2018 at 8:07 AM H.J. Lu  wrote:
> > >> >>
> > >> >> * read-rtl.c (apply_subst_iterator): Handle
> > >> >> define_insn_and_split.
> > >> >> ---
> > >> >>  gcc/read-rtl.c | 6 --
> > >> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> > >> >>
> > >> >> diff --git a/gcc/read-rtl.c b/gcc/read-rtl.c
> > >> >> index d698dd4af4d..5957c29671a 100644
> > >> >> --- a/gcc/read-rtl.c
> > >> >> +++ b/gcc/read-rtl.c
> > >> >> @@ -275,9 +275,11 @@ apply_subst_iterator (rtx rt, unsigned int, int
> > >> >> value)
> > >> >>if (value == 1)
> > >> >>  return;
> > >> >>gcc_assert (GET_CODE (rt) == DEFINE_INSN
> > >> >> + || GET_CODE (rt) == DEFINE_INSN_AND_SPLIT
> > >> >>   || GET_CODE (rt) == DEFINE_EXPAND);
> > >> >
> > >> > Can we also handle DEFINE_SPLIT here?
> > >> >
> > >>
> > >> Yes, we could if there were a usage for it.  I am reluctant to add
> > >> something
> > >> I have no use nor test for.
> > >
> > > Just split one define_insn_and_split to define_insn and corresponding
> > > define_split.
> > >
> > > define_insn_and_split is a contraction for the define_insn and
> > > corresponding define_split, so it looks weird to only handle
> > > define_insn_and_split without handling define_split.
> > >
> >
> > Here is the updated patch to handle define_split.  Tested with
> >
> > (define_insn "*sse4_1_v8qiv8hi2_2"
> >   [(set (match_operand:V8HI 0 "register_operand")
> > (any_extend:V8HI
> >   (vec_select:V8QI
> > (subreg:V16QI
> >   (vec_concat:V2DI
> > (match_operand:DI 1 "memory_operand")
> > (const_int 0)) 0)
> > (parallel [(const_int 0) (const_int 1)
> >(const_int 2) (const_int 3)
> >(const_int 4) (const_int 5)
> >(const_int 6) (const_int 7)]]
> >   "TARGET_SSE4_1 &&  && "
> >   "#")
> >
> > (define_split
> >   [(set (match_operand:V8HI 0 "register_operand")
> > (any_extend:V8HI
> >   (vec_select:V8QI
> > (subreg:V16QI
> >   (vec_concat:V2DI
> > (match_operand:DI 1 "memory_operand")
> > (const_int 0)) 0)
> > (parallel [(const_int 0) (const_int 1)
> >(const_int 2) (const_int 3)
> >(const_int 4) (const_int 5)
> >(const_int 6) (const_int 7)]]
> >   "TARGET_SSE4_1 &&  && 
> >&& can_create_pseudo_p ()"
> >   [(set (match_dup 0)
> > (any_extend:V8HI (match_dup 1)))]
> > {
> >   operands[1] = adjust_address_nv (operands[1], V8QImode, 0);
> > })
> >
>
> PING:
>
> https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01665.html
>
> This patch blocks an i386 backend patch.
>

PING.

-- 
H.J.


Re: RFC (branch prediction): PATCH to implement P0479R5, [[likely]] and [[unlikely]].

2018-11-13 Thread Martin Liška
On 11/13/18 5:43 AM, Jason Merrill wrote:
> [[likely]] and [[unlikely]] are equivalent to the GNU hot/cold attributes,
> except that they can be applied to arbitrary statements as well as labels;
> this is most likely to be useful for marking if/else branches as likely or
> unlikely.  Conveniently, PREDICT_EXPR fits the bill nicely as a
> representation.
> 
> I also had to fix marking case labels as hot/cold, which didn't work before.
> Which then required me to force __attribute ((fallthrough)) to apply to the
> statement rather than the label.
> 
> Tested x86_64-pc-linux-gnu.  Does this seem like a sane implementation
> approach to people with more experience with PREDICT_EXPR?
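For reference, a minimal C++20 example of the attributes under discussion, on both a statement and a case label, next to the __builtin_expect spelling they roughly correspond to (a hypothetical example, not taken from the patch's testsuite):

```cpp
#include <cassert>

// [[likely]]/[[unlikely]] on arbitrary statements and on case labels,
// which is what the patch implements (requires C++20; earlier modes
// typically warn and ignore the attribute).
int classify (int x)
{
  if (x < 0) [[unlikely]]
    return -1;
  switch (x)
    {
    [[unlikely]] case 0:
      return 0;
    default:
      return 1;
    }
}

// Roughly the GNU __builtin_expect spelling it maps to internally.
int classify_expect (int x)
{
  if (__builtin_expect (x < 0, 0))
    return -1;
  return x == 0 ? 0 : 1;
}
```

Both functions compute the same result; the attributes only bias branch prediction.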

Hi.

In general it makes sense to implement it the same way. The question is how
close the hot/cold attributes should be to __builtin_expect.

Let me present a few examples and differences that I see:

1) ./xgcc -B. -O2 -fdump-tree-profile_estimate=/dev/stdout /tmp/test1.C

;; Function foo (_Z3foov, funcdef_no=0, decl_uid=2301, cgraph_uid=1, 
symbol_order=3)

Predictions for bb 2
  first match heuristics: 90.00%
  combined heuristics: 90.00%
  __builtin_expect heuristics of edge 2->3: 90.00%

As seen here, __builtin_expect is stronger, as it is a first-match heuristic and
has probability == 90%.

;; Function bar (_Z3barv, funcdef_no=1, decl_uid=2303, cgraph_uid=2, 
symbol_order=4)

Predictions for bb 2
  DS theory heuristics: 74.48%
  combined heuristics: 74.48%
  opcode values nonequal (on trees) heuristics of edge 2->3: 34.00%
  hot label heuristics of edge 2->3: 85.00%

Here we combine the hot label prediction with the opcode one, resulting in a
rather poor result of 75%.
So maybe hot/cold prediction can also be first-match.

2) ./xgcc -B. -O2 -fdump-tree-profile_estimate=/dev/stdout /tmp/test2.C
...
foo ()
{
...
  switch (_3)  [3.33%], case 3:  [3.33%], case 42:  
[3.33%], case 333:  [90.00%]>

while:

bar ()
{
  switch (a.1_1)  [25.00%], case 3:  [25.00%], case 42:  
[25.00%], case 333:  [25.00%]>
...

Note that support for __builtin_expect was enhanced in this stage1. I can
definitely also cover situations where one uses
hot/cold for labels. So there is definitely room for improvement.

3) a last example where one can use the attribute on a function decl, resulting in:
__attribute__((hot, noinline))
foo ()
{
..

I hope that's desired? If so, I would cover it with a test-case in the test-suite.

Jason, can you please point to the C++ specification of the attributes?
Would you please consider error diagnostics for the situation written in
test4.C?
Such a situation is currently silently ignored in the profile_estimate pass:

Predictions for bb 2
  hot label heuristics of edge 2->4 (edge pair duplicate): 85.00%
  hot label heuristics of edge 2->3 (edge pair duplicate): 85.00%
...

Thanks,
Martin

> 
> gcc/
>   * gimplify.c (gimplify_case_label_expr): Handle hot/cold attributes.
> gcc/c-family/
>   * c-lex.c (c_common_has_attribute): Handle likely/unlikely.
> gcc/cp/
>   * parser.c (cp_parser_std_attribute): Handle likely/unlikely.
>   (cp_parser_statement): Call process_stmt_hotness_attribute.
>   (cp_parser_label_for_labeled_statement): Apply attributes to case.
>   * cp-gimplify.c (lookup_hotness_attribute, remove_hotness_attribute)
>   (process_stmt_hotness_attribute): New.
>   * decl.c (finish_case_label): Give label in template type void.
>   * pt.c (tsubst_expr) [CASE_LABEL_EXPR]: Copy attributes.
>   [PREDICT_EXPR]: Handle.
> ---
>  gcc/cp/cp-tree.h  |  2 +
>  gcc/c-family/c-lex.c  |  4 +-
>  gcc/cp/cp-gimplify.c  | 42 +
>  gcc/cp/decl.c |  2 +-
>  gcc/cp/parser.c   | 45 +++
>  gcc/cp/pt.c   | 12 +-
>  gcc/gimplify.c| 10 -
>  gcc/testsuite/g++.dg/cpp2a/attr-likely1.C | 38 +++
>  gcc/testsuite/g++.dg/cpp2a/attr-likely2.C | 10 +
>  gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C   | 12 ++
>  gcc/ChangeLog |  4 ++
>  gcc/c-family/ChangeLog|  4 ++
>  gcc/cp/ChangeLog  | 12 ++
>  13 files changed, 184 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/attr-likely1.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/attr-likely2.C
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index c4d79c0cf7f..c55352ec5ff 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -7541,6 +7541,8 @@ extern bool cxx_omp_disregard_value_expr(tree, 
> bool);
>  extern void cp_fold_function (tree);
>  extern tree cp_fully_fold(tree);
>  extern void clear_fold_cache (void);
> +extern tree lookup_hotness_attribute (tree);
> +extern tree process_stmt_hotness_attribute   (tree);
>  
>  /* in name-lookup.c */
>  extern tree strip_usin

Re: [PATCH] More value_range API cleanup

2018-11-13 Thread Aldy Hernandez




On 11/13/18 8:58 AM, Richard Biener wrote:

On Tue, 13 Nov 2018, Aldy Hernandez wrote:


On 11/13/18 3:07 AM, Richard Biener wrote:

On Tue, 13 Nov 2018, Aldy Hernandez wrote:




The tricky part starts in the prologue for

 if (vr0->undefined_p ())
   {
 vr0->deep_copy (vr1);
 return;
   }

but yes, we probably can factor out a bit more common code
here.  I'll see to followup with more minor cleanups this
week (noticed a few details myself).


Like this?  (untested)


I would inline value_range_base::union_helper into
value_range_base::union_,
and remove all the undefined/varying/etc stuff from value_range::union_.

It should work because upon return from value_range_base::union_, in the
this->undefined_p case, the base class will copy everything but the
equivalences.  Then the derived union_ only has to nuke the equivalences
if
this->undefined or this->varying, and the equivalences' IOR just works.

For instance, in the case where other->undefined_p, there shouldn't be
anything in the equivalences so the IOR won't copy anything to this as
expected.  Similarly for this->varying_p.

In the case of other->varying, this will already have been set to varying so
neither this nor other should have anything in their equivalence fields,
so
the IOR won't do anything.

I think I covered all of them...the bitmap math should just work.  What do
you
think?


I think the only case that will not work is the case when this->undefined
(when we need the deep copy).  Because we'll not get the bitmap from
other in that case.  So I've settled with the thing below (just
special-casing that very case)


Ah, good point.




Finally, as I've hinted before, I think we need to be careful that any
time we
change state to VARYING / UNDEFINED from a base method, that the derived
class
is in a sane state (there are no equivalences set per the API contract).
This
was not originally enforced in VRP, and I wouldn't be surprised if there
are
dragons if we enforce honesty.  I suppose, since we have an API, we could
enforce this lazily: any time equiv() is called, clear the equivalences or
return NULL if it's varying or undefined?  Just a thought.


I have updated ->update () to adjust equiv when we update to VARYING
or UNDEFINED.


Excellent idea.  I don't see that part in your patch though?


That was part of the previous (or previous previous) change.


Ah ok.






+/* Helper for meet operation for value ranges.  Given two value ranges VR0
and
+   VR1, return a range that contains both VR0 and VR1.  This may not be the
+   smallest possible such range.  */
+
+value_range_base
+value_range_base::union_helper (const value_range_base *vr0,
+   const value_range_base *vr1)
+{


I know this was my fault, but would you mind removing vr0 from union_helper?
Perhaps something like this:

value_range_base::union_helper (const value_range_base *other)

I think it'll be cleaner and more consistent this way.


The method is static now, and given that it doesn't modify VR0
but returns a copy, that is better IMHO.


Sure.

Aldy


Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-13 Thread Richard Biener
On Tue, Nov 13, 2018 at 10:48 AM Kyrill Tkachov
 wrote:
>
>
> On 13/11/18 09:28, Richard Biener wrote:
> > On Tue, Nov 13, 2018 at 10:15 AM Kyrill Tkachov
> >  wrote:
> >> Hi Richard,
> >>
> >> On 13/11/18 08:24, Richard Biener wrote:
> >>> On Mon, Nov 12, 2018 at 7:20 PM Kyrill Tkachov
> >>>  wrote:
>  On 12/11/18 14:10, Richard Biener wrote:
> > On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov
> >  wrote:
> >> On 09/11/18 12:18, Richard Biener wrote:
> >>> On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
> >>>  wrote:
>  Hi all,
> 
>  In this testcase the codegen for VLA SVE is worse than it could be 
>  due to unrolling:
> 
>  fully_peel_me:
>  mov x1, 5
>  ptrue   p1.d, all
>  whilelo p0.d, xzr, x1
 ld1d    z0.d, p0/z, [x0]
 fadd    z0.d, z0.d, z0.d
 st1d    z0.d, p0, [x0]
 cntd    x2
>  addvl   x3, x0, #1
>  whilelo p0.d, x2, x1
>  beq .L1
 ld1d    z0.d, p0/z, [x0, #1, mul vl]
 fadd    z0.d, z0.d, z0.d
 st1d    z0.d, p0, [x3]
 cntw    x2
 incb    x0, all, mul #2
 whilelo p0.d, x2, x1
 beq .L1
 ld1d    z0.d, p0/z, [x0]
 fadd    z0.d, z0.d, z0.d
 st1d    z0.d, p0, [x0]
>  .L1:
>  ret
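A sketch of the kind of source loop that produces assembly like the above, a short fixed-count floating-point loop vectorized into VLA SVE code (a hypothetical reconstruction; the actual testcase is not shown in this excerpt):

```cpp
#include <cassert>

// Five iterations of a simple doubling loop: with VLA SVE the compiler
// does not know how many vector iterations that is, so complete peeling
// emits the whilelo-guarded epilogue copies seen above.
void fully_peel_me (double *x)
{
  for (int i = 0; i < 5; ++i)
    x[i] += x[i];
}
```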
> 
>  In this case, due to the vector-length-agnostic nature of SVE the 
>  compiler doesn't know the loop iteration count.
>  For such loops we don't want to unroll if we don't end up 
>  eliminating branches as this just bloats code size
>  and hurts icache performance.
> 
>  This patch introduces a new unroll-known-loop-iterations-only param 
>  that disables cunroll when the loop iteration
>  count is unknown (SCEV_NOT_KNOWN). This case occurs much more often 
>  for SVE VLA code, but it does help some
>  Advanced SIMD cases as well where loops with an unknown iteration 
>  count are not unrolled when it doesn't eliminate
>  the branches.
> 
>  So for the above testcase we generate now:
>  fully_peel_me:
>  mov x2, 5
>  mov x3, x2
>  mov x1, 0
>  whilelo p0.d, xzr, x2
>  ptrue   p1.d, all
>  .L2:
 ld1d    z0.d, p0/z, [x0, x1, lsl 3]
 fadd    z0.d, z0.d, z0.d
 st1d    z0.d, p0, [x0, x1, lsl 3]
 incd    x1
>  whilelo p0.d, x1, x3
>  bne .L2
>  ret
> 
>  Not perfect still, but it's preferable to the original code.
>  The new param is enabled by default on aarch64 but disabled for 
>  other targets, leaving their behaviour unchanged
>  (until other target people experiment with it and set it, if 
>  appropriate).
> 
>  Bootstrapped and tested on aarch64-none-linux-gnu.
>  Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences 
>  in performance.
> 
>  Ok for trunk?
> >>> Hum.  Why introduce a new --param and not simply key on
> >>> flag_peel_loops instead?  That is
> >>> enabled by default at -O3 and with FDO but you of course can control
> >>> that in your targets
> >>> post-option-processing hook.
> >> You mean like this?
> >> It's certainly a simpler patch, but I was just a bit hesitant of 
> >> making this change for all targets :)
> >> But I suppose it's a reasonable change.
> > No, that change is backward.  What I said is that peeling is already
> > conditional on
> > flag_peel_loops and that is enabled by -O3.  So you want to disable
> > flag_peel_loops for
> > SVE instead in the target.
>  Sorry, I got confused by the similarly named functions.
>  I'm talking about try_unroll_loop_completely when run as part of 
>  canonicalize_induction_variables i.e. the "ivcanon" pass
>  (sorry about blaming cunroll here). This doesn't get called through the 
>  try_unroll_loops_completely path.
> >>> Well, peeling gets disabled.  From your patch I see you want to
> >>> disable "unrolling" when
> >>> the number of loop iteration is not constant.  That is called peeling
> >>> where we need to
> >>> emit the loop exit test N times.
> >>>
> >>> Did you check your testcases with -fno-peel-loops?
> >> -fno-peel-loops doesn't help in the testcases. The code that does this 
> >> peeling (try_unroll_loop_completely)
> >> can be ca

Re: [PR81878]: fix --disable-bootstrap --enable-languages=ada, and cross-back gnattools build

2018-11-13 Thread Richard Biener
On Tue, Nov 13, 2018 at 2:10 PM Alexandre Oliva  wrote:
>
> On Nov 13, 2018, Richard Biener  wrote:
>
> >> Please let me know if there are objections to this change in the next
> >> few days, e.g., if enabling C and C++ for an Ada-only build is too
> >> onerous.  It is certainly possible to rework gnattools build machinery
> >> so that it uses CC and CXX as detected by the top-level configure if we
> >> can't find xgcc and xg++ in ../gcc.
>
> > I really wonder why we do not _always_ do this for consistency given we
> > already require a host Ada compiler.
>
> Sorry, I can't tell what the 'this' refers to.  Enabling C and C++ for
> an Ada-only build?  Reworking gnattools build machinery to use top-level
> CC and CXX?  Something else?

Reworking gnattools build to always use host CC/CXX in "stage1" (or for crosses)
rather than doing something different.  That would also not require C++ to be enabled
for crosses.

> FWIW, I see the point of using the just-built gcc/g++ if it's there
> and usable: considering the checks for different versions of Ada
> compilers, you really want to use the last stage of the bootstrap to
> build tools linked with the runtime built with it.  It seems to me you'd
> run into a catch-22 without that.

Yeah, but gnattools is bootstrapped, right?  For --disable-bootstrap
you get binaries built with the host compiler throughout and that's
good.  IIRC I originally stumbled across this with --disable-bootstrap.

Richard.

>
> --
> Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
> Be the change, be Free! FSF Latin America board member
> GNU Toolchain EngineerFree Software Evangelist
> Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


Re: cleanups and unification of value_range dumping code

2018-11-13 Thread Aldy Hernandez



On 11/13/18 3:12 AM, Richard Biener wrote:

On Mon, Nov 12, 2018 at 10:50 AM Aldy Hernandez  wrote:


I have rebased my value_range dumping patch after your value_range_base
changes.

I know you are not a fan of the gimple-pretty-print.c chunk, but I still
think having one set of dumping code is preferable to catering to
possible GC corruption while debugging.  If you're strongly opposed (as,
I'm putting my foot down), I can remove it as well as the relevant
pretty_printer stuff.


I'd say we do not want to change the gimple-pretty-print.c stuff also because
we'll miss the leading #.  I'd rather see a simple wide-int-range class
wrapping the interesting bits up.  I guess I'll come up with one then ;)


Ok.  Removed.




The patch looks bigger than it is because I moved all the dump routines
into one place.

OK?

p.s. After your changes, perhaps get_range_info(const_tree, value_range
&) should take a value_range_base instead?


Yes, I missed that and am now testing this change.


Thanks.



Btw, the patch needs updating again (sorry).  If you leave out the
gimple-pretty-print.c stuff there's no requirement to use the pretty-printer
API, right?


No need to apologize for contributing code :).  Thanks.  And yes, 
there's no need for the pretty-printer bits.


I've also removed the value_range*::dump() versions with no arguments, 
as now we have an overloaded debug() for use from the debugger.


Testing attached patch.

Aldy
gcc/

	* tree-vrp.c (value_range_base::dump): Dump type.
	Do not use INF nomenclature for 1-bit types.
	(dump_value_range): Group all variants to common dumping code.
	(debug): New overloaded functions for value_ranges.
	(value_range_base::dump): Remove no argument version.
	(value_range::dump): Same.

gcc/testsuite/

	* gcc.dg/tree-ssa/pr64130.c: Adjust for new value_range pretty
	printer.
	* gcc.dg/tree-ssa/vrp92.c: Same.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c
index e068765e2fc..28ffbb76da8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c
@@ -15,6 +15,6 @@ int funsigned2 (uint32_t a)
   return (-1 * 0x1L) / a == 0;
 }
 
-/* { dg-final { scan-tree-dump ": \\\[2, 8589934591\\\]" "evrp" } } */
-/* { dg-final { scan-tree-dump ": \\\[-8589934591, -2\\\]" "evrp" } } */
+/* { dg-final { scan-tree-dump "int \\\[2, 8589934591\\\]" "evrp" } } */
+/* { dg-final { scan-tree-dump "int \\\[-8589934591, -2\\\]" "evrp" } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp92.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp92.c
index 5a2dbf0108a..66d74e9b5e9 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp92.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp92.c
@@ -18,5 +18,5 @@ int foo (int i, int j)
   return j;
 }
 
-/* { dg-final { scan-tree-dump "res_.: \\\[1, 1\\\]" "vrp1" } } */
+/* { dg-final { scan-tree-dump "res_.: int \\\[1, 1\\\]" "vrp1" } } */
 /* { dg-final { scan-tree-dump-not "Threaded" "vrp1" } } */
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 27bc1769f11..f498386e8eb 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -365,8 +365,6 @@ value_range_base::type () const
   return TREE_TYPE (min ());
 }
 
-/* Dump value range to FILE.  */
-
 void
 value_range_base::dump (FILE *file) const
 {
@@ -374,21 +372,26 @@ value_range_base::dump (FILE *file) const
 fprintf (file, "UNDEFINED");
   else if (m_kind == VR_RANGE || m_kind == VR_ANTI_RANGE)
 {
-  tree type = TREE_TYPE (min ());
+  tree ttype = type ();
+
+  print_generic_expr (file, ttype);
+  fprintf (file, " ");
 
   fprintf (file, "%s[", (m_kind == VR_ANTI_RANGE) ? "~" : "");
 
-  if (INTEGRAL_TYPE_P (type)
-	  && !TYPE_UNSIGNED (type)
-	  && vrp_val_is_min (min ()))
+  if (INTEGRAL_TYPE_P (ttype)
+	  && !TYPE_UNSIGNED (ttype)
+	  && vrp_val_is_min (min ())
+	  && TYPE_PRECISION (ttype) != 1)
 	fprintf (file, "-INF");
   else
 	print_generic_expr (file, min ());
 
   fprintf (file, ", ");
 
-  if (INTEGRAL_TYPE_P (type)
-	  && vrp_val_is_max (max ()))
+  if (INTEGRAL_TYPE_P (ttype)
+	  && vrp_val_is_max (max ())
+	  && TYPE_PRECISION (ttype) != 1)
 	fprintf (file, "+INF");
   else
 	print_generic_expr (file, max ());
@@ -398,7 +401,7 @@ value_range_base::dump (FILE *file) const
   else if (varying_p ())
 fprintf (file, "VARYING");
   else
-fprintf (file, "INVALID RANGE");
+gcc_unreachable ();
 }
 
 void
@@ -425,17 +428,45 @@ value_range::dump (FILE *file) const
 }
 
 void
-value_range_base::dump () const
+dump_value_range (FILE *file, const value_range *vr)
 {
-  dump_value_range (stderr, this);
-  fprintf (stderr, "\n");
+  if (!vr)
+fprintf (file, "[]");
+  else
+vr->dump (file);
 }
 
 void
-value_range::dump () const
+dump_value_range (FILE *file, const value_range_base *vr)
+{
+  if (!vr)
+fprintf (file, "[]");
+  else
+vr->dump (file);
+}
+
+DEBUG_FUNCTION void
+debug (const value_range_base *vr)
+{
+  dump_value_range (stderr, vr);
+}
+
+DEBUG_FUN

Re: [PATCH] Improve -fprofile-report.

2018-11-13 Thread Richard Biener
On Tue, Nov 6, 2018 at 3:05 PM Martin Liška  wrote:
>
> Hi.
>
> The patch is based on what was discussed on IRC and in the PR.
> Apart from that the reported layout is improved.
>
> Patch survives regression tests on x86_64-linux-gnu.
>
> Ready for trunk?

OK.

Thanks,
Richard.

> Martin
>
> gcc/ChangeLog:
>
> 2018-11-06  Martin Liska  
>
> PR tree-optimization/87885
> * cfghooks.c (account_profile_record): Rename
> to ...
> (profile_record_check_consistency): ... this.
> Calculate missing num_mismatched_freq_in.
> (profile_record_account_profile): New function
> that calculates time and size of a function.
> * cfghooks.h (struct profile_record): Remove
> all tuples.
> (struct cfg_hooks): Remove after_pass flag.
> (account_profile_record): Rename to ...
> (profile_record_check_consistency): ... this.
> (profile_record_account_profile): New.
> * cfgrtl.c (rtl_account_profile_record): Remove
> after_pass flag.
> * passes.c (check_profile_consistency): Do only
> checking.
> (account_profile): Calculate size and time of
> function only.
> (pass_manager::dump_profile_report): Reformat
> output.
> (execute_one_ipa_transform_pass): Call
> consistency check before clean up and call account_profile
> after a clean up is done.
> (execute_one_pass): Call check_profile_consistency and
> account_profile instead of using after_pass flag.
> * tree-cfg.c (gimple_account_profile_record): Likewise.
> ---
>  gcc/cfghooks.c |  38 +++--
>  gcc/cfghooks.h |  17 ++--
>  gcc/cfgrtl.c   |  12 ++-
>  gcc/passes.c   | 207 ++---
>  gcc/tree-cfg.c |  11 ++-
>  5 files changed, 161 insertions(+), 124 deletions(-)
>
>


Re: [PATCH] RFC: machine-readable diagnostic output (PR other/19165)

2018-11-13 Thread Richard Biener
On Tue, Nov 13, 2018 at 8:58 AM David Malcolm  wrote:
>
> This patch implements a -fdiagnostics-format=json option which
> converts the diagnostics to be output to stderr in a JSON format;
> see the documentation in invoke.texi.
>
> Logically-related diagnostics are nested at the JSON level, using
> the auto_diagnostic_group mechanism.

LGTM if people really want it.

Richard.

> gcc/ChangeLog:
> PR other/19165
> * Makefile.in (OBJS): Move json.o to...
> (OBJS-libcommon): ...here and add diagnostic-format-json.o.
> * common.opt (fdiagnostics-format=): New option.
> (diagnostics_output_format): New enum.
> * diagnostic-format-json.cc: New file.
> * diagnostic.c (default_diagnostic_final_cb): New function, taken
> from start of diagnostic_finish.
> (diagnostic_initialize): Initialize final_cb to
> default_diagnostic_final_cb.
> (diagnostic_finish): Move "being treated as errors" messages to
> default_diagnostic_final_cb.  Call any final_cb.
> * diagnostic.h (enum diagnostics_output_format): New enum.
> (struct diagnostic_context): Add "final_cb".
> (diagnostic_output_format_init): New decl.
> * doc/invoke.texi (-fdiagnostics-format): New option.
> * dwarf2out.c (gen_producer_string): Ignore
> OPT_fdiagnostics_format_.
> * gcc.c (driver_handle_option): Handle OPT_fdiagnostics_format_.
> * lto-wrapper.c (append_diag_options): Ignore it.
> * opts.c (common_handle_option): Handle it.
> ---
>  gcc/Makefile.in   |   2 +-
>  gcc/common.opt|  17 +++
>  gcc/diagnostic-format-json.cc | 265 
> ++
>  gcc/diagnostic.c  |  40 ---
>  gcc/diagnostic.h  |  16 +++
>  gcc/doc/invoke.texi   |  78 +
>  gcc/dwarf2out.c   |   1 +
>  gcc/gcc.c |   5 +
>  gcc/lto-wrapper.c |   1 +
>  gcc/opts.c|   5 +
>  10 files changed, 414 insertions(+), 16 deletions(-)
>  create mode 100644 gcc/diagnostic-format-json.cc
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 16c9ed6..9534d59 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1395,7 +1395,6 @@ OBJS = \
> ira-color.o \
> ira-emit.o \
> ira-lives.o \
> -   json.o \
> jump.o \
> langhooks.o \
> lcm.o \
> @@ -1619,6 +1618,7 @@ OBJS = \
>  # Objects in libcommon.a, potentially used by all host binaries and with
>  # no target dependencies.
>  OBJS-libcommon = diagnostic.o diagnostic-color.o diagnostic-show-locus.o \
> +   diagnostic-format-json.o json.o \
> edit-context.o \
> pretty-print.o intl.o \
> sbitmap.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 5a5d332..2f669f6 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1273,6 +1273,23 @@ Enum(diagnostic_color_rule) String(always) 
> Value(DIAGNOSTICS_COLOR_YES)
>  EnumValue
>  Enum(diagnostic_color_rule) String(auto) Value(DIAGNOSTICS_COLOR_AUTO)
>
> +fdiagnostics-format=
> +Common Joined RejectNegative Enum(diagnostics_output_format)
> +-fdiagnostics-format=[text|json] Select output format
> +
> +; Required for these enum values.
> +SourceInclude
> +diagnostic.h
> +
> +Enum
> +Name(diagnostics_output_format) Type(int)
> +
> +EnumValue
> +Enum(diagnostics_output_format) String(text) 
> Value(DIAGNOSTICS_OUTPUT_FORMAT_TEXT)
> +
> +EnumValue
> +Enum(diagnostics_output_format) String(json) 
> Value(DIAGNOSTICS_OUTPUT_FORMAT_JSON)
> +
>  fdiagnostics-parseable-fixits
>  Common Var(flag_diagnostics_parseable_fixits)
>  Print fix-it hints in machine-readable form.
> diff --git a/gcc/diagnostic-format-json.cc b/gcc/diagnostic-format-json.cc
> new file mode 100644
> index 000..7860696
> --- /dev/null
> +++ b/gcc/diagnostic-format-json.cc
> @@ -0,0 +1,265 @@
> +/* JSON output for diagnostics
> +   Copyright (C) 2018 Free Software Foundation, Inc.
> +   Contributed by David Malcolm .
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "diagnostic.h"
> +#include "json.h"
> +
> +/* The top-level JSON array of pending diagnostics.  */
> +
> +static json::array *toplevel_array;
> +
> +/* The

Re: [PATCH, GCC, AARCH64, 1/6] Enable ARMv8.5-A in gcc

2018-11-13 Thread Sudakshina Das
Hi James

On 07/11/18 15:16, James Greenhalgh wrote:
> On Fri, Nov 02, 2018 at 01:37:33PM -0500, Sudakshina Das wrote:
>> Hi
>>
>> This patch is part of a series that enables ARMv8.5-A in GCC and
>> adds Branch Target Identification Mechanism.
>> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>>
>> This patch adds the -march option for armv8.5-a.
>>
>> Bootstrapped and regression tested with aarch64-none-linux-gnu.
>> Is this ok for trunk?
> 
> One minor tweak, otherwise OK.
> 
>> *** gcc/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>>  * config/aarch64/aarch64-arches.def: Define AARCH64_ARCH for
>>  ARMv8.5-A.
>>  * gcc/config/aarch64/aarch64.h (AARCH64_FL_V8_5): New.
>>  (AARCH64_FL_FOR_ARCH8_5, AARCH64_ISA_V8_5): New.
>>  * gcc/doc/invoke.texi: Document ARMv8.5-A.
> 
>> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
>> index 
>> fa9af26fd40fd23b1c9cd6da9b6300fd77089103..b324cdd2fede33af13c03362750401f9eb1c9a90
>>  100644
>> --- a/gcc/config/aarch64/aarch64.h
>> +++ b/gcc/config/aarch64/aarch64.h
>> @@ -170,6 +170,8 @@ extern unsigned aarch64_architecture_version;
>>   #define AARCH64_FL_SHA3  (1 << 18)  /* Has ARMv8.4-a SHA3 and 
>> SHA512.  */
>>   #define AARCH64_FL_F16FML (1 << 19)  /* Has ARMv8.4-a FP16 extensions. 
>>  */
>>   #define AARCH64_FL_RCPC8_4(1 << 20)  /* Has ARMv8.4-a RCPC extensions. 
>>  */
>> +/* ARMv8.5-A architecture extensions.  */
>> +#define AARCH64_FL_V8_5   (1 << 22)  /* Has ARMv8.5-A features.  */
>>   
>>   /* Statistical Profiling extensions.  */
>>   #define AARCH64_FL_PROFILE(1 << 21)
> 
> Let's keep this in order. 20, 21, 22.
> 

I have moved the ARMv8.5 stuff below. Patch attached.
If this looks ok, I will rebase 2/6 on top. Let me know if you
want me to resend the rebased 2/6 too.

Thanks
Sudi

> Thanks,
> James
> 
> 



diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index a37a5553894d6ab1d629017ea204478f69d8773d..7d05cd604093d15f27e5b197803a50c45a260e6e 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -35,5 +35,6 @@ AARCH64_ARCH("armv8.1-a", generic,	 8_1A,	8,  AARCH64_FL_FOR_ARCH8_1)
 AARCH64_ARCH("armv8.2-a", generic,	 8_2A,	8,  AARCH64_FL_FOR_ARCH8_2)
 AARCH64_ARCH("armv8.3-a", generic,	 8_3A,	8,  AARCH64_FL_FOR_ARCH8_3)
 AARCH64_ARCH("armv8.4-a", generic,	 8_4A,	8,  AARCH64_FL_FOR_ARCH8_4)
+AARCH64_ARCH("armv8.5-a", generic,	 8_5A,	8,  AARCH64_FL_FOR_ARCH8_5)
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 8ab21e7bc37c7d5ffba1a365345f70d9f501b3ac..8ce8445586f29963107848604c5e2bab8e853685 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -177,6 +177,9 @@ extern unsigned aarch64_architecture_version;
 /* Statistical Profiling extensions.  */
 #define AARCH64_FL_PROFILE(1 << 21)
 
+/* ARMv8.5-A architecture extensions.  */
+#define AARCH64_FL_V8_5	  (1 << 22)  /* Has ARMv8.5-A features.  */
+
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
 
@@ -195,6 +198,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_FOR_ARCH8_4			\
   (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4 | AARCH64_FL_F16FML \
| AARCH64_FL_DOTPROD | AARCH64_FL_RCPC8_4)
+#define AARCH64_FL_FOR_ARCH8_5			\
+  (AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_V8_5)
 
 /* Macros to test ISA flags.  */
 
@@ -216,6 +221,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_SHA3	   (aarch64_isa_flags & AARCH64_FL_SHA3)
 #define AARCH64_ISA_F16FML	   (aarch64_isa_flags & AARCH64_FL_F16FML)
 #define AARCH64_ISA_RCPC8_4	   (aarch64_isa_flags & AARCH64_FL_RCPC8_4)
+#define AARCH64_ISA_V8_5	   (aarch64_isa_flags & AARCH64_FL_V8_5)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3e54087ab98049ba932caa34ba2fb135eda48396..26770c5aafda1524d63a89cacf8cc069b7c8b9b6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15118,8 +15118,11 @@ more feature modifiers.  This option has the form
 @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
 
 The permissible values for @var{arch} are @samp{armv8-a},
-@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a} or @samp{armv8.4-a}
-or @var{native}.
+@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a}, @samp{armv8.4-a},
+@samp{armv8.5-a} or @var{native}.
+
+The value @samp{armv8.5-a} implies @samp{armv8.4-a} and enables compiler
+support for the ARMv8.5-A architecture extensions.
 
 The value @samp{armv8.4-a} implies @samp{armv8.3-a} and enables compiler
 support for the ARMv8.4-A architecture extensions.




Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.

2018-11-13 Thread Sudakshina Das
Hi

On 02/11/18 18:38, Sudakshina Das wrote:
> Hi
> 
> This patch is part of a series that enables ARMv8.5-A in GCC and
> adds Branch Target Identification Mechanism.
> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
> 
> This patch adds a new pass called "bti" which is triggered by the
> command line argument -mbranch-protection whenever "bti" is turned on.
> 
> The pass iterates through the instructions and adds appropriate BTI
> instructions based on the following:
>  * Add a new "BTI C" at the beginning of a function, unless it's already
>    protected by a "PACIASP/PACIBSP".  We exempt the functions that are
>    only called directly.
>  * Add a new "BTI J" for every target of an indirect jump, jump table
>    targets, non-local goto targets or labels that might be referenced
>    by variables, constant pools, etc. (NOTE_INSN_DELETED_LABEL).
> 
> Since we have already changed the use of indirect tail calls to only x16
> and x17, we do not have to use "BTI JC".
> (check patch 3/6).
> 

I missed out on the explanation for the changes to the trampoline code.
The patch also updates the trampoline code in case BTI is enabled. Since
the trampoline code is a target of an indirect branch, we need to add an
appropriate BTI instruction at the beginning of it to avoid a branch
target exception.

> Bootstrapped and regression tested with aarch64-none-linux-gnu. Added
> new tests.
> Is this ok for trunk?
> 
> Thanks
> Sudi
> 
> *** gcc/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das  
>   Ramana Radhakrishnan  
> 
>   * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
>   * gcc/config/aarch64/aarch64.h: Update comment for
>   TRAMPOLINE_SIZE.
>   * config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
>   Update if bti is enabled.
>   * config/aarch64/aarch64-bti-insert.c: New file.
>   * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert
>   bti pass.
>   * config/aarch64/aarch64-protos.h (make_pass_insert_bti):
>   Declare the new bti pass.
>   * config/aarch64/aarch64.md (bti_nop): Define.
>   * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das  
> 
>   * gcc.target/aarch64/bti-1.c: New test.
>   * gcc.target/aarch64/bti-2.c: New test.
>   * lib/target-supports.exp
>   (check_effective_target_aarch64_bti_hw): Add new check for
>   BTI hw.
>

Updated patch attached with more comments and a bit of simplification
in aarch64-bti-insert.c. ChangeLog still applies.

Thanks
Sudi

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b108697cfc7b1c9c6dc1f30cca6fd1158182c29e..3e77f9df6ad6ca55fccca50387eab4b2501af647 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -317,7 +317,7 @@ aarch64*-*-*)
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	d_target_objs="aarch64-d.o"
-	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o"
+	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch64-bti-insert.o"
 	target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
 	target_has_targetm_common=yes
 	;;
diff --git a/gcc/config/aarch64/aarch64-bti-insert.c b/gcc/config/aarch64/aarch64-bti-insert.c
new file mode 100644
index ..15202e0def3b514bdbd1564b39a121e43e01a67f
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-bti-insert.c
@@ -0,0 +1,226 @@
+/* Branch Target Identification for AArch64 architecture.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#define INCLUDE_STRING
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "gimple.h"
+#include "tm_p.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "emit-rtl.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "dumpfile.h"
+#include "rtl-iter.h"
+#include "cfgrtl.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+
+/* This pass enables the support for Branch Target I

Re: [PATCH, GCC, AARCH64, 6/6] Enable BTI: Add configure option for BTI and PAC-RET

2018-11-13 Thread Sudakshina Das
Hi James

On 07/11/18 15:36, James Greenhalgh wrote:
> On Fri, Nov 02, 2018 at 01:38:46PM -0500, Sudakshina Das wrote:
>> Hi
>>
>> This patch is part of a series that enables ARMv8.5-A in GCC and
>> adds Branch Target Identification Mechanism.
>> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>>
>> This patch adds a new configure option for enabling BTI and return
>> address signing by default with --enable-standard-branch-protection.
>> This is equivalent to -mbranch-protection=standard which would
>> imply -mbranch-protection=pac-ret+bti.
>>
>> Bootstrapped and regression tested with aarch64-none-linux-gnu with
>> and without the configure option turned on.
>> Also tested on aarch64-none-elf with and without configure option with a
>> BTI enabled aem. Only 2 regressions and these were because newlib
>> requires patches to protect hand coded libraries with BTI.
>>
>> Is this ok for trunk?
> 
> With a tweak to the comment above your changes in aarch64.c, yes this is OK.
> 
>> *** gcc/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>>  * config/aarch64/aarch64.c (aarch64_override_options): Add case to check
>>  configure option to set BTI and Return Address Signing.
>>  * configure.ac: Add --enable-standard-branch-protection and
>>  --disable-standard-branch-protection.
>>  * configure: Regenerated.
>>  * doc/install.texi: Document the same.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>>  * gcc.target/aarch64/bti-1.c: Update test to not add command
>>  line option when configure with bti.
>>  * gcc.target/aarch64/bti-2.c: Likewise.
>>  * lib/target-supports.exp
>>  (check_effective_target_default_branch_protection):
>>  Add configure check for --enable-standard-branch-protection.
>>
> 
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 
>> 12a55a640de4fdc5df21d313c7ea6841f1daf3f2..a1a5b7b464eaa2ce67ac66d9aea837159590aa07
>>  100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -11558,6 +11558,26 @@ aarch64_override_options (void)
>> if (!selected_tune)
>>   selected_tune = selected_cpu;
>>   
>> +  if (aarch64_enable_bti == 2)
>> +{
>> +#ifdef TARGET_ENABLE_BTI
>> +  aarch64_enable_bti = 1;
>> +#else
>> +  aarch64_enable_bti = 0;
>> +#endif
>> +}
>> +
>> +  /* No command-line option yet.  */
> 
> This is too broad. Can you narrow this down to which command line option this
> relates to, and what the expected default behaviours are (for both LP64 and
> ILP32).
> 

Updated patch attached. Return address signing is currently not
supported for ILP32; the patch follows that, hence the extra ILP32
check.

Thanks
Sudi

> Thanks,
> James
> 
>> +  if (accepted_branch_protection_string == NULL && !TARGET_ILP32)
>> +{
>> +#ifdef TARGET_ENABLE_PAC_RET
>> +  aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF;
>> +  aarch64_ra_sign_key = AARCH64_KEY_A;
>> +#else
>> +  aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE;
>> +#endif
>> +}
>> +
>>   #ifndef HAVE_AS_MABI_OPTION
>> /* The compiler may have been configured with 2.23.* binutils, which does
>>not have support for ILP32.  */
> 



diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b97d9e4deecf5ca33761dfd1008c39bb4b849881..e267d3441fd7f21105bfba339b69f2ecdb7595ae 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11579,6 +11579,28 @@ aarch64_override_options (void)
   if (!selected_tune)
 selected_tune = selected_cpu;
 
+  if (aarch64_enable_bti == 2)
+{
+#ifdef TARGET_ENABLE_BTI
+  aarch64_enable_bti = 1;
+#else
+  aarch64_enable_bti = 0;
+#endif
+}
+
+  /* Return address signing is currently not supported for ILP32 targets.  For
+ LP64 targets use the configured option in the absence of a command-line
+ option for -mbranch-protection.  */
+  if (!TARGET_ILP32 && accepted_branch_protection_string == NULL)
+{
+#ifdef TARGET_ENABLE_PAC_RET
+  aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF;
+  aarch64_ra_sign_key = AARCH64_KEY_A;
+#else
+  aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE;
+#endif
+}
+
 #ifndef HAVE_AS_MABI_OPTION
   /* The compiler may have been configured with 2.23.* binutils, which does
  not have support for ILP32.  */
diff --git a/gcc/configure b/gcc/configure
index 03461f1e27538a3a0791c2b61b0e75c3ff1a25be..a0f95106c22ee858bbf4516f14cd9d265dede272 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -947,6 +947,7 @@ with_plugin_ld
 enable_gnu_indirect_function
 enable_initfini_array
 enable_comdat
+enable_standard_branch_protection
 enable_fix_cortex_a53_835769
 enable_fix_cortex_a53_843419
 with_glibc_version
@@ -1677,6 +1678,14 @@ Optional Features:
   --enable-initfini-array	use .init_array/.fini_array sections
   --enable-comdat enable COMDAT group support
 
+  --

Re: [PATCH] [ARC] Cleanup, fix and set LRA default.

2018-11-13 Thread Andrew Burgess
* Claudiu Zissulescu  [2018-11-12 13:29:33 +0200]:

> From: claziss 
> 
> Hi Andrew,
> 
> This is a patch which fixes and sets LRA by default.
> 
> OK to apply?
> Claudiu
> 
>   Commit message 
> 
> The LP_COUNT register cannot be freely allocated by the compiler, as
> its size and/or content may change depending on the ARC hardware
> configuration.  Thus, make this register fixed.
> 
> Remove register classes and unused constraint letters.
> 
> Cleanup the implementation of conditional_register_usage hook by using
> macros instead of magic constants and removing all references to
> reg_class_contents which are bringing so much grief when lra is enabled.
> 
> gcc/
> 2018-xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.h (reg_class): Reorder registers classes, remove
>   unused register classes.
>   (REG_CLASS_NAMES): Likewise.
>   (REG_CLASS_CONTENTS): Likewise.
>   (FIXED_REGISTERS): Make lp_count fixed.
>   (BASE_REG_CLASS): Remove ACC16_BASE_REGS reference.
>   (PROGRAM_COUNTER_REGNO): Remove.
>   * config/arc/arc.c (arc_conditional_register_usage): Remove unused
>   register classes, use constants for register numbers, remove
>   reg_class_contents references.
>   (arc_process_double_reg_moves): Add asserts.
>   (arc_secondary_reload): Remove LPCOUNT_REG reference, use
>   lra_in_progress predicate.
>   (arc_init_reg_tables): Remove unused register classes.
>   (arc_register_move_cost): Likewise.
>   (arc_preferred_reload_class): Likewise.
>   (hwloop_optimize): Update rtx patterns involving lp_count
>   register.
>   (arc_return_address_register): Rename ILINK1, INLINK2 regnums
>   macros.
>   * config/arc/constraints.md ("c"): Choose between GENERAL_REGS and
>   CHEAP_CORE_REGS.  Former one will be used for LRA.
>   ("Rac"): Choose between GENERAL_REGS and ALL_CORE_REGS.  Former
>   one will be used for LRA.
>   ("w"): Choose between GENERAL_REGS and WRITABLE_CORE_REGS.  Former
>   one will be used for LRA.
>   ("W"): Choose between GENERAL_REGS and MPY_WRITABLE_CORE_REGS.
>   Former one will be used for LRA.
>   ("f"): Delete constraint.
>   ("k"): Likewise.
>   ("e"): Likewise.

The entries below this are for arc.md, but you're missing the filename
in the ChangeLog format.

>   (movqi_insn): Remove unused lp_count constraints.
>   (movhi_insn): Likewise.
>   (movsi_insn): Update pattern.
>   (arc_lp): Likewise.
>   (dbnz): Likewise.
>   ("l"): Change it from register constraint to constraint.
>   (stack_tie): Remove 'b' constraint letter.
>   (R4_REG): Define.
>   (R9_REG, R15_REG, R16_REG, R25_REG): Likewise.
>   (R32_REG, R40_REG, R41_REG, R42_REG, R43_REG, R44_REG): Likewise.
>   (R57_REG, R59_REG, PCL_REG): Likewise.
>   (ILINK1_REGNUM): Renamed to ILINK1_REG.
>   (ILINK2_REGNUM): Renamed to ILINK2_REG.
>   (Rgp): Remove.
>   (SP_REGS): Likewise.
>   (Rcw): Remove unused reg classes.
>   * config/arc/predicates.md (dest_reg_operand): Just default on
>   register_operand predicate.
>   (mpy_dest_reg_operand): Likewise.
>   (move_dest_operand): Use macros instead of constants.

I'm basically happy with this.  There are a few formatting issues, as we
saw in previous patches - tabs instead of whitespace in comments and a
single whitespace instead of two at the end of a sentence.  But with
that fixed (and the doc fix Eric suggested) I'm happy.

Thanks,
Andrew


> ---
>  gcc/config/arc/arc.c  | 331 +-
>  gcc/config/arc/arc.h  | 106 ---
>  gcc/config/arc/arc.md |  57 --
>  gcc/config/arc/arc.opt|   7 +-
>  gcc/config/arc/constraints.md |  45 ++---
>  gcc/config/arc/predicates.md  |  28 +--
>  6 files changed, 222 insertions(+), 352 deletions(-)
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 75c2384eede..6802ca66554 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -734,11 +734,6 @@ arc_secondary_reload (bool in_p,
>if (cl == DOUBLE_REGS)
>  return GENERAL_REGS;
>  
> -  /* The loop counter register can be stored, but not loaded directly.  */
> -  if ((cl == LPCOUNT_REG || cl == WRITABLE_CORE_REGS)
> -  && in_p && MEM_P (x))
> -return GENERAL_REGS;
> -
>   /* If we have a subreg (reg), where reg is a pseudo (that will end in
>  a memory location), then we may need a scratch register to handle
>  the fp/sp+largeoffset address.  */
> @@ -756,8 +751,9 @@ arc_secondary_reload (bool in_p,
> if (regno != -1)
>   return NO_REGS;
>  
> -   /* It is a pseudo that ends in a stack location.  */
> -   if (reg_equiv_mem (REGNO (x)))
> +   /* It is a pseudo that ends in a stack location.  This
> +  procedure only works with the old reload step.  */
> +   if (reg_equiv_mem (REGNO (x)) && !lra_in_progress)
>   {
> /* G

Re: [PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR sanitizer/87930).

2018-11-13 Thread Martin Liška
On 11/12/18 1:07 PM, Jakub Jelinek wrote:
> On Mon, Nov 12, 2018 at 01:03:41PM +0100, Martin Liška wrote:
>> The patch reject usage of the mentioned options.
>>
>> Ready for trunk?
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2018-11-12  Martin Liska  
>>
>>  PR sanitizer/87930
>>  * config/i386/i386.c (ix86_option_override_internal): Error
>>  about usage -mabi=ms and -fsanitize={,kernel-}address.
> 
> Please add testcases for this. 

Done in attached patch.

Can this be changed through attribute too?

No.

I'm going to install the patch.

Thanks,
Martin

> If so, a test for that should be there too.
> 
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -3546,6 +3546,11 @@ ix86_option_override_internal (bool main_args_p,
>>  error ("-mabi=ms not supported with X32 ABI");
>>gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI);
>>  
>> +  if ((opts->x_flag_sanitize & SANITIZE_USER_ADDRESS) && opts->x_ix86_abi 
>> == MS_ABI)
>> +error ("%<-mabi=ms%> not supported with %<-fsanitize=address%>");
>> +  if ((opts->x_flag_sanitize & SANITIZE_KERNEL_ADDRESS) && opts->x_ix86_abi 
>> == MS_ABI)
>> +error ("%<-mabi=ms%> not supported with %<-fsanitize=kernel-address%>");
>> +
>>/* For targets using ms ABI enable ms-extensions, if not
>>   explicit turned off.  For non-ms ABI we turn off this
>>   option.  */
>>
> 
> 
>   Jakub
> 

From 7720cb384b33835a29beaaddb4a940a5eeadb13f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 12 Nov 2018 12:55:33 +0100
Subject: [PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR
 sanitizer/87930).

gcc/ChangeLog:

2018-11-12  Martin Liska  

	PR sanitizer/87930
	* config/i386/i386.c (ix86_option_override_internal): Error
	about usage -mabi=ms and -fsanitize={,kernel-}address.

gcc/testsuite/ChangeLog:

2018-11-13  Martin Liska  

	PR sanitizer/87930
	* gcc.target/i386/pr87930.c: New test.
---
 gcc/config/i386/i386.c  | 5 +
 gcc/testsuite/gcc.target/i386/pr87930.c | 6 ++
 2 files changed, 11 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87930.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 711bec0cc9d..b3e0807b894 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3546,6 +3546,11 @@ ix86_option_override_internal (bool main_args_p,
 error ("-mabi=ms not supported with X32 ABI");
   gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI);
 
+  if ((opts->x_flag_sanitize & SANITIZE_USER_ADDRESS) && opts->x_ix86_abi == MS_ABI)
+error ("%<-mabi=ms%> not supported with %<-fsanitize=address%>");
+  if ((opts->x_flag_sanitize & SANITIZE_KERNEL_ADDRESS) && opts->x_ix86_abi == MS_ABI)
+error ("%<-mabi=ms%> not supported with %<-fsanitize=kernel-address%>");
+
   /* For targets using ms ABI enable ms-extensions, if not
  explicit turned off.  For non-ms ABI we turn off this
  option.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr87930.c b/gcc/testsuite/gcc.target/i386/pr87930.c
new file mode 100644
index 000..e9cf29c221a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87930.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-fsanitize=address -mabi=ms" } */
+
+int i;
+
+/* { dg-error ".-mabi=ms. not supported with .-fsanitize=address." "" { target *-*-* } 0 } */
-- 
2.19.1



Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-13 Thread Renlin Li

Hi Peter,

I can confirm that your patch fixes all the ICEs I saw with the
arm-linux-gnueabihf toolchain!
There are some differences in the test results, but only because I am
comparing the latest results against an older baseline.

I haven't tested it on a bare-metal toolchain yet, but will do so to
ensure all related issues are fixed.

Thanks for fixing it!

Regards,
Renlin



On 11/12/2018 08:25 PM, Peter Bergner wrote:

On 11/12/18 6:25 AM, Renlin Li wrote:

I tried to build a native arm-linuxeabihf toolchain with the patch.
But I got the following ICE:


Ok, the issue was a problem in handling the src reg from a register copy.
I thought I could just remove it from the dead_set, but forgot that the
updating of the program points looks at whether the pseudo is live or
not.  The change below on top of the previous patch fixes the ICE for me.
I now add the src reg back into pseudos_live before we process the insn's
input operands so it doesn't trigger a new program point being added.

Renlin and Jeff, can you apply this patch on top of the previous one
and see whether that is better?

Thanks.

Peter


--- gcc/lra-lives.c.orig2018-11-12 14:15:18.257657911 -0600
+++ gcc/lra-lives.c 2018-11-12 14:08:55.978795092 -0600
@@ -934,6 +934,18 @@
  || sparseset_contains_pseudos_p (start_dying))
next_program_point (curr_point, freq);
  
+  /* If we removed the source reg from a simple register copy from the
+     live set above, then add it back now so we don't accidentally add
+     it to the start_living set below.  */
+  if (ignore_reg != NULL_RTX)
+   {
+ int ignore_regno = REGNO (ignore_reg);
+ if (HARD_REGISTER_NUM_P (ignore_regno))
+   SET_HARD_REG_BIT (hard_regs_live, ignore_regno);
+ else
+   sparseset_set_bit (pseudos_live, ignore_regno);
+   }
+
sparseset_clear (start_living);
  
/* Mark each used value as live.	*/

@@ -959,11 +971,6 @@
  
sparseset_and_compl (dead_set, start_living, start_dying);
  
-  /* If we removed the source reg from a simple register copy from the
-     live set, then it will appear to be dead, but it really isn't.  */
-  if (ignore_reg != NULL_RTX)
-   sparseset_clear_bit (dead_set, REGNO (ignore_reg));
-
sparseset_clear (start_dying);
  
/* Mark early clobber outputs dead.  */




Re: [PATCH PR84648]Adjust loop exit conditions for loop-until-wrap cases.

2018-11-13 Thread Richard Biener
On Sun, Nov 11, 2018 at 9:02 AM bin.cheng  wrote:
>
> Hi,
> This patch fixes PR84648 by adjusting exit conditions for loop-until-wrap 
> cases.
> It only handles simple cases in which IV.base are constants because we rely on
> current niter analyzer which doesn't handle parameterized bound in wrapped
> case.  It could be relaxed in the future.
>
> Bootstrap and test on x86_64 in progress.

Please use TYPE_MIN/MAX_VALUE or wi::min/max_value consistently.
Either tree_int_cst_equal (iv0->base, TYPE_MIN_VALUE (type)) or
wide_int_to_tree (niter_type, wi::max_value (TYPE_PRECISION (type),
TYPE_SIGN (type))).

Also

+  iv0->base = low;
+  iv0->step = fold_convert (niter_type, integer_one_node);

build_int_cst (niter_type, 1);

+  iv1->base = high;
+  iv1->step = integer_zero_node;

build_int_cst (niter_type, 0);

With the code, what happens to signed IVs?  I suppose we figure out things
earlier by means of undefined overflow?

Apart from the above nits OK for trunk.

Thanks,
Richard.

> Thanks,
> bin
> 2018-11-11  Bin Cheng  
>
> PR tree-optimization/84648
> * tree-ssa-loop-niter.c (adjust_cond_for_loop_until_wrap): New.
> (number_of_iterations_cond): Adjust exit cond for loop-until-wrap case
> by calling adjust_cond_for_loop_until_wrap.
>
> 2018-11-11  Bin Cheng  
>
> PR tree-optimization/84648
> * gcc.dg/tree-ssa/pr84648.c: New test.


Re: cleanups and unification of value_range dumping code

2018-11-13 Thread Richard Biener
On Tue, Nov 13, 2018 at 3:43 PM Aldy Hernandez  wrote:
>
>
>
> On 11/13/18 3:12 AM, Richard Biener wrote:
> > On Mon, Nov 12, 2018 at 10:50 AM Aldy Hernandez  wrote:
> >>
> >> I have rebased my value_range dumping patch after your value_range_base
> >> changes.
> >>
> >> I know you are not a fan of the gimple-pretty-print.c chunk, but I still
> >> think having one set of dumping code is preferable to catering to
> >> possible GC corruption while debugging.  If you're strongly opposed (as,
> >> I'm putting my foot down), I can remove it as well as the relevant
> >> pretty_printer stuff.
> >
> > I'd say we do not want to change the gimple-pretty-print.c stuff also 
> > because
> > we'll miss the leading #.  I'd rather see a simple wide-int-range class
> > wrapping the interesting bits up.  I guess I'll come up with one then ;)
>
> Ok.  Removed.
>
> >
> >> The patch looks bigger than it is because I moved all the dump routines
> >> into one place.
> >>
> >> OK?
> >>
> >> p.s. After your changes, perhaps get_range_info(const_tree, value_range
> >> &) should take a value_range_base instead?
> >
> > Yes, I missed that and am now testing this change.
>
> Thanks.
>
> >
> > Btw, the patch needs updating again (sorry).  If you leave out the
> > gimple-pretty-print.c stuff there's no requirement to use the pretty-printer
> > API, right?
>
> No need to apologize for contributing code :).  Thanks.  And yes,
> there's no need for the pretty-printer bits.
>
> I've also removed the value_range*::dump() versions with no arguments,
> as now we have an overloaded debug() for use from the debugger.
>
> Testing attached patch.

OK.

Thanks,
Richard.

> Aldy


Re: [PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR sanitizer/87930).

2018-11-13 Thread Jakub Jelinek
On Tue, Nov 13, 2018 at 04:01:56PM +0100, Martin Liška wrote:
> On 11/12/18 1:07 PM, Jakub Jelinek wrote:
> > On Mon, Nov 12, 2018 at 01:03:41PM +0100, Martin Liška wrote:
> >> The patch reject usage of the mentioned options.
> >>
> >> Ready for trunk?
> >> Thanks,
> >> Martin
> >>
> >> gcc/ChangeLog:
> >>
> >> 2018-11-12  Martin Liska  
> >>
> >>PR sanitizer/87930
> >>* config/i386/i386.c (ix86_option_override_internal): Error
> >>about usage -mabi=ms and -fsanitize={,kernel-}address.
> > 
> > Please add testcases for this. 
> 
> Done in attached patch.
> 
> Can this be changed through attribute too?
> 
> No.
> 
> I'm going to install the patch.

Can you please move the test to gcc.dg/asan/ and guard with
{ i?86-*-* x86_64-*-* } && lp64 ?
I don't think we want -fsanitize=address tests outside of /asan/
subdirs, where we guard them on the availability of asan.

Jakub


[PATCH] update-copyright.py: Add filters for D language sources

2018-11-13 Thread Iain Buclaw
Hi,

This adds filters for upstream dmd, druntime, and phobos libraries, so
that the update-copyright script doesn't complain or try to update the
copyright years for those files.

OK for trunk?
-- 
Iain

---
contrib/ChangeLog:

2018-11-13  Iain Buclaw  

* update-copyright.py (TestsuiteFilter): Skip .d tests.
(LibPhobosFilter): Add filter for upstream D sources.
(GCCCopyright): Add D Language Foundation as external author.
(GCCCmdLine): Add libphobos.

---
diff --git a/contrib/update-copyright.py b/contrib/update-copyright.py
index 9295c6b8a30..67c21cab23c 100755
--- a/contrib/update-copyright.py
+++ b/contrib/update-copyright.py
@@ -574,6 +574,7 @@ class TestsuiteFilter (GenericFilter):
 '.c',
 '.C',
 '.cc',
+'.d',
 '.h',
 '.hs',
 '.f',
@@ -616,6 +617,25 @@ class LibGCCFilter (GenericFilter):
 'soft-fp',
 ])
 
+class LibPhobosFilter (GenericFilter):
+def __init__ (self):
+GenericFilter.__init__ (self)
+
+self.skip_files |= set ([
+# Source module imported from upstream.
+'object.d',
+])
+
+self.skip_dirs |= set ([
+# Contains sources imported from upstream.
+'core',
+'etc',
+'gc',
+'gcstub',
+'rt',
+'std',
+])
+
 class LibStdCxxFilter (GenericFilter):
 def __init__ (self):
 GenericFilter.__init__ (self)
@@ -682,6 +702,7 @@ class GCCCopyright (Copyright):
 self.add_external_author ('Silicon Graphics')
 self.add_external_author ('Stephen L. Moshier')
 self.add_external_author ('Sun Microsystems, Inc. All rights reserved.')
+self.add_external_author ('The D Language Foundation, All Rights Reserved')
 self.add_external_author ('The Go Authors.  All rights reserved.')
 self.add_external_author ('The Go Authors. All rights reserved.')
 self.add_external_author ('The Go Authors.')
@@ -720,6 +741,7 @@ class GCCCmdLine (CmdLine):
 self.add_dir ('libitm')
 self.add_dir ('libobjc')
 # liboffloadmic is imported from upstream.
+self.add_dir ('libphobos', LibPhobosFilter())
 self.add_dir ('libquadmath')
 # libsanitizer is imported from upstream.
 self.add_dir ('libssp')
@@ -745,6 +767,7 @@ class GCCCmdLine (CmdLine):
 'libiberty',
 'libitm',
 'libobjc',
+'libphobos',
 'libssp',
 'libstdc++-v3',
 'libvtv',


Re: [PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR sanitizer/87930).

2018-11-13 Thread Martin Liška
On 11/13/18 4:08 PM, Jakub Jelinek wrote:
> On Tue, Nov 13, 2018 at 04:01:56PM +0100, Martin Liška wrote:
>> On 11/12/18 1:07 PM, Jakub Jelinek wrote:
>>> On Mon, Nov 12, 2018 at 01:03:41PM +0100, Martin Liška wrote:
 The patch reject usage of the mentioned options.

 Ready for trunk?
 Thanks,
 Martin

 gcc/ChangeLog:

 2018-11-12  Martin Liska  

PR sanitizer/87930
* config/i386/i386.c (ix86_option_override_internal): Error
about usage -mabi=ms and -fsanitize={,kernel-}address.
>>>
>>> Please add testcases for this. 
>>
>> Done in attached patch.
>>
>> Can this be changed through attribute too?
>>
>> No.
>>
>> I'm going to install the patch.
> 
> Can you please move the test to gcc.dg/asan/ and guard with
> { i?86-*-* x86_64-*-* } && lp64 ?
> I don't think we want -fsanitize=address tests outside of /asan/
> subdirs where we guard them on the availability of asan.
> 
>   Jakub
> 

Sure, there's a patch I've just tested.

Martin
>From 89aa27fb5d816c7d854cd77ba0f157d69bc927f7 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 13 Nov 2018 16:23:08 +0100
Subject: [PATCH] Move a test-case to a proper folder.

ChangeLog:

2018-11-13  Martin Liska  

	* pr87930.c: Move from gcc/testsuite/gcc.target/i386/
	into gcc/testsuite/gcc.dg/asan/.
---
 gcc/testsuite/{gcc.target/i386 => gcc.dg/asan}/pr87930.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename gcc/testsuite/{gcc.target/i386 => gcc.dg/asan}/pr87930.c (68%)

diff --git a/gcc/testsuite/gcc.target/i386/pr87930.c b/gcc/testsuite/gcc.dg/asan/pr87930.c
similarity index 68%
rename from gcc/testsuite/gcc.target/i386/pr87930.c
rename to gcc/testsuite/gcc.dg/asan/pr87930.c
index e9cf29c221a..4f8e6999fde 100644
--- a/gcc/testsuite/gcc.target/i386/pr87930.c
+++ b/gcc/testsuite/gcc.dg/asan/pr87930.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target lp64 } } */
+/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
 /* { dg-options "-fsanitize=address -mabi=ms" } */
 
 int i;
-- 
2.19.1



Re: [PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR sanitizer/87930).

2018-11-13 Thread Jakub Jelinek
On Tue, Nov 13, 2018 at 04:24:39PM +0100, Martin Liška wrote:
> 2018-11-13  Martin Liska  
> 
>   * pr87930.c: Move from gcc/testsuite/gcc.target/i386/
>   into gcc/testsuite/gcc.dg/asan/.

* gcc.target/i386/pr87930.c: Move to ...
* gcc.dg/asan/pr87930.c: ... here.  Guard for i?86/x86_64 targets.

Ok with that change.

> ---
>  gcc/testsuite/{gcc.target/i386 => gcc.dg/asan}/pr87930.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>  rename gcc/testsuite/{gcc.target/i386 => gcc.dg/asan}/pr87930.c (68%)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr87930.c 
> b/gcc/testsuite/gcc.dg/asan/pr87930.c
> similarity index 68%
> rename from gcc/testsuite/gcc.target/i386/pr87930.c
> rename to gcc/testsuite/gcc.dg/asan/pr87930.c
> index e9cf29c221a..4f8e6999fde 100644
> --- a/gcc/testsuite/gcc.target/i386/pr87930.c
> +++ b/gcc/testsuite/gcc.dg/asan/pr87930.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target lp64 } } */
> +/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
>  /* { dg-options "-fsanitize=address -mabi=ms" } */
>  
>  int i;
> -- 
> 2.19.1
> 


Jakub


Re: [PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR sanitizer/87930).

2018-11-13 Thread Martin Liška
On 11/13/18 4:26 PM, Jakub Jelinek wrote:
> On Tue, Nov 13, 2018 at 04:24:39PM +0100, Martin Liška wrote:
>> 2018-11-13  Martin Liska  
>>
>>  * pr87930.c: Move from gcc/testsuite/gcc.target/i386/
>>  into gcc/testsuite/gcc.dg/asan/.
> 
>   * gcc.target/i386/pr87930.c: Move to ...
>   * gcc.dg/asan/pr87930.c: ... here.  Guard for i?86/x86_64 targets.
> 
> Ok with that change.

Thanks, installed as r266076.

Martin

> 
>> ---
>>  gcc/testsuite/{gcc.target/i386 => gcc.dg/asan}/pr87930.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>  rename gcc/testsuite/{gcc.target/i386 => gcc.dg/asan}/pr87930.c (68%)
>>
>> diff --git a/gcc/testsuite/gcc.target/i386/pr87930.c 
>> b/gcc/testsuite/gcc.dg/asan/pr87930.c
>> similarity index 68%
>> rename from gcc/testsuite/gcc.target/i386/pr87930.c
>> rename to gcc/testsuite/gcc.dg/asan/pr87930.c
>> index e9cf29c221a..4f8e6999fde 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr87930.c
>> +++ b/gcc/testsuite/gcc.dg/asan/pr87930.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-do compile { target lp64 } } */
>> +/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
>>  /* { dg-options "-fsanitize=address -mabi=ms" } */
>>  
>>  int i;
>> -- 
>> 2.19.1
>>
> 
> 
>   Jakub
> 



Re: Bug 52869 - [DR 1207] "this" not being allowed in noexcept clauses

2018-11-13 Thread Marek Polacek
On Tue, Nov 13, 2018 at 11:49:55AM +0530, Umesh Kalappa wrote:
> Hi All,
> 
> the following patch fixes the subject issue
> 
> Index: gcc/cp/parser.c
> ===
> --- gcc/cp/parser.c (revision 266026)
> +++ gcc/cp/parser.c (working copy)
> @@ -24615,6 +24615,8 @@
>  {
>tree expr;
>cp_lexer_consume_token (parser->lexer);
> +
> +  inject_this_parameter (current_class_type, TYPE_UNQUALIFIED);
> 
>if (cp_lexer_peek_token (parser->lexer)->type == CPP_OPEN_PAREN)
> {
> 
> 
> ok to commit along the testcase with changelog update ?

Thanks for the patch.

Please also include the testcase along with the patch (and I think it should
also test noexcept in a template).  Please also include a ChangeLog entry
in the patch submission.

Can you describe how this patch has been tested?

Further, wouldn't it be better to call inject_this_parameter inside the
CPP_OPEN_PAREN block?  If noexcept doesn't have any expression, then it
can't refer to "this".

Marek


Re: [PR81878]: fix --disable-bootstrap --enable-languages=ada, and cross-back gnattools build

2018-11-13 Thread Alexandre Oliva
On Nov 13, 2018, Richard Biener  wrote:

> Reworking gnattools build to always use host CC/CXX in "stage1" (or for 
> crosses)
> rather than doing sth different.

> Yeah, but gnattools is bootstrapped, right?

No, it's not built in stage1, it's a post-bootstrap host subpackage.

> For --disable-bootstrap you get binaries built with the host compiler
> throughout and that's good.

I guess it *could* be built with the host compiler, and linked with that
runtime.  Then, in order to run it, you might still need the previous
gnat rts shared libs.  Just like for a non-bootstrapped compiler
front-end, yeah.

Whereas for a bootstrapped compiler, you'd want to use what, the stage2
compiler, that built other final host tools, to build gnattools as well?

All doable, but not without additional complexity.  I'm afraid I fail to
see the upside.

Unfortunately we don't have a lot of other post-bootstrap host tools
linked (on native builds) with target libraries to draw parallels.  I
guess we could reason about them as if they were part of the gcc/
subdir, and this would lead down the path you suggest.  I will then
point to the complications arising from using a just-built libstdc++ in
subsequent stages, and in later target builds in the same stage, and
conclude it's far from trivial, it's actually quite error-prone and
hardly glitch-free, and so it's not too hard to understand why gnat,
that had to take care of this before we even thought of bootstrapping
libstdc++, took a different path.

Anyway...  I won't take your suggestion as an objection to the proposed
patch, but I will say that I still have a lot to learn about the Ada
build machinery to be able to grasp the impact your suggestion might
have on existing uses, so, as much as I like uniformity and symmetry,
I'm not jumping right onto it ;-)

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-13 Thread Peter Bergner
On 11/13/18 9:01 AM, Renlin Li wrote:
> I could verify that your patch fixes all the ICEs I saw with the
> arm-linux-gnueabihf toolchain!
> There are some differences in the test results because I compared the
> latest results against an older baseline.
> 
> I haven't test it on bare-metal toolchain yet. But will do to ensure all 
> related issues are fixed.

Hi Renlin,

That's excellent news!  My guess is that the testsuite result changes are
probably caused by the combine changes/fixes that went in around the same
time as my patches.

If you want to disable the special copy handling, which should only help
things since it gives RA more freedom, you can apply the patch I mentioned
here:

  https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00379.html

which allows you to turn on and off the optimization with an option.


Jeff and Vlad,

With the above results, I think the patch is ready for review.
I'm attaching the latest updated patch below.

Again, this passed bootstrap and regtesting on powerpc64le-linux with
no regressions.  Ok for mainline?

Peter


gcc/
PR rtl-optimization/87899
* lra-lives.c (start_living): Update white space in comment.
(enum point_type): New.
(sparseset_contains_pseudos_p): New function.
(update_pseudo_point): Likewise.
(make_hard_regno_live): Use HARD_REGISTER_NUM_P macro.
(make_hard_regno_dead): Likewise.  Remove ignore_reg_for_conflicts
handling.  Move early exit after adding conflicts.
(mark_pseudo_live): Use HARD_REGISTER_NUM_P macro.  Add early exit
if regno is already live.  Remove all handling of program points.
(mark_pseudo_dead): Use HARD_REGISTER_NUM_P macro.  Add early exit
after adding conflicts.  Remove all handling of program points and
ignore_reg_for_conflicts.
(mark_regno_live): Use HARD_REGISTER_NUM_P macro.  Remove return value
and do not guard call to mark_pseudo_live.
(mark_regno_dead): Use HARD_REGISTER_NUM_P macro.  Remove return value
and do not guard call to mark_pseudo_dead.
(check_pseudos_live_through_calls): Use HARD_REGISTER_NUM_P macro.
(process_bb_lives): Use HARD_REGISTER_NUM_P and HARD_REGISTER_P macros.
Use new function update_pseudo_point.  Handle register copies by
removing the source register from the live set.  Handle INOUT operands.
Update to the next program point using the unused_set, dead_set and
start_dying sets.
(lra_create_live_ranges_1): Use HARD_REGISTER_NUM_P macro.

Index: gcc/lra-lives.c
===
--- gcc/lra-lives.c (revision 265971)
+++ gcc/lra-lives.c (working copy)
@@ -83,7 +83,7 @@ static HARD_REG_SET hard_regs_live;
 
 /* Set of pseudos and hard registers start living/dying in the current
insn.  These sets are used to update REG_DEAD and REG_UNUSED notes
-   in the insn. */
+   in the insn.  */
 static sparseset start_living, start_dying;
 
 /* Set of pseudos and hard regs dead and unused in the current
@@ -96,10 +96,6 @@ static bitmap_head temp_bitmap;
 /* Pool for pseudo live ranges. */
 static object_allocator lra_live_range_pool ("live ranges");
 
-/* If non-NULL, the source operand of a register to register copy for which
-   we should not add a conflict with the copy's destination operand.  */
-static rtx ignore_reg_for_conflicts;
-
 /* Free live range list LR.  */
 static void
 free_live_range_list (lra_live_range_t lr)
@@ -224,6 +220,57 @@ lra_intersected_live_ranges_p (lra_live_
   return false;
 }
 
+enum point_type {
+  DEF_POINT,
+  USE_POINT
+};
+
+/* Return TRUE if set A contains a pseudo register, otherwise, return FALSE.  */
+static bool
+sparseset_contains_pseudos_p (sparseset a)
+{
+  int regno;
+  EXECUTE_IF_SET_IN_SPARSESET (a, regno)
+if (!HARD_REGISTER_NUM_P (regno))
+  return true;
+  return false;
+}
+
+/* Mark pseudo REGNO as living or dying at program point POINT, depending on
+   whether TYPE is a definition or a use.  If this is the first reference to
+   REGNO that we've encountered, then create a new live range for it.  */
+
+static void
+update_pseudo_point (int regno, int point, enum point_type type)
+{
+  lra_live_range_t p;
+
+  /* Don't compute points for hard registers.  */
+  if (HARD_REGISTER_NUM_P (regno))
+return;
+
+  if (complete_info_p || lra_get_regno_hard_regno (regno) < 0)
+{
+  if (type == DEF_POINT)
+   {
+ if (sparseset_bit_p (pseudos_live, regno))
+   {
+ p = lra_reg_info[regno].live_ranges;
+ lra_assert (p != NULL);
+ p->finish = point;
+   }
+   }
+  else /* USE_POINT */
+   {
+ if (!sparseset_bit_p (pseudos_live, regno)
+ && ((p = lra_reg_info[regno].live_ranges) == NULL
+ || (p->finish != point && p->finish + 1 != point)))
+   lra_reg_info[regno].l

Re: [PATCH] Fix PR86991

2018-11-13 Thread Richard Biener
On Tue, 13 Nov 2018, Richard Biener wrote:

> 
> This PR shows we have stale reduction groups lying around because
> the fixup doesn't work reliably with reduction chains.  Fixed by
> delaying the build to after detection is successful.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.

The following slightly fixed is what I have applied.

Richard.

2018-11-13  Richard Biener  

PR tree-optimization/86991
* tree-vect-loop.c (vect_is_slp_reduction): Delay reduction
group building until we have successfully detected the SLP
reduction.
(vect_is_simple_reduction): Remove fixup code here.

* gcc.dg/pr86991.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 266071)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2466,7 +2466,7 @@ vect_is_slp_reduction (loop_vec_info loo
   struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
   enum tree_code code;
   gimple *loop_use_stmt = NULL;
-  stmt_vec_info use_stmt_info, current_stmt_info = NULL;
+  stmt_vec_info use_stmt_info;
   tree lhs;
   imm_use_iterator imm_iter;
   use_operand_p use_p;
@@ -2476,6 +2476,7 @@ vect_is_slp_reduction (loop_vec_info loo
   if (loop != vect_loop)
 return false;
 
+  auto_vec reduc_chain;
   lhs = PHI_RESULT (phi);
   code = gimple_assign_rhs_code (first_stmt);
   while (1)
@@ -2528,17 +2529,9 @@ vect_is_slp_reduction (loop_vec_info loo
 
   /* Insert USE_STMT into reduction chain.  */
   use_stmt_info = loop_info->lookup_stmt (loop_use_stmt);
-  if (current_stmt_info)
-{
- REDUC_GROUP_NEXT_ELEMENT (current_stmt_info) = use_stmt_info;
-  REDUC_GROUP_FIRST_ELEMENT (use_stmt_info)
-= REDUC_GROUP_FIRST_ELEMENT (current_stmt_info);
-}
-  else
-   REDUC_GROUP_FIRST_ELEMENT (use_stmt_info) = use_stmt_info;
+  reduc_chain.safe_push (use_stmt_info);
 
   lhs = gimple_assign_lhs (loop_use_stmt);
-  current_stmt_info = use_stmt_info;
   size++;
}
 
@@ -2548,10 +2541,9 @@ vect_is_slp_reduction (loop_vec_info loo
   /* Swap the operands, if needed, to make the reduction operand be the second
  operand.  */
   lhs = PHI_RESULT (phi);
-  stmt_vec_info next_stmt_info = REDUC_GROUP_FIRST_ELEMENT (current_stmt_info);
-  while (next_stmt_info)
+  for (unsigned i = 0; i < reduc_chain.length (); ++i)
 {
-  gassign *next_stmt = as_a  (next_stmt_info->stmt);
+  gassign *next_stmt = as_a  (reduc_chain[i]->stmt);
   if (gimple_assign_rhs2 (next_stmt) == lhs)
{
  tree op = gimple_assign_rhs1 (next_stmt);
@@ -2565,7 +2557,6 @@ vect_is_slp_reduction (loop_vec_info loo
  && vect_valid_reduction_input_p (def_stmt_info))
{
  lhs = gimple_assign_lhs (next_stmt);
- next_stmt_info = REDUC_GROUP_NEXT_ELEMENT (next_stmt_info);
  continue;
}
 
@@ -2600,14 +2591,20 @@ vect_is_slp_reduction (loop_vec_info loo
 }
 
   lhs = gimple_assign_lhs (next_stmt);
-  next_stmt_info = REDUC_GROUP_NEXT_ELEMENT (next_stmt_info);
 }
 
+  /* Build up the actual chain.  */
+  for (unsigned i = 0; i < reduc_chain.length () - 1; ++i)
+{
+  REDUC_GROUP_FIRST_ELEMENT (reduc_chain[i]) = reduc_chain[0];
+  REDUC_GROUP_NEXT_ELEMENT (reduc_chain[i]) = reduc_chain[i+1];
+}
+  REDUC_GROUP_FIRST_ELEMENT (reduc_chain.last ()) = reduc_chain[0];
+  REDUC_GROUP_NEXT_ELEMENT (reduc_chain.last ()) = NULL;
+
   /* Save the chain for further analysis in SLP detection.  */
-  stmt_vec_info first_stmt_info
-= REDUC_GROUP_FIRST_ELEMENT (current_stmt_info);
-  LOOP_VINFO_REDUCTION_CHAINS (loop_info).safe_push (first_stmt_info);
-  REDUC_GROUP_SIZE (first_stmt_info) = size;
+  LOOP_VINFO_REDUCTION_CHAINS (loop_info).safe_push (reduc_chain[0]);
+  REDUC_GROUP_SIZE (reduc_chain[0]) = size;
 
   return true;
 }
@@ -3182,16 +3195,6 @@ vect_is_simple_reduction (loop_vec_info
   return def_stmt_info;
 }
 
-  /* Dissolve group eventually half-built by vect_is_slp_reduction.  */
-  stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (def_stmt_info);
-  while (first)
-{
-  stmt_vec_info next = REDUC_GROUP_NEXT_ELEMENT (first);
-  REDUC_GROUP_FIRST_ELEMENT (first) = NULL;
-  REDUC_GROUP_NEXT_ELEMENT (first) = NULL;
-  first = next;
-}
-
   /* Look for the expression computing loop_arg from loop PHI result.  */
   if (check_reduction_path (vect_location, loop, phi, loop_arg, code))
 return def_stmt_info;
Index: gcc/testsuite/gcc.dg/pr86991.c
===
--- gcc/testsuite/gcc.dg/pr86991.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/pr86991.c  (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int b;
+extern unsigned c[];
+unsigned d;
+long e;
+
+void f()
+{
+  unsigned g, h;
+  for (; d; d += 2) {
+

Re: [PATCH][lower-subreg] Fix PR87507

2018-11-13 Thread Peter Bergner
On 11/13/18 2:53 AM, Eric Botcazou wrote:
>> +static rtx
>> +simple_move_operator (rtx x)
>> +{
>> +  /* A word sized rotate of a register pair is equivalent to swapping
>> + the registers in the register pair.  */
>> +  if (GET_CODE (x) == ROTATE
>> +  && GET_MODE (x) == twice_word_mode
>> +  && simple_move_operand (XEXP (x, 0))
>> +  && CONST_INT_P (XEXP (x, 1))
>> +  && INTVAL (XEXP (x, 1)) == BITS_PER_WORD)
>> +return XEXP (x, 0);;
>> +
>> +  return NULL_RTX;
>> +}
>> +
> 
> Superfluous semi-colon.  Given that the function returns an operand, its name 
> is IMO misleading, so maybe [get_]operand_for_simple_move_operator.

Fixed and renamed function to operand_for_simple_move_operator.


> Can we factor out the duplicate manipulation into a function here, for 
> example 
> resolve_operand_for_simple_move_operator?

Good idea.  I went with your name, but would swap_concatn_operands() be a
better name, since that is what it is doing?  I'll leave it up to you.
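
For reference, the equivalence the patch relies on — a rotate by
BITS_PER_WORD of a double-word value just swaps its two word-size halves —
can be sketched outside of GCC (a standalone illustration, not GCC code):

```python
# Rotating a 2*word-size value by BITS_PER_WORD bits produces the same
# value as exchanging its two word-size halves, which is why such a
# rotate can be resolved as a simple register-pair swap.

BITS_PER_WORD = 32
DOUBLE = 2 * BITS_PER_WORD

def rotate(value, count, width=DOUBLE):
    value &= (1 << width) - 1
    return ((value << count) | (value >> (width - count))) & ((1 << width) - 1)

hi, lo = 0xDEADBEEF, 0x12345678
pair = (hi << BITS_PER_WORD) | lo      # the register pair viewed as one value
swapped = (lo << BITS_PER_WORD) | hi   # the pair with its halves exchanged

assert rotate(pair, BITS_PER_WORD) == swapped
```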
Updated patch below.

Peter


gcc/
PR rtl-optimization/87507
* lower-subreg.c (operand_for_simple_move_operator): New function.
(simple_move): Strip simple operators.
(find_pseudo_copy): Likewise.
(resolve_operand_for_simple_move_operator): New function.
(resolve_simple_move): Strip simple operators and swap operands.

gcc/testsuite/
PR rtl-optimization/87507
* gcc.target/powerpc/pr87507.c: New test.
* gcc.target/powerpc/pr68805.c: Update expected results.

Index: gcc/lower-subreg.c
===
--- gcc/lower-subreg.c  (revision 265971)
+++ gcc/lower-subreg.c  (working copy)
@@ -320,6 +320,24 @@ simple_move_operand (rtx x)
   return true;
 }
 
+/* If X is an operator that can be treated as a simple move that we
+   can split, then return the operand that is operated on.  */
+
+static rtx
+operand_for_simple_move_operator (rtx x)
+{
+  /* A word sized rotate of a register pair is equivalent to swapping
+ the registers in the register pair.  */
+  if (GET_CODE (x) == ROTATE
+  && GET_MODE (x) == twice_word_mode
+  && simple_move_operand (XEXP (x, 0))
+  && CONST_INT_P (XEXP (x, 1))
+  && INTVAL (XEXP (x, 1)) == BITS_PER_WORD)
+return XEXP (x, 0);
+
+  return NULL_RTX;
+}
+
 /* If INSN is a single set between two objects that we want to split,
return the single set.  SPEED_P says whether we are optimizing
INSN for speed or size.
@@ -330,7 +348,7 @@ simple_move_operand (rtx x)
 static rtx
 simple_move (rtx_insn *insn, bool speed_p)
 {
-  rtx x;
+  rtx x, op;
   rtx set;
   machine_mode mode;
 
@@ -348,6 +366,9 @@ simple_move (rtx_insn *insn, bool speed_
 return NULL_RTX;
 
   x = SET_SRC (set);
+  if ((op = operand_for_simple_move_operator (x)) != NULL_RTX)
+x = op;
+
   if (x != recog_data.operand[0] && x != recog_data.operand[1])
 return NULL_RTX;
   /* For the src we can handle ASM_OPERANDS, and it is beneficial for
@@ -386,9 +407,13 @@ find_pseudo_copy (rtx set)
 {
   rtx dest = SET_DEST (set);
   rtx src = SET_SRC (set);
+  rtx op;
   unsigned int rd, rs;
   bitmap b;
 
+  if ((op = operand_for_simple_move_operator (src)) != NULL_RTX)
+src = op;
+
   if (!REG_P (dest) || !REG_P (src))
 return false;
 
@@ -846,6 +871,21 @@ can_decompose_p (rtx x)
   return true;
 }
 
+/* OPND is a concatn operand that is used with a simple move operator.
+   Return a new rtx with the concatn's operands swapped.  */
+
+static rtx
+resolve_operand_for_simple_move_operator (rtx opnd)
+{
+  gcc_assert (GET_CODE (opnd) == CONCATN);
+  rtx concatn = copy_rtx (opnd);
+  rtx op0 = XVECEXP (concatn, 0, 0);
+  rtx op1 = XVECEXP (concatn, 0, 1);
+  XVECEXP (concatn, 0, 0) = op1;
+  XVECEXP (concatn, 0, 1) = op0;
+  return concatn;
+}
+
 /* Decompose the registers used in a simple move SET within INSN.  If
we don't change anything, return INSN, otherwise return the start
of the sequence of moves.  */
@@ -853,7 +893,7 @@ can_decompose_p (rtx x)
 static rtx_insn *
 resolve_simple_move (rtx set, rtx_insn *insn)
 {
-  rtx src, dest, real_dest;
+  rtx src, dest, real_dest, src_op;
   rtx_insn *insns;
   machine_mode orig_mode, dest_mode;
   unsigned int orig_size, words;
@@ -876,6 +916,23 @@ resolve_simple_move (rtx set, rtx_insn *
 
   real_dest = NULL_RTX;
 
+  if ((src_op = operand_for_simple_move_operator (src)) != NULL_RTX)
+{
+  if (resolve_reg_p (dest))
+   {
+ /* DEST is a CONCATN, so swap its operands and strip
+SRC's operator.  */
+ dest = resolve_operand_for_simple_move_operator (dest);
+ src = src_op;
+   }
+  else if (resolve_reg_p (src_op))
+   {
+ /* SRC is an operation on a CONCATN, so strip the operator and
+swap the CONCATN's operands.  */
+ src = resolve_operand_for_simple_move_operator (src_op);
+   }
+}
+
   if (GET_CODE (src) == SUBREG
   && resolve_reg_p (

Re: [PR81878]: fix --disable-bootstrap --enable-languages=ada, and cross-back gnattools build

2018-11-13 Thread Richard Biener
On Tue, Nov 13, 2018 at 4:51 PM Alexandre Oliva  wrote:
>
> On Nov 13, 2018, Richard Biener  wrote:
>
> > Reworking gnattools build to always use host CC/CXX in "stage1" (or for 
> > crosses)
> > rather than doing sth different.
>
> > Yeah, but gnattools is bootstrapped, right?
>
> No, it's not built in stage1, it's a post-bootstrap host subpackage.

Huh, indeed - it's a host_module without bootstrap ...  and libada is
a target_module not bootstrapped either.  So we're indeed in a curious
situation where we have a bootstrap of Ada requiring a host Ada but
nothing of Ada is actually bootstrapped ... ;)

> > For --disable-bootstrap you get binaries built with the host compiler
> > throughout and that's good.
>
> I guess it *could* be built with the host compiler, and linked with that
> runtime.  Then, in order to run it, you might still need the previous
> gnat rts shared libs.  Just like for a non-bootstrapped compiler
> front-end, yeah.

Yeah, I expected that for non-bootstrap.  And I somehow assumed it
was bootstrapped so I'd get gnattools and gnat1 not depending on the
host compiler libs.  I guess we're lucky for gnat1 because it's written
in C?

At least I now somewhat understand why we wind up with the
bootstrapped CC/CXX for the build of the stage3 host modules.

>
> Whereas for a bootstrapped compiler, you'd want to use what, the stage2
> compiler, that built other final host tools, to build gnattools as well?
>
> All doable, but not without additional complexity.  I'm afraid I fail to
> see the upside.
>
> Unfortunately we don't have a lot of other post-bootstrap host tools
> linked (on native builds) with target libraries to draw parallels.  I
> guess we could reason about them as if they were part of the gcc/
> subdir, and this would lead down the path you suggest.  I will then
> point to the complications arising from using a just-built libstdc++ in
> subsequent stages, and in later target builds in the same stage, and
> conclude it's far from trivial, it's actually quite error-prone and
> hardly glitch-free, and so it's not too hard to understand why gnat,
> that had to take care of this before we even thought of bootstrapping
> libstdc++, took a different path.
>
> Anyway...  I won't take your suggestion as an objection to the proposed
> patch, but I will say that I still have a lot to learn about the Ada
> build machinery to be able to grasp the impact your suggestion might
> have on existing uses, so, as much as I like uniformity and symmetry,
> I'm not jumping right onto it ;-)

;)

Richard.

>
> --
> Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
> Be the change, be Free! FSF Latin America board member
> GNU Toolchain EngineerFree Software Evangelist
> Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


RE: [PATCH 2/9][GCC][AArch64][middle-end] Add rules to strip away unneeded type casts in expressions

2018-11-13 Thread Tamar Christina
Hi Joseph,

> What types exactly is this meant to apply to?  Floating-point?  Integer?  
> Mixtures of those?  (I'm guessing not mixtures, because those would be 
> something other than "convert" here.)

Originally I had it for both Floating-point and Integer, but not a mix of the 
two.

> For integer types, it's not safe, in that if e.g. F is int and X is unsigned 
> long long, you're changing from defined overflow to undefined overflow.

This is a good point, so I should limit it to floating-point formats only.

> For floating-point types, using TYPE_PRECISION is suspect (it's not 
> wonderfully clear what it means, but it's not the number of significand
> bits) - you need to look at the actual properties of the real format of the 
> machine modes in question.

Would restricting it to flag_unsafe_math_optimizations not be enough in this
case?  If it's only done for unsafe math, then you likely won't care about a
small loss in precision anyway.

> Specifically, see convert.c:convert_to_real_1, the long comment starting 
> "Sometimes this transformation is safe (cannot change results through 
> affecting double rounding cases) and sometimes it is not.", and the 
> associated code calling real_can_shorten_arithmetic.  I think that code in 
> convert.c ought to apply to your case of half precision converted to float 
> for arithmetic and then converted back to half precision afterwards.  (In the 
> case where the excess precision configuration - which depends on 
> TARGET_FP_F16INST for AArch64 - says to do arithmetic directly on half 
> precision, anyway.)
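
Joseph's double-rounding point can be checked numerically for the case at hand: binary32 carries 24 significand bits, which is at least 2*11+2 for binary16, so narrowing a half-precision addition done in float gives the same answer as rounding the exact sum straight to half — the property real_can_shorten_arithmetic tests. A brute-force check (standalone Python, not GCC code):

```python
import random
import struct

def to_f16(x):
    # round a Python float (binary64) to the nearest IEEE binary16 value
    return struct.unpack('e', struct.pack('e', x))[0]

def to_f32(x):
    # round a Python float (binary64) to the nearest IEEE binary32 value
    return struct.unpack('f', struct.pack('f', x))[0]

random.seed(1)
for _ in range(20000):
    a = to_f16(random.uniform(-1000.0, 1000.0))
    b = to_f16(random.uniform(-1000.0, 1000.0))
    # a + b is computed exactly in Python's binary64, so:
    direct = to_f16(a + b)            # one rounding, straight to half
    widened = to_f16(to_f32(a + b))   # round to float, then to half
    assert direct == widened          # no double-rounding error
```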

Sorry, I hadn't attached a test case because I wanted to get some feedback
first.

But my simple testcase is

#include <complex.h>

#define N 200

void foo (_Float16 complex a[N], _Float16 complex b[N], _Float16 complex *c)
{
  for (int x = 0; x < N; x++)
c[x] = a[x] + b[x] * I;
}

A simplified version of one of the expressions in the gimple tree for this is

  _7 = IMAGPART_EXPR <*_4>;
  _13 = REALPART_EXPR <*_3>;
  _8 = (floatD.30) _7;
  _14 = (floatD.30) _13;
  _35 = _14 + _8;
  _19 = (_Float16D.33) _35;
  IMAGPART_EXPR <*_20> = _19;

note that the type of _4, _3 and _19 are that of half float.

  _4 = a_26(D) + _2;
  _3 = b_25(D) + _2;
  _20 = c_28(D) + _2;

  complex _Float16D.43 * a_26(D) = aD.3538;
  complex _Float16D.43 * b_25(D) = bD.3539;
  complex _Float16D.43 * c_28(D) = cD.3540;

If the code is SLP vectorized, then inside the tree (without my match.pd rule)
it will contain the casts, which is evident from the resulting vector code

ldr     h0, [x6, x3]
ldr     h2, [x1, x3]
fcvt    s0, h0
fcvt    s2, h2
fadd    s0, s0, s2
fcvt    h1, s1
fcvt    h0, s0
str     h1, [x2, x3]
str     h0, [x4, x3]

The problem seems to be (as far as I can tell) that when a loop is present,
a new temporary is created to store the intermediate result while the
destination variable is being computed as dest + offset.  It can't do the
cast and the computation at the same time.

foo (complex _Float16 * a, complex _Float16 * b, complex _Float16 * c)
{
  complex _Float16 D_3548;
  complex _Float16 D_3549;
  complex float D_3550;

The type of this temporary seems to be the type of the inner computation, so in
this case float, due to the 90-degree rotation of b (the multiplication by I)
in the C file.

This also has the effect that it seems to make the type of the imaginary and 
real
expressions all float

So when convert.c tries to convert them, all it sees are

 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x7ffb18e362a0 precision:32
pointer_to_this >
side-effects
arg:0 

> -Original Message-
> From: Joseph Myers 
> Sent: Tuesday, November 13, 2018 00:49
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; James Greenhalgh
> ; Richard Earnshaw
> ; Marcus Shawcroft
> ; l...@redhat.com; i...@airs.com;
> rguent...@suse.de
> Subject: Re: [PATCH 2/9][GCC][AArch64][middle-end] Add rules to strip away
> unneeded type casts in expressions
> 
> On Sun, 11 Nov 2018, Tamar Christina wrote:
> 
> > This patch adds a match.pd rule for stripping away the type converts
> > when you're converting to a type that has twice the precision of the
> > current type in the same class, doing a simple math operation on it
> > and converting back to the smaller type.
> 
> What types exactly is this meant to apply to?  Floating-point?  Integer?
> Mixtures of those?  (I'm guessing not mixtures, because those would be
> something other than "convert" here.)
> 
> For integer types, it's not safe, in that if e.g. F is int and X is unsigned 
> long
> long, you're changing from defined overflow to undefined overflow.
> 
> For floating-point types, using TYPE_PRECISION is suspect (it's not
> wonderfully clear what it means, but it's not the number of significand
> bits) - you need to look at the actual properties of the real format of the
> machine modes in question.
> 
> Specifically, s

Re: record_ranges_from_incoming_edge: use value_range API for creating new range

2018-11-13 Thread Aldy Hernandez
With your cleanups, the main raison d'etre for my patch goes away, but 
here is the promised removal of ignore_equivs_equal_p.


I think the == operator is a bit confusing, and equality intent should 
be clearly specified.  I am providing the following for the derived 
class (with no hidden default arguments):


bool equal_p (const value_range &, bool ignore_equivs) const;

and providing the following for the base class:

bool equal_p (const value_range_base &) const;

I am also removing access to both the == and the != operators.  It 
should now be clear from the code whether the equivalence bitmap is 
being taken into account or not.


What do you think?

Aldy


[PATCH][RS6000] Fix PR87870: ppc64 generates poor code when loading constants into TImode vars

2018-11-13 Thread Peter Bergner
PR87870 shows a problem loading simple constant values into TImode variables.
This is a regression ever since VSX was added and we started using the
vsx_mov_64bit pattern.  We still get the correct code on trunk if we
compile with -mno-vsx, since we fall back to using the older mov_ppc64
move pattern, which has an alternative "r" <- "n".

Our vsx_mov_64bit pattern currently has two alternatives for
loading constants into GPRs, one using "*r" <- "jwM" and "??r" <- "W".
These look redundant to me, since "W" contains support for both all-zero
constants (ie, "j") and all-one constants (ie, wM) as well as a few more.
My patch below consolidates them both and uses a new mode iterator that
uses "W" for the vector modes and "n" for TImode like mov_ppc64
used.

I'll note I didn't change the vsx_mov_32bit pattern, since TImode
isn't supported with -m32.  However, if you want, I could remove the
redundant "*r" <- "jwM" alternative there too?

This passes bootstrap and regtesting with no regressions.  Ok for trunk?

Peter


gcc/
PR target/87870
* config/rs6000/vsx.md (nW): New mode iterator.
(vsx_mov_64bit): Use it.  Remove redundant GPR 0/-1 alternative.

gcc/testsuite/
PR target/87870
* gcc.target/powerpc/pr87870.c: New test.

Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 265971)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -183,6 +183,18 @@ (define_mode_attr ??r  [(V16QI "??r")
 (TF"??r")
 (TI"r")])
 
+;; A mode attribute used for 128-bit constant values.
+(define_mode_attr nW   [(V16QI "W")
+(V8HI  "W")
+(V4SI  "W")
+(V4SF  "W")
+(V2DI  "W")
+(V2DF  "W")
+(V1TI  "W")
+(KF"W")
+(TF"W")
+(TI"n")])
+
 ;; Same size integer type for floating point data
 (define_mode_attr VSi [(V4SF  "v4si")
   (V2DF  "v2di")
@@ -1193,17 +1205,17 @@ (define_insn_and_split "*xxspltib_
 
 ;;  VSX store  VSX load   VSX move  VSX->GPR   GPR->VSXLQ (GPR)
 ;;  STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIBVSPLTISW
-;;  VSX 0/-1   GPR 0/-1   VMX const GPR const  LVX (VMX)   STVX (VMX)
+;;  VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
"=ZwO,  , , r, we,?wQ,
 ?&r,   ??r,   ??Y,   , wo,v,
-?,*r,v, ??r,   wZ,v")
+?,v, , wZ,v")
 
(match_operand:VSX_M 1 "input_operand" 
", ZwO,   , we,r, r,
 wQ,Y, r, r, wE,jwM,
-?jwM,  jwM,   W, W, v, wZ"))]
+?jwM,  W, ,  v, wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)
&& (register_operand (operands[0], mode) 
@@ -1214,12 +1226,12 @@ (define_insn "vsx_mov_64bit"
   [(set_attr "type"
"vecstore,  vecload,   vecsimple, mffgpr,mftgpr,load,
 store, load,  store, *, vecsimple, vecsimple,
-vecsimple, *, *, *, vecstore,  vecload")
+vecsimple, *, *, vecstore,  vecload")
 
(set_attr "length"
"4, 4, 4, 8, 4, 8,
 8, 8, 8, 8, 4, 4,
-4, 8, 20,20,4, 4")])
+4, 20,20,4, 4")])
 
 ;;  VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;  XXSPLTIB   VSPLTISW   VSX 0/-1   GPR 0/-1   VMX const  GPR const

Re: [RS6000] Don't pass -many to the assembler

2018-11-13 Thread Peter Bergner
On 11/13/18 5:17 AM, Segher Boessenkool wrote:
> On Tue, Nov 13, 2018 at 12:02:55PM +1030, Alan Modra wrote:
>> On Mon, Nov 12, 2018 at 04:34:34PM -0800, Mike Stump wrote:
>>> On Nov 12, 2018, at 3:13 PM, Alan Modra  wrote:
>>> On darwin, we (darwin, as a platform decision) like all instructions 
>>> available from the assembler.
>>
>> OK, fair enough.  Another option is to just disable -many when gcc is
>> in development, like we enable checking.
> 
> That is a good plan for GCC 9 at least.

I like the plan too.  We can also continue to pass -many just for darwin
if they really really think they need it.

Peter



[PATCH][RFC] Come up with -flive-patching master option.

2018-11-13 Thread Qing Zhao
Hi,


Attached is the patch for new -flive-patching=[inline-only-static | 
inline-clone] master option.

'-flive-patching=LEVEL'
 Control GCC's optimizations to provide a safe compilation for
 live-patching.  Provides multiple levels of control over which
 optimizations are enabled at the user's request.  The LEVEL argument
 should be one of the following:

 'inline-only-static'

   Only enable inlining of static functions; disable all other
   ipa optimizations/analyses.  As a result, when patching a
   static routine, all its callers need to be patched as well.

 'inline-clone'

  Only enable inlining and all optimizations that internally
   create clones, for example, cloning, ipa-sra, partial inlining,
  etc.; disable all other ipa optimizations/analyses.  As a
  result, when patching a routine, all its callers and its
  clones' callers need to be patched as well.

 When -flive-patching is specified without any value, the default value
 is "inline-clone".

 This flag is disabled by default.

let me know your comments and suggestions on the implementation.

thanks a lot.

Qing



flive-patching.patch
Description: Binary data


> On Nov 12, 2018, at 4:29 PM, Qing Zhao  wrote:
> 
> 
>> On Nov 12, 2018, at 2:53 AM, Martin Liška  wrote:
>> 
>>> 
>>> Okay, I see.
>>> 
>>> I am also working on a similar option as yours, but make the 
>>> -flive-patching as two level control:
>>> 
>>> +flive-patching
>>> +Common RejectNegative Alias(flive-patching=,inline-clone)
>>> +
>>> +flive-patching=
>>> +Common Report Joined RejectNegative Enum(live_patching_level) 
>>> Var(flag_live_patching) Init(LIVE_NONE)
>>> +-flive-patching=[inline-only-static|inline-clone]  Control 
>>> optimizations to provide a safe comp for live-patching purpose.
>>> 
>>> the implementation for -flive-patching=inline-clone (the default) is 
>>> exactly as yours,  the new level -flive-patching=inline-only-static
>>> is to only enable inlining of static function for live patching, which is 
>>> important for multiple-processes live patching to control memory
>>> consumption. 
>>> 
>>> (please see my 2nd version of the -flive-patching proposal).
>>> 
>>> I will send out my complete patch in another email.
>> 
>> Hi, sure, works for me. Let's make 2 level option.
> 
> thank you.
> 
> I will send the patch tomorrow.
> 
> Qing
>> 
>> Martin
> 



Re: [PATCH] -fsave-optimization-record: compress the output using zlib

2018-11-13 Thread Kyrill Tkachov

Hi David,

On 09/11/18 21:00, Jeff Law wrote:

On 11/9/18 10:51 AM, David Malcolm wrote:
> One of the concerns noted at Cauldron about -fsave-optimization-record
> was the size of the output files.
>
> This file implements compression of the -fsave-optimization-record
> output, using zlib.
>
> I did some before/after testing of this patch, using SPEC 2017's
> 502.gcc_r with -O3, looking at the sizes of the generated
> FILENAME.opt-record.json[.gz] files.
>
> The largest file was for insn-attrtab.c:
>   before:  171736285 bytes (164M)
>   after: 5304015 bytes (5.1M)
>
> Smallest file was for vasprintf.c:
>   before:  30567 bytes
>   after:4485 bytes
>
> Median file by size before was lambda-mat.c:
>   before:2266738 bytes (2.2M)
>   after:   75988 bytes (15K)
>
> Total of all files in the benchmark:
>   before: 2041720713 bytes (1.9G)
>   after:66870770 bytes (63.8M)
>
> ...so clearly compression is a big win in terms of file size, at the
> cost of making the files slightly more awkward to work with. [1]
> I also wonder if we want to support any pre-filtering of the output
> (FWIW roughly half of the biggest file seems to be "Adding assert for "
> messages from tree-vrp.c).
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> OK for trunk?
>


So does this now add a dependency on zlib?
I can't build GCC on my aarch64-none-linux machine after this patch due to a 
missing zlib.h.
I see there's a zlib in the top-level GCC tree. Is that build/used during the 
GCC build itself?

Thanks,
Kyrill


> [1] I've updated my optrecord.py module to deal with this, which
> simplifies things; it's still not clear to me if that should live
> in "contrib/" or not.
>
> gcc/ChangeLog:
>* doc/invoke.texi (-fsave-optimization-record): Note that the
>output is compressed.
>* optinfo-emit-json.cc: Include .
>(optrecord_json_writer::write): Compress the output.
OK.
jeff




RE: [PATCH 2/9][GCC][AArch64][middle-end] Add rules to strip away unneeded type casts in expressions

2018-11-13 Thread Joseph Myers
On Tue, 13 Nov 2018, Tamar Christina wrote:

> Would restricting it to flag_unsafe_math_optimizations not be enough in 
> this case? Since if it's only done for unsafe math then you likely won't 
> care about a small loss in precision anyway?

We have what should be the right logic (modulo DFP issues) in 
convert_to_real_1, for converting (outertype)((innertype0)a+(innertype1)b) 
into ((newtype)a+(newtype)b).  I think the right thing to do is to move 
that logic, including the associated comments, from convert_to_real_1 to 
match.pd (not duplicate it or a subset in both places - make match.pd able 
to do everything that logic in convert.c does, so it's no longer needed in 
convert.c).  That logic knows both when the conversion is OK based on 
flag_unsafe_math_optimizations, and when it's OK based on 
real_can_shorten_arithmetic.

(Most of the other special case logic in convert_to_real_1 to optimize 
particular conversions would make sense to move as well - but for your 
present issue, only the PLUS_EXPR / MINUS_EXPR / MULT_EXPR / RDIV_EXPR 
logic should need to move.)
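As an illustration of why the float/double case is safe even without -funsafe-math-optimizations (ordinary C++, not GCC internals): IEEE double carries more than 2*24 + 2 bits of precision, so adding two floats in double and rounding back always equals the direct float addition; that is the kind of property real_can_shorten_arithmetic establishes.

```cpp
#include <cassert>

// sum_via_double models (outertype)((innertype)a + (innertype)b) with
// outertype = float, innertype = double; sum_direct models the shortened
// ((newtype)a + (newtype)b) with newtype = float.  For IEEE float/double
// the two always agree, so the conversion needs no unsafe-math flag.
float sum_via_double (float a, float b)
{
  return (float) ((double) a + (double) b);
}

float sum_direct (float a, float b)
{
  return a + b;
}
```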

> But my simple testcase is

I'd hope a simpler test could be added - one not involving complex 
arithmetic at all, just an intermediate temporary variable or whatever's 
needed for convert_to_real_1 not to apply.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RS6000] Don't pass -many to the assembler

2018-11-13 Thread Iain Sandoe
Hi Folks,

> On 13 Nov 2018, at 17:48, Peter Bergner  wrote:
> 
> On 11/13/18 5:17 AM, Segher Boessenkool wrote:
>> On Tue, Nov 13, 2018 at 12:02:55PM +1030, Alan Modra wrote:
>>> On Mon, Nov 12, 2018 at 04:34:34PM -0800, Mike Stump wrote:
 On Nov 12, 2018, at 3:13 PM, Alan Modra  wrote:
 On darwin, we (darwin, as a platform decision) like all instructions 
 available from the assembler.
>>> 
>>> OK, fair enough.  Another option is to just disable -many when gcc is
>>> in development, like we enable checking.
>> 
>> That is a good plan for GCC 9 at least.
> 
> I like the plan too.  We can also continue to pass -many just for darwin
> if they really really think they need it.

As far as I expect, Darwin should be untouched by this - we have a separate 
assembler (which doesn’t even respond to -many), so unless there’s some higher 
level translation done (it’s not mentioned in any Darwin specs), we should just 
carry on as before.

Where I do expect things to change is when multiple .machine directives are 
included in asm sources.
(probably) the old cctools assembler won’t deal with them properly, and the
4.0.1-era LLVM-backend-based version I have doesn’t deal with them either
(this could be a general consideration for the other parts of the PPC
toolchain).  Having said that, I didn’t experiment with .machine on later LLVM
backend versions yet.

Thus, my current expectation is that this will be a NOP unless/until 
incompatible asm source changes are made.

Iain



Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-13 Thread Andi Kleen
On Tue, Nov 13, 2018 at 09:03:52AM +0100, Richard Biener wrote:
> > I even had an earlier version of this that instrumented
> > assembler output of the compiler with PTWRITE in a separate script,
> > and it worked fine too.
> 
> Apart from eventually messing up branch range restrictions I guess ;)

You mean for LOOP? For everything else the assembler handles it I
believe.

> Did you gather any statistics on how many ptwrite instructions
> that are generated by your patch are not covered by any
> location range & expr?  

Need to look into that. Any suggestions how to do it in the compiler?

I had some decode failures with the perf dwarf decoder,
but I was usually blaming them on perf dwarf limitations. 

> I assume ptwrite is writing from register
> input only so you probably should avoid instrumenting writes
> of constants (will require an extra register)?

Hmm, I think those are needed unfortunately because someone
might want to trace every update of something. With branch
tracing it could be recreated theoretically but would 
be a lot more work for the decoder.

> How does the .text size behave say for cc1 when you enable
> the various granularities of instrumentation?  How many
> ptwrite instructions are there per 100 regular instructions?

With locals tracing (worst case) I see ~23% of all instructions
in cc1 be PTWRITE. Binary is ~27% bigger.

> Can we get an updated patch based on my review?

Yes, working on it, also addressing Martin's comments. Hopefully soon.
> 
> I still think we should eventually move the pass later

It's after pass_sanopt now.

> avoid instrumenting places we'll not have any meaningful locations
> in the debug info - if only to reduce required trace bandwith.

Can you suggest how to check that?

-Andi


[PATCH] Add missing ZLIBINC to CFLAGS-optinfo-emit-json.o

2018-11-13 Thread David Malcolm
On Tue, 2018-11-13 at 17:58 +, Kyrill Tkachov wrote:
> Hi David,
> 
> On 09/11/18 21:00, Jeff Law wrote:
> > On 11/9/18 10:51 AM, David Malcolm wrote:
> > > One of the concerns noted at Cauldron about -fsave-optimization-
> > > record
> > > was the size of the output files.
> > > 
> > > This file implements compression of the -fsave-optimization-
> > > record
> > > output, using zlib.
> > > 
> > > I did some before/after testing of this patch, using SPEC 2017's
> > > 502.gcc_r with -O3, looking at the sizes of the generated
> > > FILENAME.opt-record.json[.gz] files.
> > > 
> > > The largest file was for insn-attrtab.c:
> > >   before:  171736285 bytes (164M)
> > >   after: 5304015 bytes (5.1M)
> > > 
> > > Smallest file was for vasprintf.c:
> > >   before:  30567 bytes
> > >   after:4485 bytes
> > > 
> > > Median file by size before was lambda-mat.c:
> > >   before:2266738 bytes (2.2M)
> > >   after:   75988 bytes (15K)
> > > 
> > > Total of all files in the benchmark:
> > >   before: 2041720713 bytes (1.9G)
> > >   after:66870770 bytes (63.8M)
> > > 
> > > ...so clearly compression is a big win in terms of file size, at
> > > the
> > > cost of making the files slightly more awkward to work with. [1]
> > > I also wonder if we want to support any pre-filtering of the
> > > output
> > > (FWIW roughly half of the biggest file seems to be "Adding assert
> > > for "
> > > messages from tree-vrp.c).
> > > 
> > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > 
> > > OK for trunk?
> > > 
> 
> So does this now add a dependency on zlib?
> I can't build GCC on my aarch64-none-linux machine after this patch
> due to a missing zlib.h.
> I see there's a zlib in the top-level GCC tree. Is that build/used
> during the GCC build itself?
> 
> Thanks,
> Kyrill

Sorry about that.  Does the following patch fix the build for you?

gcc/ChangeLog:
* Makefile.in (CFLAGS-optinfo-emit-json.o): Add $(ZLIBINC).
---
 gcc/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 16c9ed6..1e8a311 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2233,7 +2233,7 @@ s-bversion: BASE-VER
$(STAMP) s-bversion
 
 CFLAGS-toplev.o += -DTARGET_NAME=\"$(target_noncanonical)\"
-CFLAGS-optinfo-emit-json.o += -DTARGET_NAME=\"$(target_noncanonical)\"
+CFLAGS-optinfo-emit-json.o += -DTARGET_NAME=\"$(target_noncanonical)\" $(ZLIBINC)
 
 pass-instances.def: $(srcdir)/passes.def $(PASSES_EXTRA) \
$(srcdir)/gen-pass-instances.awk
-- 
1.8.5.3



Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-13 Thread Richard Biener
On November 13, 2018 7:09:15 PM GMT+01:00, Andi Kleen  
wrote:
>On Tue, Nov 13, 2018 at 09:03:52AM +0100, Richard Biener wrote:
>> > I even had an earlier version of this that instrumented
>> > assembler output of the compiler with PTWRITE in a separate script,
>> > and it worked fine too.
>> 
>> Apart from eventually messing up branch range restrictions I guess ;)
>
>You mean for LOOP? For everything else the assembler handles it I
>believe.
>
>> Did you gather any statistics on how many ptwrite instructions
>> that are generated by your patch are not covered by any
>> location range & expr?  
>
>Need to look into that. Any suggestions how to do it in the compiler?

I guess you need to do that in a dwarf decoder somehow. 

>I had some decode failures with the perf dwarf decoder,
>but I was usually blaming them on perf dwarf limitations. 
>
>> I assume ptwrite is writing from register
>> input only so you probably should avoid instrumenting writes
>> of constants (will require an extra register)?
>
>Hmm, I think those are needed unfortunately because someone
>might want to trace every update of something. With branch
>tracing it could be recreated theoretically but would 
>be a lot more work for the decoder.
>
>> How does the .text size behave say for cc1 when you enable
>> the various granularities of instrumentation?  How many
>> ptwrite instructions are there per 100 regular instructions?
>
>With locals tracing (worst case) I see ~23% of all instructions
>in cc1 be PTWRITE. Binary is ~27% bigger.

OK, I suppose it will get better when addressing some of my review comments. 

>> Can we get an updated patch based on my review?
>
>Yes, working on it, also addressing Martin's comments. Hopefully soon.
>> 
>> I still think we should eventually move the pass later
>
>It's after pass_sanopt now.
>
>> avoid instrumenting places we'll not have any meaningful locations
>> in the debug info - if only to reduce required trace bandwith.
>
>Can you suggest how to check that?

I'd look at doing the instrumentation after var-tracking has run - that is what 
computes the locations in the end. That means instrumenting on late RTL after 
register allocation (and eventually with branch range restrictions in place). 
Basically you'd instrument at the same time as generating debug info.

Richard. 

>-Andi



Re: [RS6000] Don't pass -many to the assembler

2018-11-13 Thread Peter Bergner
On 11/13/18 12:06 PM, Iain Sandoe wrote:
> As far as I expect, Darwin should be untouched by this - we have a separate 
> assembler (which doesn’t even respond to -many), so unless there’s some 
> higher level translation done (it’s not mentioned in any Darwin specs), we 
> should just carry on as before.

Ah, good then.

> When I do expect things to change is when multiple .machine directives are 
> included in asm sources.
> (probably) the old cctools assembler won’t deal with them properly

Usually when there are multiple .machine's being used, they should be used
with the ".machine push" and ".machine pop" directives so the temporary
.machine value doesn't corrupt the .machine value being used for the rest
of the file.  Like so.

.machine "power8"

...

.machine push
.machine "power9"

.machine pop

...

Hopefully the cctools supports that.

Peter





Re: record_ranges_from_incoming_edge: use value_range API for creating new range

2018-11-13 Thread Richard Biener
On November 13, 2018 5:40:59 PM GMT+01:00, Aldy Hernandez  
wrote:
>With your cleanups, the main raison d'etre for my patch goes away, but 
>here is the promised removal of ignore_equivs_equal_p.
>
>I think the == operator is a bit confusing, and equality intent should 
>be clearly specified.  I am providing the following for the derived 
>class (with no hidden default arguments):
>
>   bool equal_p (const value_range &, bool ignore_equivs) const;
>
>and providing the following for the base class:
>
>   bool equal_p (const value_range_base &) const;
>
>I am also removing access to both the == and the != operators.  It 
>should now be clear from the code whether the equivalence bitmap is 
>being taken into account or not.
>
>What do you think?

Sounds good. 

Richard. 

>Aldy



Re: [PATCH][lower-subreg] Fix PR87507

2018-11-13 Thread Richard Henderson
On 11/13/18 5:38 PM, Peter Bergner wrote:
> On 11/13/18 2:53 AM, Eric Botcazou wrote:
>>> +static rtx
>>> +simple_move_operator (rtx x)
>>> +{
>>> +  /* A word sized rotate of a register pair is equivalent to swapping
>>> + the registers in the register pair.  */
>>> +  if (GET_CODE (x) == ROTATE
>>> +  && GET_MODE (x) == twice_word_mode
>>> +  && simple_move_operand (XEXP (x, 0))
>>> +  && CONST_INT_P (XEXP (x, 1))
>>> +  && INTVAL (XEXP (x, 1)) == BITS_PER_WORD)
>>> +return XEXP (x, 0);;
>>> +
>>> +  return NULL_RTX;
>>> +}
>>> +
>>
>> Superfluous semi-colon.  Given that the function returns an operand, its 
>> name 
>> is IMO misleading, so maybe [get_]operand_for_simple_move_operator.
> 
> Fixed and renamed function to operand_for_simple_move_operator.

Would not operand_for_swap_move_operator be better?  This is not a "simple
move", it is something that requires swapping the words of the operand.
(Presumably one could think of other operators that generate a swap, and match
them here.  I can't think of another one off the top of my head though.)
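The equivalence the pass is matching can be shown in plain C++ (hypothetical helpers, assuming a 32-bit word, i.e. BITS_PER_WORD == 32 and a 64-bit twice-word mode):

```cpp
#include <cassert>
#include <cstdint>

// A twice-word (64-bit) rotate by BITS_PER_WORD (32) moves the high word
// into the low word and vice versa -- exactly a swap of the two registers
// in the register pair, which is why lower-subreg can treat it as a move.
uint64_t rotate_left (uint64_t x, unsigned n)
{
  n &= 63;
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

uint64_t swap_words (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  return ((uint64_t) lo << 32) | hi;
}
```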


r~


Re: [PATCH][lower-subreg] Fix PR87507

2018-11-13 Thread Peter Bergner
On 11/13/18 12:49 PM, Richard Henderson wrote:
> On 11/13/18 5:38 PM, Peter Bergner wrote:
>> On 11/13/18 2:53 AM, Eric Botcazou wrote:
>>> Superfluous semi-colon.  Given that the function returns an operand, its 
>>> name 
>>> is IMO misleading, so maybe [get_]operand_for_simple_move_operator.
>>
>> Fixed and renamed function to operand_for_simple_move_operator.
> 
> Would not operand_for_swap_move_operator be better?  This is not a "simple
> move", it is something that requires swapping the words of the operand.
> (Presumably one could think of other operators that generate a swap, and match
> them here.  I can't think of another one off the top of my head though.)

That's fine with me.

Peter




[PATCH] avoid copying inlining attributes in attribute copy

2018-11-13 Thread Martin Sebor

Enabling in Glibc the recent enhancement to detect attribute
mismatches in alias declarations (PR 81824) revealed a use
case where aliases are being declared for targets declared
with the always_inline attribute.  Copying the always_inline
attribute to the declaration of the alias would then trigger

  error: always_inline function might not be inlinable [-Werror=attributes]

due to the alias not being inlinable in some contexts where
(presumably) the target would be.  To avoid the warning for
this use case the attached patch excludes all attributes
that affect inlining from being copied by attribute copy.

While testing this I also more thoroughly exercised attribute
tls_model (which also came up during the Glibc deployment),
and improved the diagnostics for the attribute to make their
root cause easier to understand (printing "attribute ignored"
alone isn't very informative without also explaining why).

Martin

PS While testing this I also opened bug 88010 where inlining
attributes on aliases seem to be silently ignored in favor of
those on their targets.  I'm wondering what the expected (or
preferred) behavior is.
gcc/c-family/ChangeLog:

	* c-attribs.c (handle_copy_attribute): Exclude inlining attributes.
	(handle_tls_model_attribute): Improve diagnostics.
	(has_attribute): Fix a typo.

gcc/testsuite/ChangeLog:

	* gcc.dg/attr-copy-5.c: New test.
	* gcc.dg/tls/diag-6.c: Adjust expected diagnostics.


Index: gcc/c-family/c-attribs.c
===
--- gcc/c-family/c-attribs.c	(revision 266033)
+++ gcc/c-family/c-attribs.c	(working copy)
@@ -2239,13 +2239,15 @@ handle_copy_attribute (tree *node, tree name, tree
   /* Copy decl attributes from REF to DECL.  */
   for (tree at = attrs; at; at = TREE_CHAIN (at))
 	{
-	  /* Avoid copying attributes that affect a symbol linkage or
-	 visibility since those in all likelihood only apply to
-	 the target.
+	  /* Avoid copying attributes that affect a symbol linkage,
+	 inlining, or visibility since those in all likelihood
+	 only apply to the target.
 	 FIXME: make it possible to specify which attributes to
 	 copy or not to copy in the copy attribute itself.  */
 	  tree atname = get_attribute_name (at);
 	  if (is_attribute_p ("alias", atname)
+	  || is_attribute_p ("always_inline", atname)
+	  || is_attribute_p ("gnu_inline", atname)
 	  || is_attribute_p ("ifunc", atname)
 	  || is_attribute_p ("visibility", atname)
 	  || is_attribute_p ("weak", atname)
@@ -2458,17 +2460,26 @@ handle_tls_model_attribute (tree *node, tree name,
   tree decl = *node;
   enum tls_model kind;
 
-  if (!VAR_P (decl) || !DECL_THREAD_LOCAL_P (decl))
+  if (!VAR_P (decl))
 {
-  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  warning (OPT_Wattributes, "%qE attribute ignored because %qD "
+	   "is not a variable",
+	   name, decl);
   return NULL_TREE;
 }
 
+  if (!DECL_THREAD_LOCAL_P (decl))
+{
+  warning (OPT_Wattributes, "%qE attribute ignored because %qD does "
+	   "not have thread storage duration", name, decl);
+  return NULL_TREE;
+}
+
   kind = DECL_TLS_MODEL (decl);
   id = TREE_VALUE (args);
   if (TREE_CODE (id) != STRING_CST)
 {
-  error ("tls_model argument not a string");
+  error ("%qE argument not a string", name);
   return NULL_TREE;
 }
 
@@ -2481,7 +2492,9 @@ handle_tls_model_attribute (tree *node, tree name,
   else if (!strcmp (TREE_STRING_POINTER (id), "global-dynamic"))
 kind = TLS_MODEL_GLOBAL_DYNAMIC;
   else
-error ("tls_model argument must be one of \"local-exec\", \"initial-exec\", \"local-dynamic\" or \"global-dynamic\"");
+error ("%qE argument must be one of %qs, %qs, %qs, or %qs",
+	   name,
+	   "local-exec", "initial-exec", "local-dynamic", "global-dynamic");
 
   set_decl_tls_model (decl, kind);
   return NULL_TREE;
Index: gcc/testsuite/gcc.dg/attr-copy-5.c
===
--- gcc/testsuite/gcc.dg/attr-copy-5.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/attr-copy-5.c	(working copy)
@@ -0,0 +1,57 @@
+/* PR middle-end/81824 - Warn for missing attributes with function aliases
+   Verify that attributes always_inline, gnu_inline, and noinline aren't
+   copied.  Also verify that copying attribute tls_model to a non-thread
+   variable triggers a warning.
+   { dg-do compile }
+   { dg-options "-Wall" }
+   { dg-require-effective-target tls } */
+
+#define ATTR(...)   __attribute__ ((__VA_ARGS__))
+
+ATTR (always_inline, gnu_inline, noreturn) inline int
+finline_noret (void)
+{
+  __builtin_abort ();
+  /* Expect no -Wreturn-type.  */
+}
+
+int call_finline_noret (void)
+{
+  finline_noret ();
+  /* Expect no -Wreturn-type.  */
+}
+
+
+ATTR (copy (finline_noret)) int
+fnoret (void);
+
+int call_fnoret (void)
+{
+  fnoret ();
+  /* Expect no -Wreturn-type.  */
+}
+
+
+/* Verify that attribute always_inline on a
