Re: [PATCH 3/5] gcc: Add --nostdlib++ option

2021-12-05 Thread Richard Purdie via Gcc-patches
On Thu, 2021-12-02 at 20:04 -0700, Jeff Law wrote:
> 
> On 10/28/2021 10:39 AM, Richard Purdie wrote:
> > On Thu, 2021-10-28 at 08:51 -0600, Jeff Law wrote:
> > > On 10/27/2021 2:05 PM, Richard Purdie via Gcc-patches wrote:
> > > > OpenEmbedded/Yocto Project builds libgcc and the other gcc runtime 
> > > > libraries
> > > > separately from the compiler and slightly differently to the standard 
> > > > gcc build.
> > > > 
> > > > In general this works well but in trying to build them separately we 
> > > > run into
> > > > an issue since we're using our gcc, not xgcc and there is no way to 
> > > > tell configure
> > > > to use libgcc but not look for libstdc++.
> > > > 
> > > > This adds such an option allowing such configurations to work.
> > > > 
> > > > 2021-10-26 Richard Purdie 
> > > > 
> > > > gcc/c-family/ChangeLog:
> > > > 
> > > >   * c.opt: Add --nostdlib++ option
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > >   * g++spec.c (lang_specific_driver): Add --nostdlib++ option
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > >   * doc/invoke.texi: Document --nostdlib++ option
> > > >   * gcc.c: Add --nostdlib++ option
> > > Couldn't you use -nostdlib then explicitly add -lgcc?
> > > 
> > > If that works, that would seem better to me compared to adding an option
> > > to specs processing that is really only useful to one build
> > > system/procedure.
> > It sounds great in principle but I've never been able to get it to work. With
> > "-nostdinc++ -nostdlib" I miss the startup files so I also tried "-nostdinc++
> > -nodefaultlibs -lgcc". The latter gets further and I can build libstdc++ but
> > the resulting library doesn't link into applications correctly.
> Can you be more specific about "doesn't link into applications 
> correctly".  I'm still hesitant to add another option if we can 
> reasonably avoid it.

I took a step back and had another look at what our build is doing and why we
need this. Our build builds the different components separately in many cases so
libstdc++ is built separately since that allows us to tune it to specific
targets whilst the core gcc is architecture specific.

When we run configure for libstdc++, we have to configure CXX. We can configure
it as:

CXX="${HOST_PREFIX}g++ -nostdinc++"

however that gives errors about a missing libstdc++ during configure tests (e.g.
the atomic ones) since the library isn't present yet and we're trying to build
it. When I last ran into this I added the nostdlib++ option to mirror the
stdinc++ one and this solved the problem.

Adding -lgcc doesn't seem to work: binaries using libstdc++ segfault on some
architectures (mips64 and ppc). I suspect this is a link ordering issue which we
have little control over from the CXX variable, but I haven't dug deeply into it
as I got the feeling it would be a pain to correct; we need the compiler's
automatic linking behaviour, which you can't emulate from one command line.

I did also try -nostdlib and -nodefaultlibs with various libraries added in, but
it never seems to work well and gives segfaults; again I suspect the linking is
order specific.

Thinking about the problem from scratch again, I wondered if a dummy libstdc++
would be enough to make the configure tests work correctly. I've found that I
can do something like:

mkdir -p XXX/dummylib
touch XXX/dummylib/libstdc++.so
export CXX="${CXX} -nostdinc++ -LXXX/dummylib"

and the configure works correctly and the resulting libs don't segfault on
target.

This is a bit of a hack but probably no worse than some of the other things we
have to do to build so if you're not keen on the patch we could go with this. It
did seem at least worth discussing our use case though! :)

Cheers,

Richard






[pushed] Objective-C, NeXT: Reorganise meta-data declarations.

2021-12-05 Thread Iain Sandoe via Gcc-patches
This moves the GTY declaration of the meta-data identifier
array into the header that enumerates these and provides
shorthand defines for them.  This avoids a problem seen with
a relocatable PCH implementation.

Tested on x86_64, i686 Darwin, pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/objc/ChangeLog:

* objc-next-metadata-tags.h (objc_rt_trees): Declare here.
* objc-next-runtime-abi-01.c: Remove from here.
* objc-next-runtime-abi-02.c: Likewise.
* objc-runtime-shared-support.c: Reorder headers, provide
a GTY declaration for the definition of objc_rt_trees.
---
 gcc/objc/objc-next-metadata-tags.h | 2 ++
 gcc/objc/objc-next-runtime-abi-01.c| 9 +
 gcc/objc/objc-next-runtime-abi-02.c| 9 +
 gcc/objc/objc-runtime-shared-support.c | 9 ++---
 4 files changed, 6 insertions(+), 23 deletions(-)

diff --git a/gcc/objc/objc-next-metadata-tags.h 
b/gcc/objc/objc-next-metadata-tags.h
index ceb87832359..aea14558b5c 100644
--- a/gcc/objc/objc-next-metadata-tags.h
+++ b/gcc/objc/objc-next-metadata-tags.h
@@ -79,6 +79,8 @@ enum objc_runtime_tree_index
   OCTI_RT_META_MAX
 };
 
+extern GTY(()) tree objc_rt_trees[OCTI_RT_META_MAX];
+
 /* Tags for the META data so that the backend can put them in the correct
sections for targets/runtimes (Darwin/NeXT) that require this.
This information also survives LTO - which might produce mixed language
diff --git a/gcc/objc/objc-next-runtime-abi-01.c 
b/gcc/objc/objc-next-runtime-abi-01.c
index 12f8bdc0b9c..0f76162c70e 100644
--- a/gcc/objc/objc-next-runtime-abi-01.c
+++ b/gcc/objc/objc-next-runtime-abi-01.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "objc-runtime-hooks.h"
 #include "objc-runtime-shared-support.h"
+#include "objc-next-metadata-tags.h"
 #include "objc-encoding.h"
 
 /* NeXT ABI 0 and 1 private definitions.  */
@@ -99,14 +100,6 @@ along with GCC; see the file COPYING3.  If not see
 
 #define CLS_HAS_CXX_STRUCTORS  0x2000L
 
-/* rt_trees identifiers - shared between NeXT implementations.  These
-   allow the FE to tag meta-data in a manner that survives LTO and can
-   be used when the runtime requires that certain meta-data items
-   appear in particular named sections.  */
-
-#include "objc-next-metadata-tags.h"
-extern GTY(()) tree objc_rt_trees[OCTI_RT_META_MAX];
-
 static void next_runtime_01_initialize (void);
 
 static tree next_runtime_abi_01_super_superclassfield_id (void);
diff --git a/gcc/objc/objc-next-runtime-abi-02.c 
b/gcc/objc/objc-next-runtime-abi-02.c
index 7ca0fd7cf00..913ed643782 100644
--- a/gcc/objc/objc-next-runtime-abi-02.c
+++ b/gcc/objc/objc-next-runtime-abi-02.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "objc-runtime-hooks.h"
 #include "objc-runtime-shared-support.h"
+#include "objc-next-metadata-tags.h"
 #include "objc-encoding.h"
 
 /* ABI 2 Private definitions. */
@@ -180,14 +181,6 @@ enum objc_v2_tree_index
 #define objc_rethrow_exception_decl \
objc_v2_global_trees[OCTI_V2_RETHROW_DECL]
 
-/* rt_trees identifiers - shared between NeXT implementations.  These allow
-   the FE to tag meta-data in a manner that survives LTO and can be used when
-   the  runtime requires that certain meta-data items appear in particular
-   named sections.  */
-
-#include "objc-next-metadata-tags.h"
-extern GTY(()) tree objc_rt_trees[OCTI_RT_META_MAX];
-
 /* The OCTI_V2_... enumeration itself is in above.  */
 static GTY(()) tree objc_v2_global_trees[OCTI_V2_MAX];
 
diff --git a/gcc/objc/objc-runtime-shared-support.c 
b/gcc/objc/objc-runtime-shared-support.c
index 40f506c6e75..0d3036f478e 100644
--- a/gcc/objc/objc-runtime-shared-support.c
+++ b/gcc/objc/objc-runtime-shared-support.c
@@ -44,16 +44,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "objc-runtime-hooks.h"
 
 #include "objc-runtime-shared-support.h"
-#include "objc-encoding.h"
-
-/* rt_trees identifiers - shared between NeXT implementations.  These allow
-   the FE to tag meta-data in a manner that survives LTO and can be used when
-   the  runtime requires that certain meta-data items appear in particular
-   named sections.  */
 #include "objc-next-metadata-tags.h"
-extern GTY(()) tree objc_rt_trees[OCTI_RT_META_MAX];
+#include "objc-encoding.h"
 
 /* Rather than repeatedly looking up the identifiers, we save them here.  */
+extern GTY(()) tree objc_rt_trees[OCTI_RT_META_MAX];
 tree objc_rt_trees[OCTI_RT_META_MAX];
 
 /* For building an objc struct.  These might not be used when this file
-- 
2.24.3 (Apple Git-128)



[PATCH] ranger: Optimise irange_union

2021-12-05 Thread Richard Sandiford via Gcc-patches
When compiling an optabs.ii at -O2 with a release-checking build,
the hottest function in the profile was irange_union.  This patch
tries to optimise it a bit.  The specific changes are:

- Use quick_push rather than safe_push, since the final number
  of entries is known in advance.

- Avoid assigning wi::to_wide & co. to a temporary wide_int,
  such as in:

wide_int val_j = wi::to_wide (res[j]);

  wi::to_wide returns a wide_int "view" of the in-place INTEGER_CST
  storage.  Assigning the result to wide_int forces an unnecessary
  copy to temporary storage.

  This is one area where "auto" helps a lot.  In the end though,
  it seemed more readable to inline the wi::to_*s rather than
  use auto.

- Use to_widest_int rather than to_wide_int.  Both are functionally
  correct, but to_widest_int is more efficient, for three reasons:

  - to_wide returns a wide-int representation in which the most
significant element might not be canonically sign-extended.
This is because we want to allow the storage of an INTEGER_CST
like 0x1U << 31 to be accessed directly with both a wide_int view
(where only 32 bits matter) and a widest_int view (where many more
bits matter, and where the 32 bits are zero-extended to match the
unsigned type).  However, operating on uncanonicalised wide_int
forms is less efficient than operating on canonicalised forms.

  - to_widest_int has a constant rather than variable precision and
there are never any redundant upper bits to worry about.

  - Using widest_int avoids the need for an overflow check, since
there is enough precision to add 1 to any IL constant without
wrap-around.
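
(A small sketch of the wi::to_wide and "auto" points above; this is an
illustration only, not part of the patch, and assumes GCC's internal
tree.h/wide-int.h API with two INTEGER_CST trees "a" and "b".)

  wide_int wa = wi::to_wide (a);   /* forces a copy of a's storage */
  bool le1 = wi::le_p (wa, wi::to_wide (b), TYPE_SIGN (TREE_TYPE (a)));

  auto va = wi::to_wide (a);       /* "auto" keeps the cheap view instead */
  bool le2 = wi::le_p (va, wi::to_wide (b), TYPE_SIGN (TREE_TYPE (a)));

  /* widest_int views are canonically sign-extended and have a fixed,
     ample precision, so no sign argument or overflow check is needed.  */
  bool le3 = wi::to_widest (a) <= wi::to_widest (b);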

This gives a ~2% compile-time speed up with the test above.

I also tried adding a path for two single-pair ranges, but it
wasn't a win.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* value-range.cc (irange::irange_union): Use quick_push rather
than safe_push.  Use widest_int rather than wide_int.  Avoid
assigning wi::to_* results to wide*_int temporaries.
---
 gcc/value-range.cc | 46 +-
 1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 82509fa55a7..d38d0786072 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1550,70 +1550,50 @@ irange::irange_union (const irange &r)
   // the merge is performed.
   //
   // [Xi,Yi]..[Xn,Yn]  U  [Xj,Yj]..[Xm,Ym]   -->  [Xk,Yk]..[Xp,Yp]
-  tree ttype = r.type ();
-  signop sign = TYPE_SIGN (ttype);
-
-  auto_vec res;
-  wide_int u1 ;
-  wi::overflow_type ovf;
+  auto_vec res (m_num_ranges * 2 + r.m_num_ranges * 2);
   unsigned i = 0, j = 0, k = 0;
 
   while (i < m_num_ranges * 2 && j < r.m_num_ranges * 2)
 {
   // lower of Xi and Xj is the lowest point.
-  if (wi::le_p (wi::to_wide (m_base[i]), wi::to_wide (r.m_base[j]), sign))
+  if (wi::to_widest (m_base[i]) <= wi::to_widest (r.m_base[j]))
{
- res.safe_push (m_base[i]);
- res.safe_push (m_base[i + 1]);
+ res.quick_push (m_base[i]);
+ res.quick_push (m_base[i + 1]);
  k += 2;
  i += 2;
}
   else
{
- res.safe_push (r.m_base[j]);
- res.safe_push (r.m_base[j + 1]);
+ res.quick_push (r.m_base[j]);
+ res.quick_push (r.m_base[j + 1]);
  k += 2;
  j += 2;
}
 }
   for ( ; i < m_num_ranges * 2; i += 2)
 {
-  res.safe_push (m_base[i]);
-  res.safe_push (m_base[i + 1]);
+  res.quick_push (m_base[i]);
+  res.quick_push (m_base[i + 1]);
   k += 2;
 }
   for ( ; j < r.m_num_ranges * 2; j += 2)
 {
-  res.safe_push (r.m_base[j]);
-  res.safe_push (r.m_base[j + 1]);
+  res.quick_push (r.m_base[j]);
+  res.quick_push (r.m_base[j + 1]);
   k += 2;
 }
 
   // Now normalize the vector removing any overlaps.
   i = 2;
-  int prec = TYPE_PRECISION (ttype);
-  wide_int max_val = wi::max_value (prec, sign);
   for (j = 2; j < k ; j += 2)
 {
-  wide_int val_im1 = wi::to_wide (res[i - 1]);
-  if (val_im1 == max_val)
-   break;
-  u1 = wi::add (val_im1, 1, sign, &ovf);
-
-  // Overflow indicates we are at MAX already.
-  // A wide int bug requires the previous max_val check
-  // trigger: gcc.c-torture/compile/pr80443.c  with -O3
-  if (ovf == wi::OVF_OVERFLOW)
-   break;
-
-  wide_int val_j = wi::to_wide (res[j]);
-  wide_int val_jp1 = wi::to_wide (res[j+1]);
   // Current upper+1 is >= lower bound next pair, then we merge ranges.
-  if (wi::ge_p (u1, val_j, sign))
+  if (wi::to_widest (res[i - 1]) + 1 >= wi::to_widest (res[j]))
{
  // New upper bounds is greater of current or the next one.
- if (wi::gt_p (val_jp1, val_im1, sign))
-   res [i - 1] = res[j + 1];
+ if (wi::to_widest (res[j + 1]) > wi::to_widest (res[i - 1]))
+   

[PATCH] PR fortran/103418 - random_number() does not accept pointer, intent(in) array argument

2021-12-05 Thread Harald Anlauf via Gcc-patches
Dear all,

The check of dummy arguments with the POINTER attribute and INTENT(IN)
was broken in the case where the argument was passed to an intrinsic.
We therefore rejected valid code such as the example given in the PR.

The patch relaxes the excessive check.  This requires the adjustment
of one of the tests for MOVE_ALLOC.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

As this is rejects-valid and possibly annoying, I would like to
backport as far as seems reasonable.

Thanks,
Harald

From fa07ada75a5ea25845d7e168204cd980263a7d8d Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sun, 5 Dec 2021 22:45:32 +0100
Subject: [PATCH] Fortran: fix check for pointer dummy arguments with
 INTENT(IN)

gcc/fortran/ChangeLog:

	PR fortran/103418
	* check.c (variable_check): Correct check of procedure dummy
	arguments with INTENT(IN) and POINTER attribute to get accepted
	when passed to intrinsics.

gcc/testsuite/ChangeLog:

	PR fortran/103418
	* gfortran.dg/move_alloc_8.f90: Adjust error messages.
	* gfortran.dg/pointer_intent_9.f90: New test.

Co-authored-by: Steven G. Kargl 
---
 gcc/fortran/check.c   |  2 +-
 gcc/testsuite/gfortran.dg/move_alloc_8.f90|  4 +--
 .../gfortran.dg/pointer_intent_9.f90  | 28 +++
 3 files changed, 31 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pointer_intent_9.f90

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index ee3a51ee253..879b5b1996e 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -1031,7 +1031,7 @@ variable_check (gfc_expr *e, int n, bool allow_proc)
 	break;
 	}

-  if (!ref)
+  if (!ref && !(pointer && (e->rank == 0 || e->ref)))
 	{
 	  gfc_error ("%qs argument of %qs intrinsic at %L cannot be "
 		 "INTENT(IN)", gfc_current_intrinsic_arg[n]->name,
diff --git a/gcc/testsuite/gfortran.dg/move_alloc_8.f90 b/gcc/testsuite/gfortran.dg/move_alloc_8.f90
index f624b703cc9..d968ea0e5cd 100644
--- a/gcc/testsuite/gfortran.dg/move_alloc_8.f90
+++ b/gcc/testsuite/gfortran.dg/move_alloc_8.f90
@@ -60,7 +60,7 @@ subroutine test2 (x, px)
   integer, allocatable :: a
   type(t2), pointer :: ta

-  call move_alloc (px, ta)  ! { dg-error "cannot be INTENT.IN." }
+  call move_alloc (px, ta)  ! { dg-error "must be ALLOCATABLE" }
   call move_alloc (x%a, a)  ! { dg-error "cannot be INTENT.IN." }
   call move_alloc (x%ptr%a, a)  ! OK (3)
   call move_alloc (px%a, a) ! OK (4)
@@ -84,7 +84,7 @@ subroutine test3 (x, px)
   integer, allocatable :: a
   class(t2), pointer :: ta

-  call move_alloc (px, ta)  ! { dg-error "cannot be INTENT.IN." }
+  call move_alloc (px, ta)  ! { dg-error "must be ALLOCATABLE" }
   call move_alloc (x%a, a)  ! { dg-error "cannot be INTENT.IN." }
   call move_alloc (x%ptr%a, a)  ! OK (6)
   call move_alloc (px%a, a) ! OK (7)
diff --git a/gcc/testsuite/gfortran.dg/pointer_intent_9.f90 b/gcc/testsuite/gfortran.dg/pointer_intent_9.f90
new file mode 100644
index 000..000f407d393
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pointer_intent_9.f90
@@ -0,0 +1,28 @@
+! { dg-do compile }
+! PR fortran/103418
+! Validate checks for use of dummy arguments with pointer attribute
+
+module m
+  type t
+ real, pointer :: a, b(:)
+  end type t
+contains
+  subroutine s1 (a, b, c, d)
+real, pointer, intent(in) :: a, b(:)
+type(t),   intent(in) :: c
+class(t),  intent(in) :: d
+real, pointer :: pa, pb(:)
+call random_number (a)! legal
+call random_number (b)
+call random_number (c% a)
+call random_number (c% b)
+call random_number (d% a)
+call random_number (d% b)
+call move_alloc (a, pa)   ! { dg-error "must be ALLOCATABLE" }
+call move_alloc (b, pb)   ! { dg-error "must be ALLOCATABLE" }
+allocate (a)  ! { dg-error "pointer association context" }
+allocate (b(10))  ! { dg-error "pointer association context" }
+allocate (c% a)   ! { dg-error "pointer association context" }
+allocate (c% b(10))   ! { dg-error "pointer association context" }
+  end subroutine s1
+end module
--
2.26.2



[PATCH] ranger: Add shortcuts for single-successor blocks

2021-12-05 Thread Richard Sandiford via Gcc-patches
When compiling an optabs.ii at -O2 with a release-checking build,
there were 6,643,575 calls to gimple_outgoing_range_stmt_p.  96.8% of
them were for blocks with a single successor, which never have a control
statement that generates new range info.  This patch therefore adds a
shortcut for that case.

This gives a ~1% compile-time improvement for the test.

I tried making the function inline (in the header) so that the
single_succ_p didn't need to be repeated, but it seemed to make things
slightly worse.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* gimple-range-edge.cc (gimple_outgoing_range::edge_range_p): Add
a shortcut for blocks with single successors.
* gimple-range-gori.cc (gori_map::calculate_gori): Likewise.
---
 gcc/gimple-range-edge.cc | 3 +++
 gcc/gimple-range-gori.cc | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/gcc/gimple-range-edge.cc b/gcc/gimple-range-edge.cc
index afffc8dbcae..9e805230004 100644
--- a/gcc/gimple-range-edge.cc
+++ b/gcc/gimple-range-edge.cc
@@ -182,6 +182,9 @@ gimple_outgoing_range::calc_switch_ranges (gswitch *sw)
 gimple *
 gimple_outgoing_range::edge_range_p (irange &r, edge e)
 {
+  if (single_succ_p (e->src))
+return NULL;
+
   // Determine if there is an outgoing edge.
   gimple *s = gimple_outgoing_range_stmt_p (e->src);
   if (!s)
diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index 0dba34b58c5..b256e4afd15 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -555,6 +555,9 @@ gori_map::calculate_gori (basic_block bb)
   m_outgoing[bb->index] = BITMAP_ALLOC (&m_bitmaps);
   m_incoming[bb->index] = BITMAP_ALLOC (&m_bitmaps);
 
+  if (single_succ_p (bb))
+return;
+
   // If this block's last statement may generate range informaiton, go
   // calculate it.
   gimple *stmt = gimple_outgoing_range_stmt_p (bb);
-- 
2.31.1



[PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-05 Thread Richard Sandiford via Gcc-patches
When compiling an optabs.ii at -O2 with a release-checking build,
we spent a lot of time in call_may_clobber_ref_p.  One of the
first things the function tries is the get_modref_function_summary
approach, but that also seems to be the most expensive check.
At least for the optabs.ii test, most of the things g_m_f_s could
handle could also be handled by points-to analysis, which is much
cheaper.

This patch therefore rearranges the tests in approximate order
of cheapness.  One wart is that points-to analysis can't be
used on its own for this purpose; it has to be used in
conjunction with check_fnspec.  This is because the argument
information in the fnspec is not reflected in the points-to
information, which instead contains overly optimistic (rather
than conservatively pessimistic) information.

Specifically, find_func_aliases_for_builtin_call short-circuits
the handle_rhs_call/handle_call_arg logic and doesn't itself add
any compensating information about the arguments.  This means that:

  free (foo)

does not have any points-to information for foo (it has an empty
set instead).
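
(A hypothetical C example of that wart, not taken from the patch:)

  #include <stdlib.h>

  void f (int *p)
  {
    *p = 1;
    /* The call must be treated as clobbering *p, but the builtin handler
       records an empty points-to set for the argument, so only check_fnspec
       sees that the pointed-to storage is affected.  */
    free (p);
  }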

It would be good to fix that for compile-time reasons if nothing else.
Doing points-to analysis without check_fnspec gave a measurable
speed-up, since PTA resolved the vast majority of queries, and
(as expected) the vast majority of calls had no fnspec.

This gave about a ~2% compile-time improvement for the test above.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* tree-ssa-alias.c (call_may_clobber_ref_p_1): Reorder checks
so that the cheaper ones come first.
---
 gcc/tree-ssa-alias.c | 145 +--
 1 file changed, 70 insertions(+), 75 deletions(-)

diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index 88fd7821c5e..573766278bc 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -2952,9 +2952,6 @@ ref_maybe_used_by_stmt_p (gimple *stmt, tree ref, bool 
tbaa_p)
 bool
 call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool tbaa_p)
 {
-  tree base;
-  tree callee;
-
   /* If the call is pure or const it cannot clobber anything.  */
   if (gimple_call_flags (call)
   & (ECF_PURE|ECF_CONST|ECF_LOOPING_CONST_OR_PURE|ECF_NOVOPS))
@@ -2977,9 +2974,64 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool 
tbaa_p)
break;
   }
 
-  callee = gimple_call_fndecl (call);
+  tree base = ao_ref_base (ref);
+  if (base
+  && (TREE_CODE (base) == SSA_NAME
+ || CONSTANT_CLASS_P (base)))
+return false;
+
+  /* A call that is not without side-effects might involve volatile
+ accesses and thus conflicts with all other volatile accesses.  */
+  if (ref->volatile_p)
+return true;
+
+  bool independent_pt = false;
+  if (base && DECL_P (base))
+{
+  /* If the reference is based on a decl that is not aliased the call
+cannot possibly clobber it.  */
+  if (!may_be_aliased (base)
+ /* But local non-readonly statics can be modified through
+recursion or the call may implement a threading barrier which
+we must treat as may-def.  */
+ && (TREE_READONLY (base)
+ || !is_global_var (base)))
+   return false;
 
-  if (callee != NULL_TREE && !ref->volatile_p)
+  auto clobbers = gimple_call_clobber_set (call);
+  independent_pt = !pt_solution_includes (clobbers, base);
+}
+  else if (base
+  && (TREE_CODE (base) == MEM_REF
+  || TREE_CODE (base) == TARGET_MEM_REF)
+  && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME)
+{
+  tree addr = TREE_OPERAND (base, 0);
+
+  /* If the reference is based on a pointer that points to memory that
+may not be written to then the call cannot possibly clobber it.  */
+  if (SSA_NAME_POINTS_TO_READONLY_MEMORY (addr))
+   return false;
+
+  /* Check if the base variable is call-clobbered.  */
+  if (auto *pi = SSA_NAME_PTR_INFO (addr))
+   {
+ auto clobbers = gimple_call_clobber_set (call);
+ independent_pt = !pt_solutions_intersect (clobbers, &pi->pt);
+   }
+}
+
+  /* Points-to information needs to be tested in conjunction with
+ check_fnspec.  The fnspec argument information for built-in
+ functions is excluded from the PT sets.  */
+  int res = check_fnspec (call, ref, true);
+  if (res >= 0)
+return res;
+
+  if (independent_pt)
+return false;
+
+  if (tree callee = gimple_call_fndecl (call))
 {
   struct cgraph_node *node = cgraph_node::get (callee);
   if (node)
@@ -3017,77 +3069,20 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, 
bool tbaa_p)
  return false;
}
  alias_stats.modref_clobber_may_alias++;
- }
-   }
-}
-
-  base = ao_ref_base (ref);
-  if (!base)
-return true;
-
-  if (TREE_CODE (base) == SSA_NAME
-  || CONSTANT_CLASS_P (base))
-return false;
-
-  /* A call that is not without side

[PATCH] fold: Optimise fold_view_convert_expr

2021-12-05 Thread Richard Sandiford via Gcc-patches
When compiling an optabs.ii at -O2 with a release-checking build,
there were 210,172 calls to fold_view_convert_expr.  99.8% of them
were conversions of an INTEGER_CST to an ENUMERAL_TYPE, so this patch
adds a fast(er) path for that case.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* fold-const.c (fold_view_convert_expr): Add a fast path for
conversions of INTEGER_CSTs to other integral types.
---
 gcc/fold-const.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 0b9a42f764a..19613015ddb 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9128,6 +9128,13 @@ fold_view_convert_vector_encoding (tree type, tree expr)
 static tree
 fold_view_convert_expr (tree type, tree expr)
 {
+  /* Conversions of INTEGER_CSTs to ENUMERAL_TYPEs are very common,
+ so handle them directly.  */
+  if (INTEGRAL_TYPE_P (type)
+  && TREE_CODE (expr) == INTEGER_CST
+  && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (expr)))
+return wide_int_to_tree (type, wi::to_wide (expr));
+
   /* We support up to 512-bit values (for V8DFmode).  */
   unsigned char buffer[64];
   int len;
-- 
2.31.1



[PATCH] gimple: Optimise inlined gimple_seq_last

2021-12-05 Thread Richard Sandiford via Gcc-patches
In self-compilations, GCC doesn't realise that gimple_seq_last
always returns nonnull when the argument is nonnull.  Although
it's a small thing in itself, the function is used (indirectly)
many times and the extra null checks bloat what are otherwise
simple loops.

This patch adds an optimisation hint to tell the compiler that
s && !s->prev is impossible.  This gives a small (<1%) but
measurable improvement in compile time.  The original intention
was to make a loop small enough that it could be turned into an
inline function.
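
(A standalone sketch of the effect, with hypothetical names rather than GCC
source: given the hint, the compiler can see that a non-null argument implies
a non-null result and drop the extra null check in callers.)

  struct node { node *prev; };

  static inline node *
  seq_last (node *s)
  {
    /* The hint: "s && !s->prev" cannot happen.  */
    if (s && !s->prev)
      __builtin_unreachable ();
    return s ? s->prev : nullptr;
  }

  int
  has_last (node *s)
  {
    /* With the hint this can fold to a single test of s.  */
    return seq_last (s) != nullptr;
  }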

The form of assertion in the patch:

  __extension__ ({ if (!(EXPR)) __builtin_unreachable (); (void) 0; })

seems to work more reliably than the form used by release-checking
gcc_asserts:

  ((void)(__builtin_expect (!(EXPR), 0) ? __builtin_unreachable (), 0 : 0))

For that reason, I wondered about making gcc_assert use the
new macro too.  However, gcc_assert is currently inconsistent:
it preserves side-effects even in release compilers when compiled
with GCC, but it doesn't when compiled by other compilers.
This difference dates back to 2013 (g:2df77822ee0974d84b2)
and it looks like we now have several gcc_asserts that assume
that side effects will be preserved.  It therefore seemed better
to deal with that separately.
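
(A hypothetical example of the kind of gcc_assert that relies on this, not
taken from the tree; gcc_assert here is GCC's own macro from system.h.)

  extern int consume_token (void);

  static void
  parse_one (void)
  {
    /* Relies on consume_token () still being called, i.e. on the side
       effect being preserved, when assertion checking is disabled.  */
    gcc_assert (consume_token () == 0);
  }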

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* system.h (optimization_assert): New macro.
* gimple.h (gimple_seq_last): Use it to assert that s->prev is
never null.
---
 gcc/gimple.h |  2 ++
 gcc/system.h | 10 ++
 2 files changed, 12 insertions(+)

diff --git a/gcc/gimple.h b/gcc/gimple.h
index f7fdefc5362..8162c1ea507 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1723,6 +1723,8 @@ gimple_seq_first_stmt_as_a_bind (gimple_seq s)
 static inline gimple_seq_node
 gimple_seq_last (gimple_seq s)
 {
+  /* Helps to avoid a double branch in callers.  */
+  optimization_assert (!s || s->prev);
   return s ? s->prev : NULL;
 }
 
diff --git a/gcc/system.h b/gcc/system.h
index 4ac656c9c3c..6e27b3270b9 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -778,6 +778,16 @@ extern void fancy_abort (const char *, int, const char *)
 ATTRIBUTE_NORETURN ATTRIBUTE_COLD;
 #define abort() fancy_abort (__FILE__, __LINE__, __FUNCTION__)
 
+/* Tell the compiler of GCC that EXPR is known to be true.  This is purely
+   an optimization hint and EXPR must not have any side effects.  */
+#if (GCC_VERSION >= 4005)
+#define optimization_assert(EXPR)  \
+  __extension__ ({ if (!(EXPR)) __builtin_unreachable (); (void) 0; })
+#else
+/* Include EXPR, so that unused variable warnings do not occur.  */
+#define optimization_assert(EXPR) ((void)(0 && (EXPR)))
+#endif
+
 /* Use gcc_assert(EXPR) to test invariants.  */
 #if ENABLE_ASSERT_CHECKING
 #define gcc_assert(EXPR)   \
-- 
2.31.1



[PATCH] [PATCH, v2, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-12-05 Thread Dan Li via Gcc-patches
Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, support beyond the compiler is also
required (as discussed in [2]). This patch only adds basic support
for SCS on the compiler side, making it convenient for users to
enable SCS.

For the Linux kernel, only compiler support is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768
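
(A hedged usage sketch, assuming a compiler built with this patch; the option
and attribute spellings are taken from the ChangeLog below.)

  /* Compile with: gcc -fsanitize=shadow-call-stack ... (AArch64).  */

  extern void callee (void);

  void
  protected_func (void)
  {
    /* Non-leaf, so the prologue saves x30; with SCS enabled it is also
       pushed onto the shadow call stack and popped again in the epilogue.  */
    callee ();
  }

  __attribute__((no_sanitize_shadow_call_stack))
  void
  unprotected_func (void)
  {
    /* Excluded from the instrumentation via the new attribute.  */
    callee ();
  }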

Signed-off-by: Dan Li 

gcc/c-family/ChangeLog:

* c-attribs.c (handle_no_sanitize_shadow_call_stack_attribute):
New.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_shadow_call_stack_enabled):
New decl.
* config/aarch64/aarch64.c (aarch64_shadow_call_stack_enabled):
New.
(aarch64_expand_prologue):  Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue):  Pop x30 from SCS.
* config/aarch64/aarch64.h (TARGET_SUPPORT_SHADOW_CALL_STACK):
New macro.
(TARGET_CHECK_SCS_RESERVED_REGISTER):   Likewise.
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop):  Likewise.
* defaults.h (TARGET_SUPPORT_SHADOW_CALL_STACK):New macro.
* doc/extend.texi:  Document -fsanitize=shadow-call-stack.
* doc/invoke.texi:  Document attribute.
* flag-types.h (enum sanitize_code):Add
SANITIZE_SHADOW_CALL_STACK.
* opts-global.c (handle_common_deferred_options):   Add SCS
compile option check.
* opts.c (finish_options):  Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
---
 gcc/c-family/c-attribs.c  | 21 +
 gcc/config/aarch64/aarch64-protos.h   |  1 +
 gcc/config/aarch64/aarch64.c  | 27 +++
 gcc/config/aarch64/aarch64.h  | 11 +
 gcc/config/aarch64/aarch64.md | 18 
 gcc/defaults.h|  4 ++
 gcc/doc/extend.texi   |  7 +++
 gcc/doc/invoke.texi   | 29 
 gcc/flag-types.h  |  2 +
 gcc/opts-global.c |  6 +++
 gcc/opts.c| 12 +
 .../gcc.target/aarch64/shadow_call_stack_1.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_2.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_3.c  | 45 +++
 .../gcc.target/aarch64/shadow_call_stack_4.c  | 18 
 15 files changed, 213 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..9b3a35c06bf 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -56,6 +56,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, 
bool *);
 static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  int, bool *);
+static tree handle_no_sanitize_shadow_call_stack_attribute (tree *, tree,
+ tree, int, bool *);
 static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
 int, bool *);
 static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
@@ -454,6 +456,10 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_no_sanitize_attribute, NULL },
   { "no_sanitize_address",0, 0, true, false, false, false,
  handle_no_sanitize_address_attribute, NULL },
+  { "no_sanitize_shadow_call_stack",
+ 0, 0, true, false, false, false,
+ handle_no_sanitize_shadow_call_stack_attribute,
+ NULL },
   { "no_sanitize_thread", 0, 0, true, false, false, false,
  handle_no_sanitize_thread_attribute, NULL },
   { "no_sanitize_undefined",  0, 0, true, false, false, false,
@@ -1175,6 +1181,21 @@ handle_no_sanitize_address_attribute (tree *node, tree 
name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle a "no_sanitize_shadow_call_stack" attribute; arguments as in
+   struct attribute_spec.handler.  */
+static tree
+handle_no_sanitize

Re: [PATCH] gcc: vxworks: fix providing stdint.h header

2021-12-05 Thread Alexandre Oliva via Gcc-patches
On Dec  3, 2021, Olivier Hainque  wrote:

> Alex, how does that look to you?

LGTM, thanks,

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] [PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-12-05 Thread Dan Li via Gcc-patches




On 12/6/21 10:41 AM, Dan Li wrote:

Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, support beyond the compiler is also
required (as discussed in [2]). This patch only adds basic support
for SCS on the compiler side, making it convenient for users to
enable SCS.

For the Linux kernel, only compiler support is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

Signed-off-by: Dan Li 


Add more reviewers :)


[PATCH] [i386] Prefer INT_SSE_REGS for SSE_FLOAT_MODE_P in preferred_reload_class.

2021-12-05 Thread liuhongt via Gcc-patches
When moves between integer and sse registers are cheap.

2021-12-06  Hongtao Liu  
Uroš Bizjak  
gcc/ChangeLog:

PR target/95740
* config/i386/i386.c (ix86_preferred_reload_class): Allow
integer regs when moves between register units are cheap.
* config/i386/i386.h (INT_SSE_CLASS_P): New.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr95740.c: New test.
---
 gcc/config/i386/i386.c  | 12 ++--
 gcc/config/i386/i386.h  |  2 ++
 gcc/testsuite/gcc.target/i386/pr95740.c | 26 +
 3 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr95740.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 80fee627358..e3c2e294988 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19194,9 +19194,17 @@ ix86_preferred_reload_class (rtx x, reg_class_t 
regclass)
   return NO_REGS;
 }
 
-  /* Prefer SSE regs only, if we can use them for math.  */
+  /* Prefer SSE if we can use them for math.  Also allow integer regs
+ when moves between register units are cheap.  */
   if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
-return SSE_CLASS_P (regclass) ? regclass : NO_REGS;
+{
+  if (TARGET_INTER_UNIT_MOVES_FROM_VEC
+ && TARGET_INTER_UNIT_MOVES_TO_VEC
+ && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (word_mode))
+   return INT_SSE_CLASS_P (regclass) ? regclass : NO_REGS;
+  else
+   return SSE_CLASS_P (regclass) ? regclass : NO_REGS;
+}
 
   /* Generally when we see PLUS here, it's the function invariant
  (plus soft-fp const_int).  Which can only be computed into general
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 2fda1e0686e..ec90e47904b 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1283,6 +1283,8 @@ enum reg_class
   reg_class_subset_p ((CLASS), FLOAT_REGS)
 #define SSE_CLASS_P(CLASS) \
   reg_class_subset_p ((CLASS), ALL_SSE_REGS)
+#define INT_SSE_CLASS_P(CLASS) \
+  reg_class_subset_p ((CLASS), INT_SSE_REGS)
 #define MMX_CLASS_P(CLASS) \
   ((CLASS) == MMX_REGS)
 #define MASK_CLASS_P(CLASS) \
diff --git a/gcc/testsuite/gcc.target/i386/pr95740.c 
b/gcc/testsuite/gcc.target/i386/pr95740.c
new file mode 100644
index 000..7ecd71ba8c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr95740.c
@@ -0,0 +1,26 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-msse2 -O2 -mtune=generic -mtune-ctrl=use_incdec -masm=att 
-mfpmath=sse" } */
+/* { dg-final { scan-assembler-times {(?n)movd[\t ]*%xmm0.*%eax} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)incl[\t ]*%eax} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)movq[\t ]*%xmm0.*%rax} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)incq[\t ]*%rax} 1 } } */
+
+int
+foo (float a)
+{
+  union{
+int b;
+float a;}u;
+  u.a = a;
+  return u.b + 1;
+}
+
+long long
+foo1 (double a)
+{
+  union{
+long long b;
+double a;}u;
+  u.a = a;
+  return u.b + 1;
+}
-- 
2.18.2



Re: [PATCH] [i386] Prefer INT_SSE_REGS for SSE_FLOAT_MODE_P in preferred_reload_class.

2021-12-05 Thread Hongtao Liu via Gcc-patches
Forgot --in-reply-to when running git send-email.

> I was thinking about:
>
> --cut here--
> @@ -19194,9 +19194,17 @@ ix86_preferred_reload_class (rtx x,
> reg_class_t regclass)
>   return NO_REGS;
> }
>
> -  /* Prefer SSE regs only, if we can use them for math.  */
> +  /* Prefer SSE if we can use them for math.  Also allow integer regs
> + when moves between register units are cheap.  */
>   if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> -return SSE_CLASS_P (regclass) ? regclass : NO_REGS;
> +{
> +  if (TARGET_INTER_UNIT_MOVES_FROM_VEC
> + && TARGET_INTER_UNIT_MOVES_TO_VEC
> + && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (word_mode))
> +   return INT_SSE_CLASS_P (regclass) ? regclass : NO_REGS;
> +  else
> +   return SSE_CLASS_P (regclass) ? regclass : NO_REGS;
> +}
>
>   /* Generally when we see PLUS here, it's the function invariant
>  (plus soft-fp const_int).  Which can only be computed into general
> --cut here--
>
> So, INT_SSE class is allowed when interunit moves are enabled. The
> patch also takes care for 64-bit moves which are expensive on 32-bit
> targets.

I like your version; updated the patch.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} w/ and w/o -march=k8.


On Mon, Dec 6, 2021 at 11:41 AM liuhongt  wrote:
>
> When moves between integer and sse registers are cheap.
>
> 2021-12-06  Hongtao Liu  
> Uroš Bizjak  
> gcc/ChangeLog:
>
> PR target/95740
> * config/i386/i386.c (ix86_preferred_reload_class): Allow
> integer regs when moves between register units are cheap.
> * config/i386/i386.h (INT_SSE_CLASS_P): New.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr95740.c: New test.
> ---
>  gcc/config/i386/i386.c  | 12 ++--
>  gcc/config/i386/i386.h  |  2 ++
>  gcc/testsuite/gcc.target/i386/pr95740.c | 26 +
>  3 files changed, 38 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr95740.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 80fee627358..e3c2e294988 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -19194,9 +19194,17 @@ ix86_preferred_reload_class (rtx x, reg_class_t 
> regclass)
>return NO_REGS;
>  }
>
> -  /* Prefer SSE regs only, if we can use them for math.  */
> +  /* Prefer SSE if we can use them for math.  Also allow integer regs
> + when moves between register units are cheap.  */
>if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> -return SSE_CLASS_P (regclass) ? regclass : NO_REGS;
> +{
> +  if (TARGET_INTER_UNIT_MOVES_FROM_VEC
> + && TARGET_INTER_UNIT_MOVES_TO_VEC
> + && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (word_mode))
> +   return INT_SSE_CLASS_P (regclass) ? regclass : NO_REGS;
> +  else
> +   return SSE_CLASS_P (regclass) ? regclass : NO_REGS;
> +}
>
>/* Generally when we see PLUS here, it's the function invariant
>   (plus soft-fp const_int).  Which can only be computed into general
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 2fda1e0686e..ec90e47904b 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1283,6 +1283,8 @@ enum reg_class
>reg_class_subset_p ((CLASS), FLOAT_REGS)
>  #define SSE_CLASS_P(CLASS) \
>reg_class_subset_p ((CLASS), ALL_SSE_REGS)
> +#define INT_SSE_CLASS_P(CLASS) \
> +  reg_class_subset_p ((CLASS), INT_SSE_REGS)
>  #define MMX_CLASS_P(CLASS) \
>((CLASS) == MMX_REGS)
>  #define MASK_CLASS_P(CLASS) \
> diff --git a/gcc/testsuite/gcc.target/i386/pr95740.c 
> b/gcc/testsuite/gcc.target/i386/pr95740.c
> new file mode 100644
> index 000..7ecd71ba8c1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr95740.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-msse2 -O2 -mtune=generic -mtune-ctrl=use_incdec -masm=att 
> -mfpmath=sse" } */
> +/* { dg-final { scan-assembler-times {(?n)movd[\t ]*%xmm0.*%eax} 1 } } */
> +/* { dg-final { scan-assembler-times {(?n)incl[\t ]*%eax} 1 } } */
> +/* { dg-final { scan-assembler-times {(?n)movq[\t ]*%xmm0.*%rax} 1 } } */
> +/* { dg-final { scan-assembler-times {(?n)incq[\t ]*%rax} 1 } } */
> +
> +int
> +foo (float a)
> +{
> +  union{
> +int b;
> +float a;}u;
> +  u.a = a;
> +  return u.b + 1;
> +}
> +
> +long long
> +foo1 (double a)
> +{
> +  union{
> +long long b;
> +double a;}u;
> +  u.a = a;
> +  return u.b + 1;
> +}
> --
> 2.18.2
>


--
BR,
Hongtao


Re: [PATCH v8 2/2] Don't move cold code out of loop by checking bb count

2021-12-05 Thread Xionghu Luo via Gcc-patches



On 2021/12/1 18:09, Richard Biener wrote:
> On Wed, Nov 10, 2021 at 4:08 AM Xionghu Luo  wrote:
>>
>>
>>
>> On 2021/11/4 21:00, Richard Biener wrote:
>>> On Wed, Nov 3, 2021 at 2:29 PM Xionghu Luo  wrote:


> +  while (outmost_loop != loop)
> +{
> +  if (bb_colder_than_loop_preheader (loop_preheader_edge
> (outmost_loop)->src,
> +loop_preheader_edge 
> (cold_loop)->src))
> +   cold_loop = outmost_loop;
> +  outmost_loop = superloop_at_depth (loop, loop_depth (outmost_loop) 
> + 1);
> +}
>
> could be instead written as
>
>   coldest_loop = coldest_outermost_loop[loop->num];
>   if (loop_depth (coldest_loop) < loop_depth (outermost_loop))
> return outermost_loop;
>   return coldest_loop;
>
> ?  And in the usual case coldest_outermost_loop[L] would be the loop tree 
> root.
> It should be possible to compute such cache in a DFS walk of the loop tree
> (the loop iterator by default visits in such order).


 Thanks.  Updated the patch with your suggestion.  Not sure whether it 
 strictly
 conforms to your comments.  Though the patch passed all my added 
 tests(coverage not enough),
 I am still a bit worried if pre-computed coldest_loop is outside of 
 outermost_loop, but
 outermost_loop is not the COLDEST LOOP, i.e. (outer->inner)

  [loop tree root, coldest_loop, outermost_loop,..., second_coldest_loop, 
 ..., loop],

 then function find_coldest_out_loop will return a loop NOT accord with our
 expectation, that should return second_coldest_loop instead of 
 outermost_loop?
>>> Hmm, interesting - yes.  I guess the common case will be that the 
>>> pre-computed
>>> outermost loop will be the loop at depth 1 since outer loops tend to
>>> be colder than
>>> inner loops?  That would then defeat the whole exercise.
>>
>> It is not easy to construct such cases, But finally I got below results,
>>
>> 1) many cases inner loop is hotter than outer loop, for example:
>>
>> loop 1's coldest_outermost_loop is 1, colder_than_inner_loop is NULL
>> loop 2's coldest_outermost_loop is 1, colder_than_inner_loop is 1
>> loop 3's coldest_outermost_loop is 1, colder_than_inner_loop is 2
>> loop 4's coldest_outermost_loop is 1, colder_than_inner_loop is 2
>>
>>
>> 2) But there are also cases inner loop is colder than outer loop, like:
>>
>> loop 1's coldest outermost loop is 1, colder_than_inner_loop is NULL
>> loop 2's coldest outermost loop is 2, colder_than_inner_loop is NULL
>> loop 3's coldest outermost loop is 3, colder_than_inner_loop is NULL
>>
>>
>>>
>>> To optimize the common case but not avoiding iteration in the cases we care
>>> about we could instead cache the next outermost loop that is _not_ colder
>>> than loop.  So for your [ ... ] example above we'd have> 
>>> hotter_than_inner_loop[loop] == outer (second_coldest_loop), where the
>>> candidate would then be 'second_coldest_loop' and we'd then iterate
>>> to hotter_than_inner_loop[hotter_than_inner_loop[loop]] to find the next
>>> cold candidate we can compare against?  For the common case we'd
>>> have hotter_than_inner_loop[looo] == NULL (no such loop) and we then
>>> simply pick 'outermost_loop'.
>>
>> Thanks.  It was difficult to understand, but finally I got to know what you
>> want to express :)
>>
>> We should cache the next loop that is *colder* than loop instead of '_not_ 
>> colder
>> than loop', and 'hotter_than_inner_loop' should be 'colder_than_inner_loop',
>> then it makes sense if the coldest loop is outside of outermost loop, 
>> continue to
>> find a colder loop between outermost loop and current loop in
>> colder_than_inner_loop[loop->num]?  Hope I understood you correctly...
> 
> Heh, looking at the patch - I don't know.
> 
> To make the calls to bb_colder_than_loop_preheader more obvious can you
> change that function to
> 
> +bool bb_colder_than_loop_preheader (basic_block bb, loop *loop)
> +{
> +  return bb->count < loop_preheader_edge (loop)->src->count;
> 
> ?
> 
> +  if (colder_than_inner_loop[loop->num] != NULL
> + && loop_depth (outermost_loop)
> +  < loop_depth (colder_than_inner_loop[loop->num]))
> +   return colder_than_inner_loop[loop->num];
> 
> that is
> 
> +  3) If COLDEST_LOOP is outside of OUTERMOST_LOOP, check whether there is a
> +  colder loop between OUTERMOST_LOOP and loop in pre-computed
> +  COLDER_THAN_INNER_LOOP, return it, otherwise return OUTERMOST_LOOP.  */
> 
> since we index colder_than_inner_loop by loop->num as well this doesn't find 
> the
> coldest outer loop of 'loop' within the nest up to outermost_loop,
> correct?  For the
> "common" case of each successive outer loop being colder than the immediate
> inner loop doesn't that result in only hoisting one level in case the
> coldest outer loop
> is outside of the outermost loop we can legally hoist to?
> 
> My s

Ping: [PATCH v2] Fix incorrect loop exit edge probability [PR103270]

2021-12-05 Thread Xionghu Luo via Gcc-patches
Hi Honza,

Gentle ping for this :), thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585289.html


On 2021/11/24 13:03, Xionghu Luo via Gcc-patches wrote:
> On 2021/11/23 17:50, Jan Hubicka wrote:
>>> On Tue, Nov 23, 2021 at 6:52 AM Xionghu Luo  wrote:

 r12-4526 cancels jump thread paths that rotate loops.  This exposed an issue
 in profile-estimate: in predict_extra_loop_exits, an outer loop's exit edge
 is marked as an inner loop's extra loop exit and given an incorrect
 prediction, so a hot inner loop eventually becomes a cold loop after
 optimization.  This patch ignores EDGE_DFS_BACK edges when searching for
 extra exit edges, to avoid the unexpected predict_edge.
>>>
>>> Not sure how outer vs. inner loop exit correlates with EDGE_DFS_BACK,
>>> I have expected a check based on which loop is exited by the edge instead?
>>> A backedge should never be an exit, no?
>>>
>>> Note that the profile pass does not yet mark backedges so EDGE_DFS_BACK
>>> settings are unreliable.
>>
>> So we have two nested loops and an exit which goes from inner loop and
>> exists both loops.  While processing outer loop we set pretty high exit
>> probability that is not good for inner loop?
> 
> No, the edge belongs to the outer loop only.  Can an exit edge belong to
> two different loops at the same time?
> Exit edges are iterated with LI_FROM_INNERMOST in predict_loops; if an edge
> already has a prediction (queried via edge_predicted_by_p), maybe_predict_edge
> returns early and does not set it again.
> 
> The CFG is:
> 
> 2
> |
> 8< // l1
> | \   |
> 10 9  |
> | |
>   7
> 6  | | 
> 11| 
> | |
> 4<-   |// l3
> | \|  |
> 5  3  |
> | |
> --
> 
> l2's edge (6->11,6->7) is set to (33%,67%) by l3 unexpectedly.
> 
> FYI: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103270#c5
> 
>>
>> I guess we could just check if exit edge source basic block has same
>> loop depth as the loop we are processing?
>>
> 
> 
> Thanks for the suggestion, it works.  Loop checks already existed in
> predict_paths_for_bb; we just need to pass down the loop argument.
> Updated as the v2 patch.
> 
> 
> v2-0001-Fix-incorrect-loop-exit-edge-probability-PR103270.patch
> 
> r12-4526 cancels jump thread paths that rotate loops.  This exposed an issue
> in profile-estimate: in predict_extra_loop_exits, an outer loop's exit edge
> is marked as an inner loop's extra loop exit and given an incorrect
> prediction, so a hot inner loop eventually becomes a cold loop after
> optimization.  This patch adds a loop check when searching for extra exit
> edges, to avoid the unexpected predict_edge from predict_paths_for_bb.
> 
> Regression tests pass on P8 & x86, OK for master?
> 
> gcc/ChangeLog:
> 
>   PR middle-end/103270
>   * predict.c (predict_extra_loop_exits): Add loop parameter.
>   (predict_loops): Call with loop argument.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR middle-end/103270
>   * gcc.dg/pr103270.c: New test.
> ---
>  gcc/predict.c   | 10 ++
>  gcc/testsuite/gcc.dg/pr103270.c | 19 +++
>  2 files changed, 25 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr103270.c
> 
> diff --git a/gcc/predict.c b/gcc/predict.c
> index 68b11135680..082782ec4e9 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -1859,7 +1859,7 @@ predict_iv_comparison (class loop *loop, basic_block bb,
> exits to predict them using PRED_LOOP_EXTRA_EXIT.  */
> 
>  static void
> -predict_extra_loop_exits (edge exit_edge)
> +predict_extra_loop_exits (class loop *loop, edge exit_edge)
>  {
>unsigned i;
>bool check_value_one;
> @@ -1912,12 +1912,14 @@ predict_extra_loop_exits (edge exit_edge)
>   continue;
>if (EDGE_COUNT (e->src->succs) != 1)
>   {
> -   predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
> +   predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN,
> +  loop);
> continue;
>   }
> 
>FOR_EACH_EDGE (e1, ei, e->src->preds)
> - predict_paths_leading_to_edge (e1, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
> + predict_paths_leading_to_edge (e1, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN,
> +loop);
>  }
>  }
> 
> @@ -2009,7 +2011,7 @@ predict_loops (void)
>ex->src->index, ex->dest->index);
> continue;
>   }
> -   predict_extra_loop_exits (ex);
> +   predict_extra_loop_exits (loop, ex);
> 
> if (number_of_iterations_exit (loop, ex, &niter_desc, false, false))
>   niter = niter_desc.niter;
> diff --git a/gcc/testsuite/gcc.dg/pr103270.c b/gcc/testsuite/gcc.dg/pr103270.c
> new file mode 100644
> index 000..819310e360e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr103270.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
> +
> +void test(int a, int* i)
> +{
> +  for (; a < 5; ++a)

Re: [PATCH v8 2/2] Don't move cold code out of loop by checking bb count

2021-12-05 Thread Xionghu Luo via Gcc-patches



On 2021/12/6 13:09, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/12/1 18:09, Richard Biener wrote:
>> On Wed, Nov 10, 2021 at 4:08 AM Xionghu Luo  wrote:
>>>
>>>
>>>
>>> On 2021/11/4 21:00, Richard Biener wrote:
 On Wed, Nov 3, 2021 at 2:29 PM Xionghu Luo  wrote:
>
>
>> +  while (outmost_loop != loop)
>> +{
>> +  if (bb_colder_than_loop_preheader (loop_preheader_edge
>> (outmost_loop)->src,
>> +loop_preheader_edge 
>> (cold_loop)->src))
>> +   cold_loop = outmost_loop;
>> +  outmost_loop = superloop_at_depth (loop, loop_depth 
>> (outmost_loop) + 1);
>> +}
>>
>> could be instead written as
>>
>>   coldest_loop = coldest_outermost_loop[loop->num];
>>   if (loop_depth (coldest_loop) < loop_depth (outermost_loop))
>> return outermost_loop;
>>   return coldest_loop;
>>
>> ?  And in the usual case coldest_outermost_loop[L] would be the loop 
>> tree root.
>> It should be possible to compute such cache in a DFS walk of the loop 
>> tree
>> (the loop iterator by default visits in such order).
>
>
> Thanks.  Updated the patch with your suggestion.  Not sure whether it 
> strictly
> conforms to your comments.  Though the patch passed all my added 
> tests(coverage not enough),
> I am still a bit worried if pre-computed coldest_loop is outside of 
> outermost_loop, but
> outermost_loop is not the COLDEST LOOP, i.e. (outer->inner)
>
>  [loop tree root, coldest_loop, outermost_loop,..., second_coldest_loop, 
> ..., loop],
>
> then function find_coldest_out_loop will return a loop NOT accord with our
> expectation, that should return second_coldest_loop instead of 
> outermost_loop?
 Hmm, interesting - yes.  I guess the common case will be that the 
 pre-computed
 outermost loop will be the loop at depth 1 since outer loops tend to
 be colder than
 inner loops?  That would then defeat the whole exercise.
>>>
>>> It is not easy to construct such cases, But finally I got below results,
>>>
>>> 1) many cases inner loop is hotter than outer loop, for example:
>>>
>>> loop 1's coldest_outermost_loop is 1, colder_than_inner_loop is NULL
>>> loop 2's coldest_outermost_loop is 1, colder_than_inner_loop is 1
>>> loop 3's coldest_outermost_loop is 1, colder_than_inner_loop is 2
>>> loop 4's coldest_outermost_loop is 1, colder_than_inner_loop is 2
>>>
>>>
>>> 2) But there are also cases inner loop is colder than outer loop, like:
>>>
>>> loop 1's coldest outermost loop is 1, colder_than_inner_loop is NULL
>>> loop 2's coldest outermost loop is 2, colder_than_inner_loop is NULL
>>> loop 3's coldest outermost loop is 3, colder_than_inner_loop is NULL
>>>
>>>

 To optimize the common case but not avoiding iteration in the cases we care
 about we could instead cache the next outermost loop that is _not_ colder
 than loop.  So for your [ ... ] example above we'd have> 
 hotter_than_inner_loop[loop] == outer (second_coldest_loop), where the
 candidate would then be 'second_coldest_loop' and we'd then iterate
 to hotter_than_inner_loop[hotter_than_inner_loop[loop]] to find the next
 cold candidate we can compare against?  For the common case we'd
 have hotter_than_inner_loop[looo] == NULL (no such loop) and we then
 simply pick 'outermost_loop'.
>>>
>>> Thanks.  It was difficult to understand, but finally I got to know what you
>>> want to express :)
>>>
>>> We should cache the next loop that is *colder* than loop instead of '_not_ 
>>> colder
>>> than loop', and 'hotter_than_inner_loop' should be 'colder_than_inner_loop',
>>> then it makes sense if the coldest loop is outside of outermost loop, 
>>> continue to
>>> find a colder loop between outermost loop and current loop in
>>> colder_than_inner_loop[loop->num]?  Hope I understood you correctly...
>>
>> Heh, looking at the patch - I don't know.
>>
>> To make the calls to bb_colder_than_loop_preheader more obvious can you
>> change that function to
>>
>> +bool bb_colder_than_loop_preheader (basic_block bb, loop *loop)
>> +{
>> +  return bb->count < loop_preheader_edge (loop)->src->count;
>>
>> ?
>>
>> +  if (colder_than_inner_loop[loop->num] != NULL
>> + && loop_depth (outermost_loop)
>> +  < loop_depth (colder_than_inner_loop[loop->num]))
>> +   return colder_than_inner_loop[loop->num];
>>
>> that is
>>
>> +  3) If COLDEST_LOOP is outside of OUTERMOST_LOOP, check whether there is a
>> +  colder loop between OUTERMOST_LOOP and loop in pre-computed
>> +  COLDER_THAN_INNER_LOOP, return it, otherwise return OUTERMOST_LOOP.  */
>>
>> since we index colder_than_inner_loop by loop->num as well this doesn't find 
>> the
>> coldest outer loop of 'loop' within the nest up to outermost_loop,
>> correct?  For the
>> "common" case of each successive outer loop being colder 

Re: [PATCH 3/5] gcc: Add --nostdlib++ option

2021-12-05 Thread Fāng-ruì Sòng via Gcc-patches
On Sun, Dec 5, 2021 at 6:43 AM Richard Purdie via Gcc-patches
 wrote:
>
> On Thu, 2021-12-02 at 20:04 -0700, Jeff Law wrote:
> >
> > On 10/28/2021 10:39 AM, Richard Purdie wrote:
> > > On Thu, 2021-10-28 at 08:51 -0600, Jeff Law wrote:
> > > > On 10/27/2021 2:05 PM, Richard Purdie via Gcc-patches wrote:
> > > > > OpenEmbedded/Yocto Project builds libgcc and the other gcc runtime 
> > > > > libraries
> > > > > separately from the compiler and slightly differently to the standard 
> > > > > gcc build.
> > > > >
> > > > > In general this works well but in trying to build them separately we 
> > > > > run into
> > > > > an issue since we're using our gcc, not xgcc and there is no way to 
> > > > > tell configure
> > > > > to use libgcc but not look for libstdc++.
> > > > >
> > > > > This adds such an option allowing such configurations to work.
> > > > >
> > > > > 2021-10-26 Richard Purdie 
> > > > >
> > > > > gcc/c-family/ChangeLog:
> > > > >
> > > > >   * c.opt: Add --nostdlib++ option
> > > > >
> > > > > gcc/cp/ChangeLog:
> > > > >
> > > > >   * g++spec.c (lang_specific_driver): Add --nostdlib++ option
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >   * doc/invoke.texi: Document --nostdlib++ option
> > > > >   * gcc.c: Add --nostdlib++ option
> > > > Couldn't you use -nostdlib then explicitly add -lgcc?
> > > >
> > > > If that works, that would seem better to me compared to adding an option
> > > > to specs processing that is really only useful to one build
> > > > system/procedure.
> > > It sounds great in principle but I've never been able to get it to work. 
> > > With
> > > "-nostdinc++ -nostdlib" I miss the startup files so I also tried 
> > > "-nostdinc++ -
> > > nodefaultlibs -lgcc". The latter gets further and I can build libstdc++ 
> > > but the
> > > resulting library doesn't link into applications correctly.
> > Can you be more specific about "doesn't link into applications
> > correctly".  I'm still hesitant to add another option if we can
> > reasonably avoid it.
>
> I took a step back and had another look at what our build is doing and why we
> need this. Our build builds the different components separately in many cases 
> so
> libstdc++ is built separately since that allows us to tune it to specific
> targets whilst the core gcc is architecture specific.
>
> When we run configure for libstdc++, we have to configure CXX. We can 
> configure
> it as:
>
> CXX="${HOST_PREFIX}g++ -nostdinc++"
>
> however that gives errors about a missing libstdc++ during configure tests 
> (e.g.
> the atomic ones) since the library isn't present yet and we're trying to build
> it. When I last ran into this I added the nostdlib++ option to mirror the
> stdinc++ one and this solved the problem.
>
> Adding -lgcc doesn't seem to work, binaries using libstdc++ segfault on some
> architectures (mips64 and ppc). I suspect this is a link ordering issue which 
> we
> have little control of from the CXX variable but I haven't got deeply into it 
> as
> I got the feeling it would be a pain to try and correct, we need the 
> compiler's
> automatic linking behaviour which you can't emulate from one commandline.
>
> I did also try -nostdlib and -nodefaultlibs with various libraries added in 
> but
> it never seems to work well giving segfaults, again as I suspect the linking 
> is
> order specific.
>
> Thinking about the problem from scratch again, I wondered if a dummy libstdc++
> would be enough to make the configure tests work correctly. I've found that I
> can do something like:
>
> mkdir -p XXX/dummylib
> touch XXX/dummylib/libstdc++.so
> export CXX="${CXX} -nostdinc++ -LXXX/dummylib"
>
> and the configure works correctly and the resulting libs don't segfault on
> target.
>
> This is a bit of a hack but probably no worse than some of the other things we
> have to do to build so if you're not keen on the patch we could go with this. 
> It
> did seem at least worth discussing our use case though! :)
>
> Cheers,
>
> Richard
>

Drive-by: if -nostdlib++ is implemented, I hope its semantics are
similar to clang's -nostdlib++ (https://reviews.llvm.org/D35780, since
2017)


Re: [PATCH 1/2] - Add BB option for outgoing_edge_range_p.

2021-12-05 Thread Richard Biener via Gcc-patches
On Fri, Dec 3, 2021 at 9:41 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> has_edge_range_p() and may_recompute_p() currently only take an optional
> edge as a parameter.  They only indicate if a range *might* be
> calculated for a name, but they do not do any calculations.
>
> To determine the results, they always consult the exports for the
> edge->src block.  The value is true or false for every edge, so it's
> really a basic block property.  This patch makes the option available to
> consult directly with the BB, and calls that for the edge version.
>
> Without this, if you want to know if a basic block can affect a range,
> you have to pick an arbitrary outgoing edge and ask with that, which
> seems a little awkward.
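>
> In shape, the change is roughly the following (just a sketch; the exact
> code is in the patch, and the helpers named here are the existing gori
> export/recompute queries):
>
>   // BB variant: can any outgoing edge of BB generate a range for NAME?
>   bool
>   gori_compute::has_edge_range_p (tree name, basic_block bb)
>   {
>     return is_export_p (name, bb) || may_recompute_p (name, bb);
>   }
>
>   // The edge variant now just consults the source block.
>   bool
>   gori_compute::has_edge_range_p (tree name, edge e)
>   {
>     return has_edge_range_p (name, e->src);
>   }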
>
> This patch is used by the next one.
>
> Bootstrapped on  x86_64-pc-linux-gnu with no regressions. OK?

OK.

>
> Andrew


Re: [PATCH 2/2] Use dominators to reduce ranger cache-filling.

2021-12-05 Thread Richard Biener via Gcc-patches
On Fri, Dec 3, 2021 at 9:42 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> When a request is made for the range of an ssa_name at some location,
> the first thing we do is invoke range_of_stmt() to ensure we have looked
> at the definition and have an evaluation for the name at a global
> level.  I recently added a patch which dramatically reduces the call
> stack requirements for that call.
>
> Once we have confirmed the definition range has been set, a call is made
> for the range-on-entry to the block of the use.  This is performed by
> the cache, which proceeds to walk the CFG predecessors looking for when
> ranges are created (exported), existing range-on-entry cache hits, or
> the definition block.  Once this list of predecessors has been created,
> a forward walk is done, pushing the ranges through successor edges of
> all the blocks that were visited in the initial walk.
>
> If the use is far from the definition, we end up filling a lot of the
> same value on these paths.  Also uses which are far from a
> range-modifying statement push the same value into a lot of blocks.
>
> This patch tries to address at least some inefficiencies.  It recognizes
> that
>
> First, if there is no range modifying stmt between this use and the last
> range we saw in a dominating block, we can just use the value from the
> dominating block and not fill in all the cache entries between here and
> there.  This is the biggest win.
>
> Second, if there is a range modifying statement at the end of some
> block, we will have to do the appropriate cache walk to this point, but
> it's possible the range-on-entry to THAT block might be able to use a
> dominating range, and we can prevent the walk from going any further
> than this block.
>
> Combined, this should prevent a lot of unnecessary ranges from being
> plugged into the cache.
>
> ie, to visualize:
>
> bb4:
>   a = foo()
> <..>
> bb60:
>   if (a < 30)
> <...>
> bb110:
>   g = a + 10
>
> if the first place we ask for a is in bb110, we walk the CFG from 110
> all the way back to bb4, on all paths leading back, then fill all those
> cache entries.
> With this patch,
>   a) if bb60 does not dominate bb110, the request will scan the
> dominators, arrive at the definition block, have seen no range modifier,
> and simply set the on-entry for 110 to the range of a.  Done.
>   b) if bb60 does dominate 110, we have no idea which edge out of 60
> dominates it, so we will revert to the existing cache algorithm.  Before
> doing so, it checks and determines that there are no modifiers between
> bb60 and the def in bb4, and so sets the on-entry cache for bb60 to be
> the range of a.   Now when we do the cache fill walk, it only has to go
> back as far as bb60 instead of all the way to bb4.
>
> Otherwise, or if dominators are not available, we just revert to what
> we do now.  I have yet to see a case where we miss something we used to
> get, but that does not mean there isn't one :-).
>
> The cumulative performance impact of this when compiling a set of 390 GCC
> source files at -O2 (measured via callgrind) is pretty decent.  Negative
> numbers are a compile-time decrease, so -10% is 10% faster
> compilation time.
>
> EVRP : %change from trunk is -26.31% (!)
> VRP2 : %change from trunk is -9.53%
> thread_jumps_full   : %change from trunk is -15.8%
> Total compilation time  : %change from trunk is -1.06%
>
> So it's not insignificant.
>
> Risk would be very low, unless dominators are screwed up mid-pass, but
> then the relation code would be screwed up too.
>
> Bootstrapped on  x86_64-pc-linux-gnu with no regressions. OK?

OK.

Wow - I didn't realize we have this backwards CFG walk w/o dominators ...
I wonder if you can add a counter to visualize places we end up using this path.
I suggest adding statistics_counter_event (cfun, "slow range query", 1); or so
and seeing with -fdump-statistics-stats which passes eventually trigger that
(I suspect none?).
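
That is, at whatever point the dominator shortcut gives up and the old
predecessor walk is entered, something like:

  /* Count how often we still fall back to the full cache-fill walk.  */
  statistics_counter_event (cfun, "slow range query", 1);

(exact placement is up to you); the per-pass totals then show up in the
-fdump-statistics-stats output.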

Thanks,
Richard.

> Andrew
>
>
>
>


Re: [PATCH] ranger: Optimise irange_union

2021-12-05 Thread Richard Biener via Gcc-patches
On Sun, Dec 5, 2021 at 10:55 PM Richard Sandiford via Gcc-patches
 wrote:
>
> When compiling an optabs.ii at -O2 with a release-checking build,
> the hottest function in the profile was irange_union.  This patch
> tries to optimise it a bit.  The specific changes are:
>
> - Use quick_push rather than safe_push, since the final number
>   of entries is known in advance.
>
> - Avoid assigning wi::to_wide & co. to a temporary wide_int,
>   such as in:
>
> wide_int val_j = wi::to_wide (res[j]);
>
>   wi::to_wide returns a wide_int "view" of the in-place INTEGER_CST
>   storage.  Assigning the result to wide_int forces an unnecessary
>   copy to temporary storage.
>
>   This is one area where "auto" helps a lot.  In the end though,
>   it seemed more readable to inline the wi::to_*s rather than
>   use auto.
>
> - Use wi::to_widest rather than wi::to_wide.  Both are functionally
>   correct, but wi::to_widest is more efficient, for three reasons:
>
>   - to_wide returns a wide-int representation in which the most
> significant element might not be canonically sign-extended.
> This is because we want to allow the storage of an INTEGER_CST
> like 0x1U << 31 to be accessed directly with both a wide_int view
> (where only 32 bits matter) and a widest_int view (where many more
> bits matter, and where the 32 bits are zero-extended to match the
> unsigned type).  However, operating on uncanonicalised wide_int
> forms is less efficient than operating on canonicalised forms.
>
>   - wi::to_widest has a constant rather than variable precision and
> there are never any redundant upper bits to worry about.
>
>   - Using widest_int avoids the need for an overflow check, since
> there is enough precision to add 1 to any IL constant without
> wrap-around.
>
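> As a small illustration of the last two points, the merge test in the
> normalisation loop goes from copying each bound into a temporary and
> checking for overflow:
>
>   wide_int val_im1 = wi::to_wide (res[i - 1]);
>   u1 = wi::add (val_im1, 1, sign, &ovf);
>   wide_int val_j = wi::to_wide (res[j]);
>   // ...compare u1 against val_j, bailing out on overflow...
>
> to (roughly) a single widest_int comparison:
>
>   wi::to_widest (res[i - 1]) + 1 >= wi::to_widest (res[j])
>
> with no temporaries, no sign argument and no overflow check.
>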
> This gives a ~2% compile-time speed up with the test above.
>
> I also tried adding a path for two single-pair ranges, but it
> wasn't a win.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
> * value-range.cc (irange::irange_union): Use quick_push rather
> than safe_push.  Use widest_int rather than wide_int.  Avoid
> assigning wi::to_* results to wide*_int temporaries.
> ---
>  gcc/value-range.cc | 46 +-
>  1 file changed, 13 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index 82509fa55a7..d38d0786072 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -1550,70 +1550,50 @@ irange::irange_union (const irange &r)
>// the merge is performed.
>//
>// [Xi,Yi]..[Xn,Yn]  U  [Xj,Yj]..[Xm,Ym]   -->  [Xk,Yk]..[Xp,Yp]
> -  tree ttype = r.type ();
> -  signop sign = TYPE_SIGN (ttype);
> -
> -  auto_vec res;
> -  wide_int u1 ;
> -  wi::overflow_type ovf;
> +  auto_vec res (m_num_ranges * 2 + r.m_num_ranges * 2);
>unsigned i = 0, j = 0, k = 0;
>
>while (i < m_num_ranges * 2 && j < r.m_num_ranges * 2)
>  {
>// lower of Xi and Xj is the lowest point.
> -  if (wi::le_p (wi::to_wide (m_base[i]), wi::to_wide (r.m_base[j]), sign))
> +  if (wi::to_widest (m_base[i]) <= wi::to_widest (r.m_base[j]))
> {
> - res.safe_push (m_base[i]);
> - res.safe_push (m_base[i + 1]);
> + res.quick_push (m_base[i]);
> + res.quick_push (m_base[i + 1]);
>   k += 2;
>   i += 2;
> }
>else
> {
> - res.safe_push (r.m_base[j]);
> - res.safe_push (r.m_base[j + 1]);
> + res.quick_push (r.m_base[j]);
> + res.quick_push (r.m_base[j + 1]);
>   k += 2;
>   j += 2;
> }
>  }
>for ( ; i < m_num_ranges * 2; i += 2)
>  {
> -  res.safe_push (m_base[i]);
> -  res.safe_push (m_base[i + 1]);
> +  res.quick_push (m_base[i]);
> +  res.quick_push (m_base[i + 1]);
>k += 2;
>  }
>for ( ; j < r.m_num_ranges * 2; j += 2)
>  {
> -  res.safe_push (r.m_base[j]);
> -  res.safe_push (r.m_base[j + 1]);
> +  res.quick_push (r.m_base[j]);
> +  res.quick_push (r.m_base[j + 1]);
>k += 2;
>  }
>
>// Now normalize the vector removing any overlaps.
>i = 2;
> -  int prec = TYPE_PRECISION (ttype);
> -  wide_int max_val = wi::max_value (prec, sign);
>for (j = 2; j < k ; j += 2)
>  {
> -  wide_int val_im1 = wi::to_wide (res[i - 1]);
> -  if (val_im1 == max_val)
> -   break;
> -  u1 = wi::add (val_im1, 1, sign, &ovf);
> -
> -  // Overflow indicates we are at MAX already.
> -  // A wide int bug requires the previous max_val check
> -  // trigger: gcc.c-torture/compile/pr80443.c  with -O3
> -  if (ovf == wi::OVF_OVERFLOW)
> -   break;
> -
> -  wide_int val_j = wi::to_wide (res[j]);
> -  wide_int val_jp1 = wi::to_wide (res[j+1]);
>// Current upper+1 is >= lower bound next pair, then we merge ranges.
> - 

Re: [PATCH] ranger: Add shortcuts for single-successor blocks

2021-12-05 Thread Richard Biener via Gcc-patches
On Sun, Dec 5, 2021 at 10:58 PM Richard Sandiford via Gcc-patches
 wrote:
>
> When compiling an optabs.ii at -O2 with a release-checking build,
> there were 6,643,575 calls to gimple_outgoing_range_stmt_p.  96.8% of
> them were for blocks with a single successor, which never have a control
> statement that generates new range info.  This patch therefore adds a
> shortcut for that case.
>
> This gives a ~1% compile-time improvement for the test.
>
> I tried making the function inline (in the header) so that the
> single_succ_p didn't need to be repeated, but it seemed to make things
> slightly worse.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> gcc/
> * gimple-range-edge.cc (gimple_outgoing_range::edge_range_p): Add
> a shortcut for blocks with single successors.
> * gimple-range-gori.cc (gori_map::calculate_gori): Likewise.
> ---
>  gcc/gimple-range-edge.cc | 3 +++
>  gcc/gimple-range-gori.cc | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/gcc/gimple-range-edge.cc b/gcc/gimple-range-edge.cc
> index afffc8dbcae..9e805230004 100644
> --- a/gcc/gimple-range-edge.cc
> +++ b/gcc/gimple-range-edge.cc
> @@ -182,6 +182,9 @@ gimple_outgoing_range::calc_switch_ranges (gswitch *sw)
>  gimple *
>  gimple_outgoing_range::edge_range_p (irange &r, edge e)
>  {
> +  if (single_succ_p (e->src))
> +return NULL;
> +
>// Determine if there is an outgoing edge.
>gimple *s = gimple_outgoing_range_stmt_p (e->src);
>if (!s)
> diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
> index 0dba34b58c5..b256e4afd15 100644
> --- a/gcc/gimple-range-gori.cc
> +++ b/gcc/gimple-range-gori.cc
> @@ -555,6 +555,9 @@ gori_map::calculate_gori (basic_block bb)
>m_outgoing[bb->index] = BITMAP_ALLOC (&m_bitmaps);
>m_incoming[bb->index] = BITMAP_ALLOC (&m_bitmaps);
>
> +  if (single_succ_p (bb))
> +return;
> +
>// If this block's last statement may generate range informaiton, go
>// calculate it.
>gimple *stmt = gimple_outgoing_range_stmt_p (bb);
> --
> 2.31.1
>


Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-05 Thread Richard Biener via Gcc-patches
On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
 wrote:
>
> When compiling an optabs.ii at -O2 with a release-checking build,
> we spent a lot of time in call_may_clobber_ref_p.  One of the
> first things the function tries is the get_modref_function_summary
> approach, but that also seems to be the most expensive check.
> At least for the optabs.ii test, most of the things g_m_f_s could
> handle could also be handled by points-to analysis, which is much
> cheaper.
>
> This patch therefore rearranges the tests in approximate order
> of cheapness.  One wart is that points-to analysis can't be
> used on its own for this purpose; it has to be used in
> conjunction with check_fnspec.  This is because the argument
> information in the fnspec is not reflected in the points-to
> information, which instead contains overly optimistic (rather
> than conservatively pessimistic) information.
>
> Specifically, find_func_aliases_for_builtin_call short-circuits
> the handle_rhs_call/handle_call_arg logic and doesn't itself add
> any compensating information about the arguments.  This means that:
>
>   free (foo)
>
> does not have any points-to information for foo (it has an empty
> set instead).

Huh?  The gimple_call_use/clobber_sets should be conservative
and stand on their own.  Can you share a testcase that breaks when
returning early if independent_pt?  Does this  problem also exist
on the branch?

I wonder if, at this point, we should split the patch up to do the
trivial move of the ao_ref_base dependent and volatile checks
and leave this more complicated detail for a second patch?

The first part is pre-approved, I'd like to understand the second part
before considering it.

Thanks,
Richard.

>
> It would be good to fix that for compile-time reasons if nothing else.
> Doing points-to analysis without check_fnspec gave a measurable
> speed-up, since PTA resolved the vast majority of queries, and
> (as expected) the vast majority of calls had no fnspec.
>
> This gave about a ~2% compile-time improvement for the test above.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> * tree-ssa-alias.c (call_may_clobber_ref_p_1): Reorder checks
> so that the cheaper ones come first.
> ---
>  gcc/tree-ssa-alias.c | 145 +--
>  1 file changed, 70 insertions(+), 75 deletions(-)
>
> diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
> index 88fd7821c5e..573766278bc 100644
> --- a/gcc/tree-ssa-alias.c
> +++ b/gcc/tree-ssa-alias.c
> @@ -2952,9 +2952,6 @@ ref_maybe_used_by_stmt_p (gimple *stmt, tree ref, bool tbaa_p)
>  bool
>  call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool tbaa_p)
>  {
> -  tree base;
> -  tree callee;
> -
>/* If the call is pure or const it cannot clobber anything.  */
>if (gimple_call_flags (call)
>& (ECF_PURE|ECF_CONST|ECF_LOOPING_CONST_OR_PURE|ECF_NOVOPS))
> @@ -2977,9 +2974,64 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool tbaa_p)
> break;
>}
>
> -  callee = gimple_call_fndecl (call);
> +  tree base = ao_ref_base (ref);
> +  if (base
> +  && (TREE_CODE (base) == SSA_NAME
> + || CONSTANT_CLASS_P (base)))
> +return false;
> +
> +  /* A call that is not without side-effects might involve volatile
> + accesses and thus conflicts with all other volatile accesses.  */
> +  if (ref->volatile_p)
> +return true;
> +
> +  bool independent_pt = false;
> +  if (base && DECL_P (base))
> +{
> +  /* If the reference is based on a decl that is not aliased the call
> +cannot possibly clobber it.  */
> +  if (!may_be_aliased (base)
> + /* But local non-readonly statics can be modified through
> +recursion or the call may implement a threading barrier which
> +we must treat as may-def.  */
> + && (TREE_READONLY (base)
> + || !is_global_var (base)))
> +   return false;
>
> -  if (callee != NULL_TREE && !ref->volatile_p)
> +  auto clobbers = gimple_call_clobber_set (call);
> +  independent_pt = !pt_solution_includes (clobbers, base);
> +}
> +  else if (base
> +  && (TREE_CODE (base) == MEM_REF
> +  || TREE_CODE (base) == TARGET_MEM_REF)
> +  && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME)
> +{
> +  tree addr = TREE_OPERAND (base, 0);
> +
> +  /* If the reference is based on a pointer that points to memory that
> +may not be written to then the call cannot possibly clobber it.  */
> +  if (SSA_NAME_POINTS_TO_READONLY_MEMORY (addr))
> +   return false;
> +
> +  /* Check if the base variable is call-clobbered.  */
> +  if (auto *pi = SSA_NAME_PTR_INFO (addr))
> +   {
> + auto clobbers = gimple_call_clobber_set (call);
> + independent_pt = !pt_solutions_intersect (clobbers, &pi->pt);
> +   }
> +}
> +
> +  /* Points-to information needs to be tested in conju

Re: [PATCH] fold: Optimise fold_view_convert_expr

2021-12-05 Thread Richard Biener via Gcc-patches
On Sun, Dec 5, 2021 at 11:00 PM Richard Sandiford via Gcc-patches
 wrote:
>
> When compiling an optabs.ii at -O2 with a release-checking build,
> there were 210,172 calls to fold_view_convert_expr.  99.8% of them
> were conversions of an INTEGER_CST to an ENUMERAL_TYPE, so this patch
> adds a fast(er) path for that case.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> * fold-const.c (fold_view_convert_expr): Add a fast path for
> conversions of INTEGER_CSTs to other integral types.
> ---
>  gcc/fold-const.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 0b9a42f764a..19613015ddb 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -9128,6 +9128,13 @@ fold_view_convert_vector_encoding (tree type, tree expr)
>  static tree
>  fold_view_convert_expr (tree type, tree expr)
>  {
> +  /* Conversions of INTEGER_CSTs to ENUMERAL_TYPEs are very common,
> + so handle them directly.  */
> +  if (INTEGRAL_TYPE_P (type)
> +  && TREE_CODE (expr) == INTEGER_CST
> +  && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (expr)))
> +return wide_int_to_tree (type, wi::to_wide (expr));

I suppose that TREE_CODE (expr) == INTEGER_CST &&
tree_nop_conversion_p (type, TREE_TYPE (expr))
might be more complete?  In fact, it might be possible to do

  if (TREE_CODE (expr) == INTEGER_CST)
    {
      gcc_checking_assert (tree_nop_conversion_p (type, TREE_TYPE (expr)));
      return wide_int_to_tree (type, wi::to_wide (expr));
    }

or at least

  if (TREE_CODE (expr) == INTEGER_CST)
    {
      gcc_checking_assert (operand_equal_p (TYPE_SIZE (type),
                                            TYPE_SIZE (TREE_TYPE (expr))));
      return wide_int_to_tree (type, wi::zext (wi::to_widest (expr),
                                               TYPE_PRECISION (TREE_TYPE (expr))));
    }

but I realize IL verification around VIEW_CONVERTs is incomplete and
the above case
likely mostly happens with match.pd unifying the vector/non-vector
sign conversion path.

That said, the intent was to not have VIEW_CONVERT between types with
different precision
since on GIMPLE/GENERIC the off-precision bits have no representation
and thus cannot be
"converted" as unchanging.

Richard.


> +
>/* We support up to 512-bit values (for V8DFmode).  */
>unsigned char buffer[64];
>int len;
> --
> 2.31.1
>