Re: Has GCC completed C++ 20 module support?

2021-11-02 Thread Jonathan Wakely via Gcc
Please don't cross-post to the gcc and gcc-help lists. Either you are
asking about GCC development or asking about using it, not both. Pick one
list.

On Tue, 2 Nov 2021, 04:22 sotrdg sotrdg via Gcc-help, 
wrote:

> It looks like it is still in an early phase. The -fmodules-ts mode still
> emits dead code, for example.
>
> Any progress here?
>

I wouldn't say early, but it is incomplete, as documented in the GCC 11
release notes and at https://gcc.gnu.org/projects/cxx-status.html#cxx20


Question on cgraph_node::force_output

2021-11-02 Thread Erick Ochoa via Gcc
Hi,

I am reading tree-ssa-structalias.c to understand what makes a
function nonlocal during IPA-PTA. I am having trouble
understanding force_output and when it is set or unset.

1. What is the meaning of force_output? cgraph.h describes force_output
as meaning that the symbol might be used in an invisible way.
I believe this means some sort of "unanalyzable" way. However, in a
few tests I've written, every function has force_output set to
true.
2. Does this value depend on some other pass?

At the moment, I am looking at this field within my own passes
(IPA_PASS and SIMPLE_IPA_PASS), but I would like to inspect the
dump_file(s) which show information about force_output to make sure
that it doesn't depend on pass order or even my own flags.

3. What flags should I use to inspect force_output?

Thanks!


Re: [PATCH] Add -fopt-builtin optimization option

2021-11-02 Thread Richard Biener via Gcc
On Sun, Oct 31, 2021 at 11:13 AM Keith Packard via Gcc-patches
 wrote:
>
> This option (enabled by default) controls optimizations which convert
> a sequence of operations into an equivalent sequence that includes
> calls to builtin functions. Typical cases are code sequences that match
> memcpy, calloc, or sincos.
>
> The -ftree-loop-distribute-patterns flag only covers converting loops
> into builtin calls, not numerous other places where knowledge of
> builtin function semantics changes the generated code.
>
> The goal is to allow built-in functions to be declared by the compiler
> and used directly by the application, but to disable optimizations
> which create new calls to them, and to allow this optimization
> behavior to be changed for individual functions by decorating the
> function definition like this:
>
> void
> __attribute__((optimize("no-opt-builtin")))
> sincos(double x, double *s, double *c)
> {
>     *s = sin(x);
>     *c = cos(x);
> }
>
> This also avoids converting loops into library calls like this:
>
> void *
> __attribute__((optimize("no-opt-builtin")))
> memcpy(void *__restrict__ dst, const void *__restrict__ src, size_t n)
> {
>     char *d = dst;
>     const char *s = src;
>
>     while (n--)
>         *d++ = *s++;
>     return dst;
> }
>
> As well as disabling analysis of memory lifetimes around free as in
> this example:
>
> void
> __attribute__((optimize("no-opt-builtin")))
> erase_and_free(void *ptr)
> {
>     memset(ptr, '\0', malloc_usable_size(ptr));
>     free(ptr);
> }
>
> Clang has a more sophisticated version of this mechanism which
> can disable all builtins, or disable a specific builtin:
>
> double
> __attribute__((no_builtin("exp2")))
> exp2(double x)
> {
>     return pow (2.0, x);
> }

I don't think it reliably works the way you implement it.  It also has
more side effects than you document, in particular

  pow (2.0, x);

will now clobber and use global memory (besides errno).

I think you may want to instead change builtin_decl_implicit
to avoid code-generating a specific builtin.

Generally we'd also want something like the clang attribute and _not_
use optimize("") for this or a global flag_*, so the behavior can
be more readily encoded in the IL.  In fact, a flag on the call
statement could be added to denote the desired effect on it.

I also don't see the advantage compared to -fno-builtin[-foo].
Declaring the function should be something that's already done.

Richard.

> Signed-off-by: Keith Packard 
> ---
>  gcc/builtins.c               | 6 ++++++
>  gcc/common.opt               | 4 ++++
>  gcc/gimple.c                 | 3 +++
>  gcc/tree-loop-distribution.c | 2 ++
>  4 files changed, 15 insertions(+)
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 7d0f61fc98b..7aae57deab5 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -1922,6 +1922,9 @@ mathfn_built_in_2 (tree type, combined_fn fn)
>    built_in_function fcodef64x = END_BUILTINS;
>    built_in_function fcodef128x = END_BUILTINS;
>
> +  if (flag_no_opt_builtin)
> +    return END_BUILTINS;
> +
>    switch (fn)
>      {
>  #define SEQ_OF_CASE_MATHFN \
> @@ -2125,6 +2128,9 @@ mathfn_built_in_type (combined_fn fn)
>    case CFN_BUILT_IN_##MATHFN##L_R: \
>      return long_double_type_node;
>
> +  if (flag_no_opt_builtin)
> +    return NULL_TREE;
> +
>    switch (fn)
>      {
>  SEQ_OF_CASE_MATHFN
> diff --git a/gcc/common.opt b/gcc/common.opt
> index eeba1a727f2..d6111cc776a 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2142,6 +2142,10 @@ fomit-frame-pointer
>  Common Var(flag_omit_frame_pointer) Optimization
>  When possible do not generate stack frames.
>
> +fopt-builtin
> +Common Var(flag_no_opt_builtin, 0) Optimization
> +Match code sequences equivalent to builtin functions
> +
>  fopt-info
>  Common Var(flag_opt_info) Optimization
>  Enable all optimization info dumps on stderr.
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index 22dd6417d19..5b82b9409c0 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -2790,6 +2790,9 @@ gimple_builtin_call_types_compatible_p (const gimple *stmt, tree fndecl)
>  {
>    gcc_checking_assert (DECL_BUILT_IN_CLASS (fndecl) != NOT_BUILT_IN);
>
> +  if (flag_no_opt_builtin)
> +    return false;
> +
>    tree ret = gimple_call_lhs (stmt);
>    if (ret
>        && !useless_type_conversion_p (TREE_TYPE (ret),
> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> index 583c01a42d8..43f22a3c7ce 100644
> --- a/gcc/tree-loop-distribution.c
> +++ b/gcc/tree-loop-distribution.c
> @@ -1859,6 +1859,7 @@ loop_distribution::classify_partition (loop_p loop,
>
>    /* Perform general partition disqualification for builtins.  */
>    if

Re: -Wuninitialized false positives and threading knobs

2021-11-02 Thread Richard Biener via Gcc
On Mon, Nov 1, 2021 at 4:18 PM Jeff Law via Gcc  wrote:
>
>
>
> On 10/31/2021 6:12 AM, Aldy Hernandez wrote:
> > After Jeff's explanation of the symbiosis between jump threading and
> > the uninit pass, I'm beginning to see that (almost) every
> > -Wuninitialized warning is cause for reflection.  It usually hides a
> > missing jump thread.  I investigated one such false positive
> > (uninit-pred-7_a.c) and indeed, there's a missing thread.  The
> > question is what to do about it.
> >
> > This seemingly simple test is now regressing, as can be seen from the
> > xfail I added.
> This looks amazingly familiar.  You might want to look at this old thread:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2017-May/474229.html
>
>
> What happened was that threading did a better job, but in the process
> the shape of the CFG changed in ways that made it harder for the
> predicate analysis pass to prune paths.  Richi & I never reached any
> kind of conclusion on that patch, so it's never been applied.

Now there's also ranger's relation oracle (not sure if that's even
moderately powerful enough to cobble up predicates at two points in the
CFG and relate them), and Martin(?) has split out the predicate analysis
bits from uninit analysis.

My stance is still that the machinery needs generalization.
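The kind of use that predicate analysis is meant to prove safe can be sketched in a few lines of C. This is a hypothetical example, not the uninit-pred-7_a.c testcase. 'v' is defined only under 'flag' and used only under the same 'flag', so the path that skips the definition but reaches the use is infeasible. Predicate analysis has to relate the two guards. Jump threading would instead duplicate the code between them so that each copy knows the value of 'flag', and that duplication is what can be rejected on size grounds.

```c
/* Hypothetical example (not the real uninit-pred-7_a.c testcase).
   'v' is written only when 'flag' is nonzero and read only under the
   same condition, so no feasible path reads it uninitialized.  */
int guarded_use(int flag, int x)
{
    int v;              /* deliberately left uninitialized on one path */

    if (flag)
        v = x * 2;      /* definition guarded by 'flag' */

    x += 1;             /* unrelated work between the two tests */

    if (flag)
        return v;       /* use guarded by the same predicate */
    return x;
}
```

For example, guarded_use(1, 21) returns 42 and guarded_use(0, 21) returns 22. Whether a given GCC version warns here depends on how much code sits between the two tests and on the optimization level.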

> Remember that the whole point behind the predicate analysis pass is to
> deal with infeasible paths that may be in the CFG, including cases
> where the threaders may have found a jump thread but not optimized it
> due to code size considerations.
>
> So one of the first things I'd do is look at the dumps prior to your
> changes and see if the uninitialized use was still in the IL in
> the .uninit dump, but was analyzed as properly guarded by predicate analysis.
>
> >
> > What happens is that we now thread far more than before, causing the
> > distance from definition to use to expand.  The threading candidate
> > that would make the -Wuninitialized warning go away is there, and the
> > backward threader can see it, but it refuses to thread it because the
> > number of statements would be too large.
> Right.
>
> >
> > This is interesting because it means threading is causing larger IL
> > that in turn keeps us from threading some unreachable paths later on
> > because the paths are too large.
> Yes.  This is not unexpected.  Jump threading reduces dynamic
> conditional jumps and statements executed, but often at the expense of
> increasing code size, much like PRE.  Jump threading also can create
> scenarios that can't be handled by the predicate analysis pass.
>
> The other thing to review is whether or not you're accounting for
> statements that are going to be removed as a result of jump threading.
> I had Alex implement that a few years back for the forward threader.
> Essentially, statements which exist merely to compute the conditional we
> thread are going to be removed, so we need not worry about the cost of
> copying them.  That allowed us to thread many cases we had missed before
> without increasing code size.

Yeah, and code size is important so simply upping the limit isn't the way
to go since there's usually zero chance of a reverse transform later.
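The size trade-off discussed above can be illustrated with a hypothetical, hand-threaded example written at the C level. The threaders actually work on the CFG, not on source. Threading removes the second test of the condition on every path, but it duplicates the shared statement that sits between the two tests. The sketch does not model the threaders' cost accounting.

```c
/* Hypothetical illustration of jump threading done by hand.  */

/* Before: 'cond' is tested twice, with a shared block in between.  */
int before_threading(int a, int b)
{
    int cond = a > 0;
    int r = 0;

    if (cond)
        r = a;
    r += b;             /* shared block reached from both sides */
    if (cond)           /* outcome already known on each incoming path */
        r *= 2;
    return r;
}

/* After: the shared block is copied onto each path, so the second test
   disappears, at the cost of two copies of 'r += b'.  */
int after_threading(int a, int b)
{
    int r;

    if (a > 0) {
        r = a;
        r += b;         /* copy 1 of the shared block */
        r *= 2;
    } else {
        r = 0;
        r += b;         /* copy 2 of the shared block */
    }
    return r;
}
```

Both functions compute the same result: one fewer conditional branch executes on every path, and the function grows by one statement copy.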

> Anyway, those are the research areas to look at first, then we'll figure
> out what the next steps are.
>
> Jeff
>


Re: libgfortran.so SONAME and powerpc64le-linux ABI changes (work in progress patches)

2021-11-02 Thread Michael Meissner via Gcc
On Mon, Nov 01, 2021 at 10:56:33AM -0500, Bill Schmidt wrote:
> Would starting from Advance Toolchain 15 with the most recent glibc make 
> things easier for Thomas to test?

The problem is that gcc135 runs CentOS 7.x, which is not compatible with AT 13-15.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] Add -fopt-builtin optimization option

2021-11-02 Thread Keith Packard via Gcc
Richard Biener  writes:

> I don't think it reliably works the way you implement it.  It's also having
> more side-effects than what you document, in particular

Yeah, I made a 'minimal' patch that had the effect I needed, but it's
clearly in the wrong place: it disables the matching of builtins
against the incoming source code instead of the generation of new
builtin references from the tree.

> I think you may want to instead change builtin_decl_implicit
> to avoid code-generating a specific builtin.

Yup, I looked at that and there are numerous places which assume that
will work, so it will be a more complicated patch.

> Generally we'd also want sth like the clang attribute and _not_
> use optimize("") for this or a global flag_*, so the behavior can
> be more readily encoded in the IL.  In fact a flag on the call
> statement could be added to denote the desired effect on it.

Agreed, using the existing optimize attribute was a short-cut to
leverage the existing code handling that case. If we think providing
something that matches the clang attribute would be useful, it makes
sense to provide it using the same syntax.

> I also don't see the advantage compared to -fno-builtin[-foo].
> Declaring the function should be something that's already done.

The semantic of the clang option is not to completely disable access
to the given builtin function, but rather to stop the optimizer from
creating new builtin function references (either to a specific builtin,
or to all builtins).

If I could use "no-builtin" in a function attribute, I probably wouldn't
have bothered looking to implement the clang semantics, but -fno-builtin
isn't supported in this way. But, now that I think I understand the
behavior of attribute((no_builtin)) in clang, I think it has value
beyond what -fno-builtin performs as you can still gain access to
builtin functions when they are directly named.
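The distinction Keith describes can be sketched as follows. This assumes clang's no_builtin attribute behaves as described above. The optimizer should not turn the byte loop into a memcpy call, yet the function may still name __builtin_memcpy explicitly. On compilers without the attribute (GCC, as of this thread), the macro expands to nothing. The code still builds there, but without that guarantee. The function and macro names are hypothetical.

```c
#include <stddef.h>

/* Use clang's no_builtin attribute when the compiler reports it;
   otherwise fall back to an empty macro.  */
#if defined(__has_attribute)
#  if __has_attribute(no_builtin)
#    define NO_BUILTIN_MEMCPY __attribute__((no_builtin("memcpy")))
#  endif
#endif
#ifndef NO_BUILTIN_MEMCPY
#  define NO_BUILTIN_MEMCPY
#endif

/* Loop the optimizer should not replace with a new memcpy call.  */
NO_BUILTIN_MEMCPY
void copy_loop(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}

/* Explicitly naming the builtin is still permitted.  */
NO_BUILTIN_MEMCPY
void copy_block16(unsigned char *dst, const unsigned char *src)
{
    __builtin_memcpy(dst, src, 16);
}
```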

I'll go implement changes in builtin_decl_implicit and all of the
affected call sites and see what that looks like.

Thanks much for your review!

-- 
-keith




[PATCH] Add 'no_builtin' function attribute

2021-11-02 Thread Keith Packard via Gcc
This attribute controls optimizations which make assumptions about the
semantics of builtin functions. Typical cases are code sequences which
match memcpy, calloc, or sincos, or which call builtins like free.

This extends existing controls such as the -ftree-loop-distribute-patterns
flag, which only covers converting loops into builtin calls, not the
numerous other places where knowledge of builtin function semantics
changes the generated code.

The goal is to allow built-in functions to be declared by the compiler
and used directly by the application, but to disable optimizations
which take advantage of compiler knowledge about their semantics, and
to allow this optimization behavior to be changed for individual
functions.

One place where this behavior is especially useful is when compiling
the builtin functions that GCC knows about, as in the C library.
Currently, C library source code and build systems have various kludges
to work around the compiler's behavior in these areas, using a combination
of -fno-tree-loop-distribute-patterns, -fno-builtin and even symbol
aliases to keep GCC from generating infinite recursions.
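One of the existing kludges mentioned above can be sketched for comparison. It turns off -ftree-loop-distribute-patterns for a single function through GCC's optimize attribute, so that the copy loop is not recognized as a memcpy idiom. The name my_memcpy is hypothetical and avoids clashing with the host C library. In a real C library the function would be memcpy itself, where the substitution would recurse. The attribute is GCC-specific, so it is guarded.

```c
#include <stddef.h>

/* GCC: disable loop-to-libcall pattern recognition for this function.
   clang does not support the 'optimize' attribute, so expand to nothing.  */
#if defined(__GNUC__) && !defined(__clang__)
#  define INHIBIT_LOOP_TO_LIBCALL \
     __attribute__((optimize("-fno-tree-loop-distribute-patterns")))
#else
#  define INHIBIT_LOOP_TO_LIBCALL
#endif

INHIBIT_LOOP_TO_LIBCALL
void *my_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)             /* plain byte copy, left as a loop */
        *d++ = *s++;
    return dst;
}
```

This works file by file and function by function, but it only covers loop distribution. That is the gap the proposed attribute is meant to close.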

This can be applied globally to a file using the -fno-optimize-builtin
flag.

This disables optimizations which translate a sequence of builtin calls
into an equivalent sequence:

void
__attribute__((no_builtin))
sincos(double x, double *s, double *c)
{
    *s = sin(x);
    *c = cos(x);
}

This also avoids converting loops into builtin calls like this:

void *
__attribute__((no_builtin))
memcpy(void *__restrict__ dst, const void *__restrict__ src, size_t n)
{
    char *d = dst;
    const char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

As well as disabling analysis of memory lifetimes around free as in
this example:

void
__attribute__((no_builtin))
erase_and_free(void *ptr)
{
    memset(ptr, '\0', malloc_usable_size(ptr));
    free(ptr);
}

It also prevents converting builtin calls into inline code:

void
__attribute__((no_builtin))
copy_fixed(char *dest)
{
    strcpy(dest, "hello world");
}

Clang has a more sophisticated version of this mechanism which
can disable specific builtins:

double
__attribute__((no_builtin("exp2")))
exp2(double x)
{
    return pow (2.0, x);
}

The general approach in this change is to introduce checks in some
places where builtin functions are used to see if the specific
function is 'allowed' to be used for optimization, skipping the
optimization when the desired function has been disabled.

Three new functions, builtin_decl_implicit_opt_p,
builtin_decl_explicit_opt and builtin_decl_implicit_opt are introduced
which add checks for whether the compiler can assume standard
semantics for the specified function for purposes of
optimization. These are used throughout the compiler wherever
appropriate. Code which must use builtins for correct operation
(e.g. struct assignment) is not affected.
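The gating described above can be modeled as a toy with entirely hypothetical names. The model does not reproduce GCC's trees, builtin tables, or the real builtin_decl_*_opt signatures. Explicit references are always honored. A transformation that would introduce a new builtin reference first asks whether the enclosing function permits it. The stpcpy-to-strcpy case mirrors the expand_builtin_stpcpy_1 hunk in this patch.

```c
#include <stdbool.h>

/* Hypothetical builtin codes for the toy model.  */
enum toy_builtin { TOY_STRCPY, TOY_STPCPY, TOY_MEMCPY, TOY_N_BUILTINS };

/* Hypothetical per-function state; 'no_builtin' models the attribute.  */
struct toy_function {
    bool no_builtin;
};

/* Explicit use: the programmer named the builtin, so it is honored.  */
bool toy_explicit_ok(enum toy_builtin code)
{
    return code < TOY_N_BUILTINS;
}

/* Implicit use for optimization: may the compiler introduce a new
   reference to CODE inside FN?  */
bool toy_implicit_opt_ok(const struct toy_function *fn, enum toy_builtin code)
{
    return toy_explicit_ok(code) && !fn->no_builtin;
}

/* Toy version of "if the return value is ignored, transform stpcpy into
   strcpy".  Returns the builtin the call should end up using.  */
enum toy_builtin toy_fold_stpcpy(const struct toy_function *fn,
                                 bool result_used)
{
    if (!result_used && toy_implicit_opt_ok(fn, TOY_STRCPY))
        return TOY_STRCPY;      /* transformation permitted */
    return TOY_STPCPY;          /* keep the call as written */
}
```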

The machinery proposed here could be extended to support the
additional clang feature by extending the attribute parsing function
and creating a list of disabled builtins checked by the builtin_decl
functions described above.
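The extension could be sketched as follows, again as a toy with hypothetical names and one possible representation rather than the patch's. Each function carries a set of disabled builtins. The bare attribute disables all of them, while no_builtin("exp2") disables only exp2, as in the clang exp2 example earlier in this message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical codes for a few builtins the toy model knows about.  */
enum toy_fn_code { FN_EXP2, FN_POW, FN_MEMCPY, FN_COUNT };

static const char *const toy_fn_names[FN_COUNT] = { "exp2", "pow", "memcpy" };

/* Hypothetical per-function record: bit N set means builtin N may not be
   introduced by the optimizer inside this function.  */
struct toy_fn_attrs {
    unsigned disabled;
};

/* Apply one no_builtin attribute.  NAME == NULL models the bare form
   (disable everything).  Returns false for an unknown builtin name.  */
bool toy_apply_no_builtin(struct toy_fn_attrs *attrs, const char *name)
{
    if (name == NULL) {
        attrs->disabled = (1u << FN_COUNT) - 1;
        return true;
    }
    for (int i = 0; i < FN_COUNT; i++) {
        if (strcmp(name, toy_fn_names[i]) == 0) {
            attrs->disabled |= 1u << i;
            return true;
        }
    }
    return false;
}

/* The check a builtin_decl-style lookup would consult.  */
bool toy_builtin_allowed(const struct toy_fn_attrs *attrs,
                         enum toy_fn_code code)
{
    return (attrs->disabled & (1u << code)) == 0;
}
```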

Signed-off-by: Keith Packard 
---
 gcc/builtins.c   | 12 +++---
 gcc/c-family/c-attribs.c | 68 ++
 gcc/common.opt   |  4 ++
 gcc/gimple-fold.c| 72 ++--
 gcc/gimple-match-head.c  |  2 +-
 gcc/tree-loop-distribution.c |  7 
 gcc/tree-ssa-alias.c |  3 +-
 gcc/tree-ssa-strlen.c| 48 ++--
 gcc/tree-ssa-structalias.c   |  3 +-
 gcc/tree.h   | 39 +++
 10 files changed, 194 insertions(+), 64 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 7d0f61fc98b..d665ee716e8 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2061,7 +2061,7 @@ mathfn_built_in_1 (tree type, combined_fn fn, bool implicit_p)
   if (fcode2 == END_BUILTINS)
     return NULL_TREE;
 
-  if (implicit_p && !builtin_decl_implicit_p (fcode2))
+  if (implicit_p && !builtin_decl_implicit_opt_p (fcode2))
     return NULL_TREE;
 
   return builtin_decl_explicit (fcode2);
@@ -3481,9 +3481,9 @@ expand_builtin_stpcpy_1 (tree exp, rtx target, machine_mode mode)
   src = CALL_EXPR_ARG (exp, 1);
 
   /* If return value is ignored, transform stpcpy into strcpy.  */
-  if (target == const0_rtx && builtin_decl_implicit (BUILT_IN_STRCPY))
+  if (target == const0_rtx && builtin_decl_implicit_opt (BUILT_IN_STRCPY))
     {
-      tree fn = builtin_decl_implicit (BUILT_IN_STRCPY);
+      tree fn = builtin_decl_implicit_opt (BUILT_IN_STRCPY);
       tree result = build_call_nofold_loc (loc, fn, 2, dst