from:"Alexandre Oliva"

[PATCH] return edge in make_eh_edges

2023-10-19 Thread Alexandre Oliva



The need to initialize edge probabilities has made make_eh_edges
undesirably hard to use.  I suppose we don't want make_eh_edges to
initialize the probability of the newly-added edge itself, so that the
caller takes care of it, but identifying the added edge in need of
adjustments is inefficient and cumbersome.  Change make_eh_edges so
that it returns the added edge.

Regstrapped on x86_64-linux-gnu, and (along with various hardening
patches) on ppc64el-linux-gnu.  Also tested on multiple other targets,
on older versions of GCC.  The returned value is unused in code already
in the compiler.  This is a preparatory patch for uses to be introduced
along with stack scrubbing and control flow redundancy.  Ok to install?


for  gcc/ChangeLog

* tree-eh.cc (make_eh_edges): Return the new edge.
* tree-eh.h (make_eh_edges): Likewise.
---
 gcc/tree-eh.cc |6 +++---
 gcc/tree-eh.h  |2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc
index e8ceff36cc6e7..1cb8e08652909 100644
--- a/gcc/tree-eh.cc
+++ b/gcc/tree-eh.cc
@@ -2274,7 +2274,7 @@ make_eh_dispatch_edges (geh_dispatch *stmt)
 /* Create the single EH edge from STMT to its nearest landing pad,
if there is such a landing pad within the current function.  */
 
-void
+edge
 make_eh_edges (gimple *stmt)
 {
   basic_block src, dst;
@@ -2283,14 +2283,14 @@ make_eh_edges (gimple *stmt)
 
   lp_nr = lookup_stmt_eh_lp (stmt);
   if (lp_nr <= 0)
-return;
+return NULL;
 
   lp = get_eh_landing_pad_from_number (lp_nr);
   gcc_assert (lp != NULL);
 
   src = gimple_bb (stmt);
   dst = label_to_block (cfun, lp->post_landing_pad);
-  make_edge (src, dst, EDGE_EH);
+  return make_edge (src, dst, EDGE_EH);
 }
 
 /* Do the work in redirecting EDGE_IN to NEW_BB within the EH region tree;
diff --git a/gcc/tree-eh.h b/gcc/tree-eh.h
index 771be50fe9a1d..1382568b7c919 100644
--- a/gcc/tree-eh.h
+++ b/gcc/tree-eh.h
@@ -30,7 +30,7 @@ extern bool remove_stmt_from_eh_lp (gimple *);
 extern int lookup_stmt_eh_lp_fn (struct function *, const gimple *);
 extern int lookup_stmt_eh_lp (const gimple *);
 extern bool make_eh_dispatch_edges (geh_dispatch *);
-extern void make_eh_edges (gimple *);
+extern edge make_eh_edges (gimple *);
 extern edge redirect_eh_edge (edge, basic_block);
 extern void redirect_eh_dispatch_edge (geh_dispatch *, edge, basic_block);
 extern bool operation_could_trap_helper_p (enum tree_code, bool, bool, bool,


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

Re: [RFC][debug] Add -fadd-debug-nops

2018-07-16 Thread Alexandre Oliva

On Jul 16, 2018, Tom de Vries  wrote:

> On 07/16/2018 03:34 PM, Jakub Jelinek wrote:
>> So is this essentially a workaround for GDB not supporting the statement
>> frontiers?

> AFAIU now, the concept of location views addresses this problem, so yes.

Nice!  A preview for what can be obtained with LVu support in the
debugger!

> Right, but in the mean time I don't mind having an option that lets me
> filter out noise in guality test results.

FWIW, I'm a bit concerned about working around legitimate problems, as
in modifying testcases so that they pass.  This hides actual problems,
that we'd like to fix eventually by adjusting the compiler, not the
testcases.

That said, thank you for the attention you've given to the guality
testsuite recently.  It's appreciated.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist

[PATCH] Introduce instance discriminators

2018-07-17 Thread Alexandre Oliva

This patch is a rewrite of an earlier patch submitted at
https://gcc.gnu.org/ml/gcc-patches/2012-11/msg02340.html

With -gnateS, the Ada compiler sets itself up to output discriminators
for different instantiations of generics, but the middle and back ends
have lacked support for that.  This patch introduces the missing bits,
translating the GNAT-internal representation of the instance map to an
instance_table that maps ordinary line-map indices to instance
discriminators.

Instance discriminators are not compatible with LTO, in that the
instance mapping is not preserved in LTO dumps.  There are no plans to
preserve discriminators in them.

This patch (minus whitespace changes and tests) was regstrapped on
x86_64-linux-gnu.  The final form of the patch was tested with a
non-bootstrap build, and a single-test check-gnat run.  Ok to install?


From: Olivier Hainque 
for  libcpp/ChangeLog

* include/line-map.h (ORDINARY_MAP_INDEX): New.

for  gcc/ChangeLog

* einput.c: New file.  Allow associating "line context"
extension data to instruction location info, for sets of
locations covered by an ordinary line_map structure.
* einput.h: Likewise.
* Makefile.in (OBJS): Add einput.o.
* input.c (expand_location_1): On request, provide pointer to the
line map that was used to resolve the input location.
(map_expand_location): New function.  Same as expand_location,
also providing the map from which the input location was resolved.
(expand_location, expand_location_to_spelling_point): Adjust calls
to expand_location_1.
(linemap_client_expand_location_to_spelling_point): Likewise.
* input.h (map_expand_location): Declare.
* emit-rtl.c (insn_location): Handle a location_lc* argument.
* rtl.h (insn_location): Adjust prototype.
* print-rtl.c (print_rtx): Adjust call to insn_location.
* modulo-sched.c (dump_insn_location): Likewise.
* tree-inline.c (copy_bb): Copy discriminator field as well.
* flag-types.h (loc_discriminator_type): New enum, allowing BB
or INSTANCE_ID discriminators.
* common.opt (loc_discriminator_kind): New variable, conveying the
kinf of discriminator we want to see emited with source locations.
* final.c (bb_discriminator, last_bb_discriminator): New statics,
to track basic block discriminators.
(final_start_function_1): Initialize them.
(final_scan_insn_1): On NOTE_INSN_BASIC_BLOCK, track
bb_discriminator.
(notice_source_line): If INSN_HAS_LOCATION, update current
discriminator from BB or INSTANCE_ID depending on the kind we're
requested to convey.  When deciding to emit, account for both
possible kinds of discriminators.

for  gcc/ada

* trans.c (gigi): When requested so, allocate and populate
the gcc table controlling the emission of per-instance debug
    info.

From: Alexandre Oliva  , Olivier Hainque  

for  gcc/testsuite/ChangeLog

* gnat.dg/dinst.adb: New.
* gnat.dg/dinst_pkg.ads, gnat.dg/dinst_pkg.adb: New.
---
 gcc/Makefile.in |1 +
 gcc/ada/gcc-interface/trans.c   |   10 ++
 gcc/common.opt  |   12 
 gcc/einput.c|   55 +++
 gcc/einput.h|   50 
 gcc/emit-rtl.c  |   11 +--
 gcc/final.c |   29 +++---
 gcc/flag-types.h|   14 +
 gcc/input.c |   32 +---
 gcc/input.h |2 +
 gcc/modulo-sched.c  |2 +
 gcc/print-rtl.c |2 +
 gcc/rtl.h   |3 +-
 gcc/testsuite/gnat.dg/dinst.adb |   20 +
 gcc/testsuite/gnat.dg/dinst_pkg.adb |7 
 gcc/testsuite/gnat.dg/dinst_pkg.ads |4 +++
 gcc/tree-inline.c   |2 +
 libcpp/include/line-map.h   |8 +
 18 files changed, 247 insertions(+), 17 deletions(-)
 create mode 100644 gcc/einput.c
 create mode 100644 gcc/einput.h
 create mode 100644 gcc/testsuite/gnat.dg/dinst.adb
 create mode 100644 gcc/testsuite/gnat.dg/dinst_pkg.adb
 create mode 100644 gcc/testsuite/gnat.dg/dinst_pkg.ads

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2a05a66ea9b87..f9a9fe8726b18 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1285,6 +1285,7 @@ OBJS = \
dwarf2cfi.o \
dwarf2out.o \
early-remat.o \
+   einput.o \
emit-rtl.o \
et-forest.o \
except.o \
diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index 31e098a0c707a..3ad3f83fd60f5 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -45,6 +45,7 @@
 #include "tree-i

Re: [PATCH] Introduce instance discriminators

2018-07-19 Thread Alexandre Oliva

On Jul 18, 2018, Richard Biener  wrote:

> On Wed, Jul 18, 2018 at 8:53 AM Alexandre Oliva  wrote:
>> Instance discriminators are not compatible with LTO, in that the
>> instance mapping is not preserved in LTO dumps.  There are no plans to
>> preserve discriminators in them.

> Because...

... it follows existing practice.  BB discriminators are not saved for
LTO either.  They could be saved along with the CFG, but AFAICT they
aren't.  As for instance discriminators, we might be able to save them
along with LOC information, but that would be quite wasteful, and
because of the way ordinary maps are reconstructed when reading in the
LTO data, we'd end up with yet another internal representation for
line_maps.  I was told there was no interest from our customers in using
the converage and monitoring aspects of instance discriminators when
performing link-time optimizations, and thus that it made sense to
follow existing practice.

I suspect there might be a way to assign instance discriminator numbers
to individual function DECLs, and then walk up the lexical block tree to
identify the DECL containing the block so as to obtain the discriminator
number.  This would be a lot less efficient, algorithmically speaking,
but, provided that LTO dumps discriminator numbers as part of the decls,
and enough info to reconstruct the lexical block trees, including the
inlined-function enclosing blocks, that should work even for LTO, at
least as long as different decls are maintained for each instance.

Indeed, if this is the case, code ranges of lexical blocks in inlined
subroutines would suffice to identify each instantiation, without the
need for discriminators.  It would be a lot more expensive to gather the
information from that debug info than from discriminators, though.

All this said, there doesn't seem to be much interest in that from Ada
users to justify by itself the pursuit of yet another internal
representation.  I wonder if this sort of discriminator might be of
interest for users of C++ templates as well.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist

Re: [PATCH] Introduce instance discriminators

2018-07-24 Thread Alexandre Oliva

On Jul 19, 2018, Richard Biener  wrote:

> Oh, that probably wasn't omitted on purpose.  Cary said it was used
> for profiling but I can't see any such use.

> Is the instance discriminator stuff also used for profiling?

Not that I know, but...  I probably wouldn't know yet ;-)

Anyway, it was easy enough to implement this:

>> I suspect there might be a way to assign instance discriminator numbers
>> to individual function DECLs, and then walk up the lexical block tree to
>> identify the DECL containing the block so as to obtain the discriminator
>> number.

and then, in a subsequent patch, I went ahead and added support for LTO,
saving and recovering discriminator info for instances and, while at
that, for basic blocks too.

Besides sucessfully regstrapping the first two patches on
x86_64-linux-gnu, I have tested this patchset with an additional
bootstrap with the third (throw-away) patch, that adds -gnateS to Ada
compilations in gcc/ada, libada and gnattools.  I also tested the saving
and restoring of discriminators for LTO by manually inspecting the line
number tables in LTO-recompiled executables, to check that they retained
the instance or BB discriminator numbers that went into the non-LTO
object files.

Ok to install the first two patches?  (the third is just for reference)

Introduce instance discriminators

From: Alexandre Oliva 

With -gnateS, the Ada compiler sets itself up to output discriminators
for different instantiations of generics, but the middle and back ends
have lacked support for that.  This patch introduces the missing bits,
translating the GNAT-internal representation of the per-file instance
map to an instance_table that maps decls to instance discriminators.

From: Alexandre Oliva  , Olivier Hainque  

for  gcc/ChangeLog

* debug.h (decl_to_instance_map_t): New type.
(decl_to_instance_map): Declare.
(maybe_create_decl_to_instance_map): New inline function.
* final.c (bb_discriminator, last_bb_discriminator): New statics,
to track basic block discriminators.
(final_start_function_1): Initialize them.
(final_scan_insn_1): On NOTE_INSN_BASIC_BLOCK, track
bb_discriminator.
(decl_to_instance_map): New variable.
(map_decl_to_instance, maybe_set_discriminator): New functions.
(notice_source_line): Set discriminator.

for  gcc/ada

* trans.c: Include debug.h.
(file_map): New static variable.
(gigi): Set it.  Create decl_to_instance_map when needed.
(Subprogram_Body_to_gnu): Pass gnu_subprog_decl to...
(Sloc_to_locus): ... this.  Add decl parm, map it to instance.
* gigi.h (Sloc_to_locus): Adjust declaration.

for  gcc/testsuite/ChangeLog

* gnat.dg/dinst.adb: New.
* gnat.dg/dinst_pkg.ads, gnat.dg/dinst_pkg.adb: New.
---
 gcc/ada/gcc-interface/gigi.h|2 +
 gcc/ada/gcc-interface/trans.c   |   29 ---
 gcc/debug.h |   15 
 gcc/final.c |   70 +--
 gcc/testsuite/gnat.dg/dinst.adb |   20 ++
 gcc/testsuite/gnat.dg/dinst_pkg.adb |7 
 gcc/testsuite/gnat.dg/dinst_pkg.ads |4 ++
 7 files changed, 137 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gnat.dg/dinst.adb
 create mode 100644 gcc/testsuite/gnat.dg/dinst_pkg.adb
 create mode 100644 gcc/testsuite/gnat.dg/dinst_pkg.ads

diff --git a/gcc/ada/gcc-interface/gigi.h b/gcc/ada/gcc-interface/gigi.h
index a75cb9094491..b890195cefc3 100644
--- a/gcc/ada/gcc-interface/gigi.h
+++ b/gcc/ada/gcc-interface/gigi.h
@@ -285,7 +285,7 @@ extern void process_type (Entity_Id gnat_entity);
location and false if it doesn't.  If CLEAR_COLUMN is true, set the column
information to 0.  */
 extern bool Sloc_to_locus (Source_Ptr Sloc, location_t *locus,
-  bool clear_column = false);
+  bool clear_column = false, const_tree decl = 0);

 /* Post an error message.  MSG is the error message, properly annotated.
NODE is the node at which to post the error and the node to use for the
diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index 31e098a0c707..0371d00fce18 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -41,6 +41,7 @@
 #include "stmt.h"
 #include "varasm.h"
 #include "output.h"
+#include "debug.h"
 #include "libfuncs.h"  /* For set_stack_check_libfunc.  */
 #include "tree-iterator.h"
 #include "gimplify.h"
@@ -255,6 +256,12 @@ static tree create_init_temporary (const char *, tree, 
tree *, Node_Id);
 static const char *extract_encoding (const char *) ATTRIBUTE_UNUSED;
 static const char *decode_name (const char *) ATTRIBUTE_UNUSED;

+/* This makes gigi's file_info_ptr visible in this translation unit,
+   so that

Re: [RFC 1/3, debug] Add fdebug-nops

2018-07-24 Thread Alexandre Oliva

On Jul 24, 2018, Tom de Vries  wrote:

> There's a design principle in GCC that code generation and debug generation
> are independent.  This guarantees that if you're encountering a problem in an
> application without debug info, you can recompile it with -g and be certain
> that you can reproduce the same problem, and use the debug info to debug the
> problem.  This invariant is enforced by bootstrap-debug.  The fdebug-nops
> breaks this invariant

I thought of a way to not break it: enable the debug info generation
machinery, including VTA and SFN, but discard those only at the very end
if -g is not enabled.  The downside is that it would likely slow -Og
down significantly, but who uses it without -g anyway?

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist

Re: [RFC 2/3, debug] Add fkeep-vars-live

2018-07-24 Thread Alexandre Oliva

On Jul 24, 2018, Tom de Vries  wrote:

> This patch adds fake uses of user variables at the point where they go out of
> scope, to keep user variables inspectable throughout the application.

I suggest also adding such uses before sets, so that variables aren't
regarded as dead and get optimized out in ranges between the end of a
live range and a subsequent assignment.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist

Re: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-24 Thread Alexandre Oliva

Hello, Christina,

On Jul 24, 2018, Tamar Christina  wrote:

> gcc/
> 2018-07-24  Tamar Christina  

>   PR target/86486
>   * configure.ac: Add stack-clash-protection-guard-size.
>   * doc/install.texi: Document it.
>   * config.in (DEFAULT_STK_CLASH_GUARD_SIZE): New.
>   * params.def: Update comment for guard-size.
>   * configure: Regenerate.

The configury bits look almost good to me.

I wish the help message, comments and docs expressed somehow that the
given power of two expresses a size in bytes, rather than in kilobytes,
bits or any other unit that might be reasonably assumed to express stack
sizes.  I'm afraid I don't know the best way to accomplish that in a few
words.

> +stk_clash_default=12

This seems to be left-over from an earlier patch, as it is now unused
AFAICT.

Thanks,

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist

Re: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-26 Thread Alexandre Oliva

On Jul 25, 2018, Tamar Christina  wrote:

> gcc/
> 2018-07-25  Tamar Christina  

>   PR target/86486
>   * configure.ac: Add stack-clash-protection-guard-size.
>   * doc/install.texi: Document it.
>   * config.in (DEFAULT_STK_CLASH_GUARD_SIZE): New.
>   * params.def: Update comment for guard-size.
>   (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE,
>   PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL): Update description.
>   * configure: Regenerate.

Thanks.  No objections from me.  I don't see any use of the new config
knob, though; assuming it's in a subsequent patch, I guess this one is
fine, but I'm not sure I'm entitled to approve it.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist

Re: [RFC 2/3, debug] Add fkeep-vars-live

2018-07-26 Thread Alexandre Oliva

On Jul 25, 2018, Jakub Jelinek  wrote:

> On Tue, Jul 24, 2018 at 04:11:11PM -0300, Alexandre Oliva wrote:
>> On Jul 24, 2018, Tom de Vries  wrote:
>> 
>> > This patch adds fake uses of user variables at the point where they go out 
>> > of
>> > scope, to keep user variables inspectable throughout the application.
>> 
>> I suggest also adding such uses before sets, so that variables aren't
>> regarded as dead and get optimized out in ranges between the end of a
>> live range and a subsequent assignment.

> But that can be done incrementally, right

Sure

> the optimizers could still move it appart

*nod*

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist

Re: [RFC 1/3, debug] Add fdebug-nops

2018-07-26 Thread Alexandre Oliva

On Jul 24, 2018, Tom de Vries  wrote:

>> I thought of a way to not break it: enable the debug info generation
>> machinery, including VTA and SFN, but discard those only at the very end
>> if -g is not enabled.  The downside is that it would likely slow -Og
>> down significantly, but who uses it without -g anyway?

> I thought of the same.  I've submitted a patch here that uses SFN:
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01391.html .

Nice!

> VTA is not needed AFAIU.

Yes, indeed.  It could avoid inserting some nops, if you were to refrain
from emitting them if there aren't any binds between neighbor SFNs, but
I like it better your way: it's even more like SFN support in the
debugger :-)

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist

Re: [PATCH] Introduce instance discriminators

2018-07-30 Thread Alexandre Oliva

On Jul 24, 2018, Alexandre Oliva  wrote:

> Ok to install the first two patches?  (the third is just for reference)

Ping?

https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01419.html


> Introduce instance discriminators

> From: Alexandre Oliva 

> With -gnateS, the Ada compiler sets itself up to output discriminators
> for different instantiations of generics, but the middle and back ends
> have lacked support for that.  This patch introduces the missing bits,
> translating the GNAT-internal representation of the per-file instance
> map to an instance_table that maps decls to instance discriminators.


> From: Alexandre Oliva  , Olivier Hainque  
> 
> for  gcc/ChangeLog

>   * debug.h (decl_to_instance_map_t): New type.
>   (decl_to_instance_map): Declare.
>   (maybe_create_decl_to_instance_map): New inline function.
>   * final.c (bb_discriminator, last_bb_discriminator): New statics,
>   to track basic block discriminators.
>   (final_start_function_1): Initialize them.
>   (final_scan_insn_1): On NOTE_INSN_BASIC_BLOCK, track
>   bb_discriminator.
>   (decl_to_instance_map): New variable.
>   (map_decl_to_instance, maybe_set_discriminator): New functions.
>   (notice_source_line): Set discriminator.

> for  gcc/ada

>   * trans.c: Include debug.h.
>   (file_map): New static variable.
>   (gigi): Set it.  Create decl_to_instance_map when needed.
>   (Subprogram_Body_to_gnu): Pass gnu_subprog_decl to...
>   (Sloc_to_locus): ... this.  Add decl parm, map it to instance.
>   * gigi.h (Sloc_to_locus): Adjust declaration.

> for  gcc/testsuite/ChangeLog

>   * gnat.dg/dinst.adb: New.
>   * gnat.dg/dinst_pkg.ads, gnat.dg/dinst_pkg.adb: New.

[...]

> Save discriminator info for LTO

> From: Alexandre Oliva 

> for  gcc/ChangeLog

>   * gimple-streamer-in.c (input_bb): Restore BB discriminator.
>   * gimple-streamer-out.c (output_bb): Save it.
>   * lto-streamer-in.c (input_struct_function_base): Restore
>   instance discriminator if available.  Create map on demand.
>   * lto-streamer-out.c (output_struct_function_base): Save it if
>   available.
>   * final.c (decl_to_instance_map): Document LTO strategy.


-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist

[PATCH] testsuite: check for and use -mno-strict-align where needed

2023-10-19 Thread Alexandre Oliva

On Mar 10, 2021, Alexandre Oliva  wrote:

> ppc configurations that have -mstrict-align enabled by default fail
> gcc.dg/strlenopt-80.c, because some memcpy calls don't get turned into
> MEM_REFs, which defeats the tested-for strlen optimization.

I've combined this patch with other patches that added -mno-strict-align
to tests that needed it on targets configured with -mstrict-align
enabled by default, and conditioned the use of the flag to targets that
support it.

Regstrapped on x86_64-linux-gnu, ppc64le-linux-gnu, also tested on a
ppc-vx7r2 configured with -mstrict-align.  Ok to install?



Various tests fail on powerpc if the toolchain is configured to enable
-mstrict-align by default.  This patch introduces -mno-strict-align on
tests found to fail that way, when the target supports this option.

I suppose !non_strict_align could be used to skip tests, instead of or
in addition to this tweak, and that might be desirable if they still
fail on targets that do no support -mno-strict-align, but I haven't
observed such scenarios.


The p9-vec-length tests expect vectorization on loop bodies and
epilogues that reference arrays that are not known to be more aligned
than their small element types.

Though VSX vectors work best with 32- or 64-bit alignment, unaligned
vector loads and stores are expected by the tests.  However, with
-mstrict-align by default, vector loads and stores not known to be
aligned end up open coded, which doesn't match the asm output
expectations coded in the tests.


for  gcc/ChangeLog

* doc/sourcebuild.texi (opt_mstrict_align): New target.

for  gcc/testsuite/ChangeLog

* lib/target-supports.exp
(check_effective_target_opt_mstrict_align): New.
* gcc.dg/strlenopt-80.c: Add -mno-strict-align if supported.
* gcc.target/powerpc/prefix-ds-dq.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-8.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-8.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-8.c: Likewise.
---
 gcc/doc/sourcebuild.texi   |3 +++
 gcc/testsuite/gcc.dg/strlenopt-80.c|4 
 .../gcc.target/powerpc/p9-vec-length-epil-1.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-2.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-3.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-4.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-5.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-6.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-7.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-8.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-run-1.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-run-2.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-run-3.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-run-4.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-run-5.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-run-6.c  |2 ++
 .../gcc.target/powerpc/p9-ve

Re: [PATCH] return edge in make_eh_edges

2023-10-19 Thread Alexandre Oliva

On Oct 19, 2023, Richard Biener  wrote:

> OK.  Maybe time to do s/make_eh_edges/make_eh_edge/ though.

Thanks, will do, ideally on top of the already-tested refreshed patches
that I'm going to post shortly.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

Re: Enable top-level recursive 'autoreconf'

2023-10-19 Thread Alexandre Oliva

On Oct 19, 2023, Thomas Schwinge  wrote:

> On 2023-10-18T15:42:18+0100, R jd <3246251196r...@gmail.com> wrote:
>> I guess I can ask, why there is not a recursive approach for configuring
>> GCC. e.g. AC_SUBDIRS in the top level?

> ('AC_CONFIG_SUBDIRS' you mean.)  You know, often it just takes someone to
> ask the right questions...  ;-)

> What do people think about the attached
> "Enable top-level recursive 'autoreconf'"?  Only lightly tested, so far.

Interesting idea!

It is a little hackish, in that it seems to exploit an implementation
detail in AC_CONFIG_SUBDIRS rather than a documented feature.

I like it!

The autoconf documentation suggests that optional directories can be
tested for:

  if test -d "$srcdir/foo"; then
AC_CONFIG_SUBDIRS([foo])
  fi

We could use a macro that takes a list and iterates over the list (untested):

dnl Handle a list of optional subdirs.
dnl After AC_OUTPUT, affects autoreconf runs, but not configure runs.
AC_DEFUN([AC_CONFIG_SUBDIRS_OPT], [
  m4_foreach_w([dir], [$1], [
if test -d "$srcdir/dir"; then
  AC_CONFIG_SUBDIRS(dir)
fi
  ])
])

Thanks,

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

[PATCH v4] Introduce hardbool attribute for C

2023-10-19 Thread Alexandre Oliva

g/torture/hardbool-ll-5a.c
new file mode 100644
index 0..14438c5104f07
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/hardbool-ll-5a.c
@@ -0,0 +1,6 @@
+/* { dg-do run } */
+/* { dg-options "-w" } */
+
+#define falseval 0x781ee187a53cc35all
+
+#include "hardbool-ll.c"
diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-ll.c 
b/gcc/testsuite/gcc.dg/torture/hardbool-ll.c
new file mode 100644
index 0..d4d498c6f2af1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/hardbool-ll.c
@@ -0,0 +1,5 @@
+/* { dg-do run } */
+
+#define basetype long long
+
+#include "hardbool.c"
diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c 
b/gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c
new file mode 100644
index 0..e38a56b5deb05
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c
@@ -0,0 +1,6 @@
+/* { dg-do run } */
+/* { dg-options "-w" } */
+
+#define falseval 0x5aa5
+
+#include "hardbool-s.c"
diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-s.c 
b/gcc/testsuite/gcc.dg/torture/hardbool-s.c
new file mode 100644
index 0..942300be2072a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/hardbool-s.c
@@ -0,0 +1,5 @@
+/* { dg-do run } */
+
+#define basetype short
+
+#include "hardbool.c"
diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c 
b/gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c
new file mode 100644
index 0..7beec578ff89c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c
@@ -0,0 +1,6 @@
+/* { dg-do run } */
+/* { dg-options "-w" } */
+
+#define falseval 0xa53cc35a
+
+#include "hardbool-ul.c"
diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-ul.c 
b/gcc/testsuite/gcc.dg/torture/hardbool-ul.c
new file mode 100644
index 0..841c1d4bc2ec8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/hardbool-ul.c
@@ -0,0 +1,5 @@
+/* { dg-do run } */
+
+#define basetype unsigned long
+
+#include "hardbool.c"
diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c 
b/gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c
new file mode 100644
index 0..5bfc922795d3d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c
@@ -0,0 +1,6 @@
+/* { dg-do run } */
+/* { dg-options "-w" } */
+
+#define falseval 0xa55a
+
+#include "hardbool-us.c"
diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-us.c 
b/gcc/testsuite/gcc.dg/torture/hardbool-us.c
new file mode 100644
index 0..e9feec681c41e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/hardbool-us.c
@@ -0,0 +1,5 @@
+/* { dg-do run } */
+
+#define basetype unsigned short
+
+#include "hardbool.c"
diff --git a/gcc/testsuite/gcc.dg/torture/hardbool.c 
b/gcc/testsuite/gcc.dg/torture/hardbool.c
new file mode 100644
index 0..01684952a2a9f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/hardbool.c
@@ -0,0 +1,118 @@
+/* { dg-do run } */
+
+#include 
+
+#ifndef basetype
+#define basetype char
+#endif
+
+#ifndef falseval
+#define falseval 0
+#endif
+
+#ifndef trueval
+#define trueval ~falseval
+#endif
+
+/* hardbool may be #defined so as to drop parms in other tests.  */
+typedef basetype __attribute__ ((hardbool (falseval, trueval))) hbool;
+
+typedef unsigned char __attribute__ ((__hardbool__ (1, 0))) zbool;
+
+struct hs {
+  hbool a[2];
+  hbool x:2;
+  hbool y:5;
+  zbool z:1;
+};
+
+hbool var = 0;
+
+struct hs x = { { 1, 0 }, 2, 0, 2 };
+
+int f(hbool v) {
+  return !v;
+}
+
+int g(int i) {
+  return f(i);
+}
+
+hbool h(hbool x) {
+  return x;
+}
+
+hbool h2(hbool x) {
+  return h(x);
+}
+
+int hsx(struct hs v) {
+  return v.x;
+}
+
+int ghs(hbool s) {
+  struct hs v = { {s, !s}, s, !s, s };
+  return hsx (v);
+}
+
+int t = (hbool)2;
+
+void check_pfalse (hbool *p)
+{
+  assert (!*p);
+  assert (*(basetype*)p == (basetype)falseval);
+  assert (!(int)(hbool)*p);
+}
+
+void check_ptrue (hbool *p)
+{
+  assert (*p);
+  assert (*(basetype*)p == (basetype)trueval);
+  assert ((int)(hbool)*p);
+}
+
+void check_vfalse (hbool v)
+{
+  check_pfalse (&v);
+}
+
+void check_vtrue (hbool v)
+{
+  check_ptrue (&v);
+}
+
+int main () {
+  check_pfalse (&var);
+  var = !(int)(hbool)(_Bool)var;
+  check_ptrue (&var);
+  var = (zbool)var;
+  check_ptrue (&var);
+
+  check_ptrue (&x.a[0]);
+  check_pfalse (&x.a[1]);
+  check_vtrue (x.x);
+  check_vfalse (x.y);
+  check_vtrue (x.z);
+
+  check_vtrue (t);
+
+  check_vtrue (var && t);
+  check_vfalse (!var || x.y);
+
+  check_vfalse (f (2));
+  check_vfalse (f (1));
+  check_vtrue (f (0));
+
+  check_vfalse (g (2));
+  check_vfalse (g (1));
+  check_vtrue (g (0));
+
+  check_vtrue (h (2));
+  check_vtrue (h (1));
+  check_vfalse (h (0));
+
+  check_vtrue (h2 (2));
+  check_vtrue (h2 (1));
+  check_vfalse (h2 (0));
+}
+


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

[PATCH] rename make_eh_edges to make_eh_edge (was: return edge in make_eh_edges)

2023-10-19 Thread Alexandre Oliva

);
+extern edge make_eh_edge (gimple *);
 extern edge redirect_eh_edge (edge, basic_block);
 extern void redirect_eh_dispatch_edge (geh_dispatch *, edge, basic_block);
 extern bool operation_could_trap_helper_p (enum tree_code, bool, bool, bool,
diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index d63060c9429c4..867f3b22b589a 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -2615,7 +2615,7 @@ copy_edges_for_bb (basic_block bb, profile_count num, 
profile_count den,
}
   else if (can_throw)
{
- make_eh_edges (copy_stmt);
+ make_eh_edge (copy_stmt);
  update_probs = true;
}
 


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

Re: [PATCH v3] Control flow redundancy hardening

2023-10-20 Thread Alexandre Oliva

-cfr-always.c: New.
* c-c++-common/torture/harden-cfr-never.c: New.
* c-c++-common/torture/harden-cfr-no-xthrow.c: New.
* c-c++-common/torture/harden-cfr-nothrow.c: New.
* c-c++-common/torture/harden-cfr-bret-always.c: New.
* c-c++-common/torture/harden-cfr-bret-never.c: New.
* c-c++-common/torture/harden-cfr-bret-noopt.c: New.
* c-c++-common/torture/harden-cfr-bret-noret.c: New.
* c-c++-common/torture/harden-cfr-bret-no-xthrow.c: New.
* c-c++-common/torture/harden-cfr-bret-nothrow.c: New.
* c-c++-common/torture/harden-cfr-bret-retcl.c: New.
* c-c++-common/torture/harden-cfr-bret.c: New.
* g++.dg/harden-cfr-throw-always-O0.C: New.
* g++.dg/harden-cfr-throw-returning-O0.C: New.
* g++.dg/torture/harden-cfr-noret-always-no-nothrow.C: New.
* g++.dg/torture/harden-cfr-noret-never-no-nothrow.C: New.
* g++.dg/torture/harden-cfr-noret-no-nothrow.C: New.
* g++.dg/torture/harden-cfr-throw-always.C: New.
* g++.dg/torture/harden-cfr-throw-never.C: New.
* g++.dg/torture/harden-cfr-throw-no-xthrow.C: New.
* g++.dg/torture/harden-cfr-throw-no-xthrow-expected.C: New.
* g++.dg/torture/harden-cfr-throw-nothrow.C: New.
* g++.dg/torture/harden-cfr-throw-nocleanup.C: New.
* g++.dg/torture/harden-cfr-throw-returning.C: New.
* g++.dg/torture/harden-cfr-throw.C: New.
* gcc.dg/torture/harden-cfr-noret-no-nothrow.c: New.
* gcc.dg/torture/harden-cfr-tail-ub.c: New.
* gnat.dg/hardcfr.adb: New.

for  libgcc/ChangeLog

* Makefile.in (LIB2ADD): Add hardcfr.c.
* hardcfr.c: New.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-10-20 Thread Alexandre Oliva

On Aug 29, 2023, FX Coudert  wrote:

>> I think a build machinery review is needed.
> Thanks. CC’ing the relevant maintainers for review of the build part.
> The driver part and the darwin-specific part are already okayed.

The build machinery bits look reasonable to me.  I have no specific
knowledge of darwin, but I expect darwin maintainers understand and
approve of the bits that are going into build machinery files.

Apologies for the delay.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice but
very few check the facts.  Think Assange & Stallman.  The empires strike back

Re: [PATCH RESEND] libatomic: drop redundant all-multi command

2023-10-20 Thread Alexandre Oliva

On Aug  1, 2023, Jan Beulich via Gcc-patches  wrote:

>   * Makefile.am (all-multi): Drop commands.
>   * Makefile.in: Update accordingly.

LGTM, thanks,

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice but
very few check the facts.  Think Assange & Stallman.  The empires strike back

Re: Remove stale Autoconf checks for Perl

2023-10-20 Thread Alexandre Oliva

On May 16, 2023, Thomas Schwinge  wrote:

> OK to push the attached "Remove stale Autoconf checks for Perl"?

LGTM, thanks,

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice but
very few check the facts.  Think Assange & Stallman.  The empires strike back

[PATCH] [PR111520] set hardcmp eh probs (was: rename make_eh_edges to make_eh_edge)

2023-10-21 Thread Alexandre Oliva

On Oct 20, 2023, Richard Biener  wrote:

>> * tree-eh.h (make_eh_edges): Rename to...
>> (make_eh_edge): ... this.
>> * tree-eh.cc: Likewise.  Adjust all callers.

Once the above goes in (it depends on the strub monster patch), the
following one should apply as well.  Regstrapped on x86_64-linux-gnu.
Ok to install?

Set execution count of EH blocks, and probability of EH edges.


for  gcc/ChangeLog

PR tree-optimization/111520
* gimple-harden-conditionals.cc
(pass_harden_compares::execute): Set EH edge probability and
EH block execution count.

for  gcc/testsuite/ChangeLog

PR tree-optimization/111520
* g++.dg/torture/harden-comp-pr111520.cc: New.
---
 gcc/gimple-harden-conditionals.cc  |   12 +++-
 .../g++.dg/torture/harden-comp-pr111520.cc |   17 +
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/harden-comp-pr111520.cc

diff --git a/gcc/gimple-harden-conditionals.cc 
b/gcc/gimple-harden-conditionals.cc
index 1999e827a04ca..bded288985063 100644
--- a/gcc/gimple-harden-conditionals.cc
+++ b/gcc/gimple-harden-conditionals.cc
@@ -580,11 +580,21 @@ pass_harden_compares::execute (function *fun)
  if (throwing_compare_p)
{
  add_stmt_to_eh_lp (asgnck, lookup_stmt_eh_lp (asgn));
- make_eh_edge (asgnck);
+ edge eh = make_eh_edge (asgnck);
+ /* This compare looks like it could raise an exception,
+but it's dominated by the original compare, that
+would raise an exception first, so the EH edge from
+this one is never really taken.  */
+ eh->probability = profile_probability::never ();
+ if (eh->dest->count.initialized_p ())
+   eh->dest->count += eh->count ();
+ else
+   eh->dest->count = eh->count ();
 
  edge ckeh;
  basic_block nbb = split_edge (non_eh_succ_edge
(gimple_bb (asgnck), &ckeh));
+ gcc_checking_assert (eh == ckeh);
  gsi_split = gsi_start_bb (nbb);
 
  if (dump_file)
diff --git a/gcc/testsuite/g++.dg/torture/harden-comp-pr111520.cc 
b/gcc/testsuite/g++.dg/torture/harden-comp-pr111520.cc
new file mode 100644
index 0..b4381b4d84ec4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/harden-comp-pr111520.cc
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-fharden-compares -fsignaling-nans -fnon-call-exceptions" } */
+
+struct S
+{
+  S (bool);
+  ~S ();
+};
+
+float f;
+
+void
+foo ()
+{
+  S a = 0;
+  S b = f;
+}


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

Re: [PATCH] Ignore case of header line in dg-extract-results.py

2023-10-25 Thread Alexandre Oliva

On Oct 24, 2023, Paul Iannetta  wrote:

>   * dg-extract-results.py: Make the test_run regex case
> insensitive.

It looks reasonable to me, but I'm not sure this is a change I'm
entitled to approve.  Thanks!

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

Re: [PATCH v4] Introduce strub: machine-independent stack scrubbing

2023-10-25 Thread Alexandre Oliva

Ping? https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633675.html

I'm combining the gcc/ipa-strub.cc bits from
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633526.html

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

Re: [PATCH] rename make_eh_edges to make_eh_edge

2023-10-25 Thread Alexandre Oliva

On Oct 20, 2023, Richard Biener  wrote:

> OK.

Thanks.

>> * tree-eh.h (make_eh_edges): Rename to...
>> (make_eh_edge): ... this.
>> * tree-eh.cc: Likewise.  Adjust all callers.

Ugh, "Adjust all callers" is no longer enough, all files need to be
mentioned explicitly.  So I'm pushing it with:

* tree-eh.h (make_eh_edges): Rename to...
(make_eh_edge): ... this.
* tree-eh.cc: Likewise.  Adjust all callers...
* gimple-harden-conditionals.cc: ... here, ...
* gimple-harden-control-flow.cc: ... here, ...
* tree-cfg.cc: ... here, ...
* tree-inline.cc: ... and here.

The ipa-strub.cc is now part of the strub patch.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

hardcfr: support checking at abnormal edges [PR111943]

2023-10-26 Thread Alexandre Oliva



Control flow redundancy may choose abnormal edges for early checking,
but that breaks because we can't insert checks on such edges.

Introduce conditional checking on the dest block of abnormal edges,
and leave it for the optimizer to drop the conditional.

Also, oops, I noticed the new files went in with an incorrect copyright
notice, that this patch fixes.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR tree-optimization/111943
* gimple-harden-control-flow.cc: Adjust copyright year.
(rt_bb_visited): Add vfalse and vtrue data members.
Zero-initialize them in the ctor.
(rt_bb_visited::insert_exit_check_on_edge): Upon encountering
abnormal edges, insert initializers for vfalse and vtrue on
entry, and insert the check sequence guarded by a conditional
in the dest block.

for  libgcc/ChangeLog

* hardcfr.c: Adjust copyright year.

for  gcc/testsuite/ChangeLog

PR tree-optimization/111943
* gcc.dg/harden-cfr-pr111943.c: New.
---
 gcc/gimple-harden-control-flow.cc  |   78 +++-
 gcc/testsuite/gcc.dg/harden-cfr-pr111943.c |   33 
 libgcc/hardcfr.c   |2 -
 3 files changed, 109 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/harden-cfr-pr111943.c

diff --git a/gcc/gimple-harden-control-flow.cc 
b/gcc/gimple-harden-control-flow.cc
index 3711b25d09123..77c140178060e 100644
--- a/gcc/gimple-harden-control-flow.cc
+++ b/gcc/gimple-harden-control-flow.cc
@@ -1,5 +1,5 @@
 /* Control flow redundancy hardening.
-   Copyright (C) 2022 Free Software Foundation, Inc.
+   Copyright (C) 2022-2023 Free Software Foundation, Inc.
Contributed by Alexandre Oliva .
 
 This file is part of GCC.
@@ -460,6 +460,10 @@ class rt_bb_visited
  at the end of a block's predecessors or successors list.  */
   tree ckfail, ckpart, ckinv, ckblk;
 
+  /* If we need to deal with abnormal edges, we insert SSA_NAMEs for
+ boolean true and false.  */
+  tree vfalse, vtrue;
+
   /* Convert a block index N to a block vindex, the index used to
  identify it in the VISITED array.  Check that it's in range:
  neither ENTRY nor EXIT, but maybe one-past-the-end, to compute
@@ -596,7 +600,8 @@ public:
   /* Prepare to add control flow redundancy testing to CFUN.  */
   rt_bb_visited (int checkpoints)
 : nblocks (n_basic_blocks_for_fn (cfun)),
-  vword_type (NULL), ckseq (NULL), rtcfg (NULL)
+  vword_type (NULL), ckseq (NULL), rtcfg (NULL),
+  vfalse (NULL), vtrue (NULL)
   {
 /* If we've already added a declaration for the builtin checker,
extract vword_type and vword_bits from its declaration.  */
@@ -703,7 +708,74 @@ public:
   /* Insert SEQ on E.  */
   void insert_exit_check_on_edge (gimple_seq seq, edge e)
   {
-gsi_insert_seq_on_edge_immediate (e, seq);
+if (!(e->flags & EDGE_ABNORMAL))
+  {
+   gsi_insert_seq_on_edge_immediate (e, seq);
+   return;
+  }
+
+/* Initialize SSA boolean constants for use in abnormal PHIs.  */
+if (!vfalse)
+  {
+   vfalse = make_ssa_name (boolean_type_node);
+   vtrue = make_ssa_name (boolean_type_node);
+
+   gimple_seq vft_seq = NULL;
+   gassign *vfalse_init = gimple_build_assign (vfalse, boolean_false_node);
+   gimple_seq_add_stmt (&vft_seq, vfalse_init);
+   gassign *vtrue_init = gimple_build_assign (vtrue, boolean_true_node);
+   gimple_seq_add_stmt (&vft_seq, vtrue_init);
+
+   gsi_insert_seq_on_edge_immediate (single_succ_edge
+ (ENTRY_BLOCK_PTR_FOR_FN (cfun)),
+ vft_seq);
+  }
+
+/* We can't insert on abnormal edges, but we can arrange for SEQ
+   to execute conditionally at dest.  Add a PHI boolean with TRUE
+   from E and FALES from other preds, split the whole block, add a
+   test for the PHI to run a new block with SEQ or skip straight
+   to the original block.  If there are multiple incoming abnormal
+   edges, we'll do this multiple times.  ??? Unless there are
+   multiple abnormal edges with different postcheck status, we
+   could split the block and redirect other edges, rearranging the
+   PHI nodes.  Optimizers already know how to do this, so we can
+   keep things simple here.  */
+basic_block bb = e->dest;
+basic_block bb_postcheck = split_block_after_labels (bb)->dest;
+
+basic_block bb_check = create_empty_bb (e->dest);
+bb_check->count = e->count ();
+if (dom_info_available_p (CDI_DOMINATORS))
+  set_immediate_dominator (CDI_DOMINATORS, bb_check, bb);
+if (current_loops)
+  add_bb_to_loop (bb_check, current_loops->tree_root);
+
+gimple_stmt_iterator chkpt = gsi_after_labels (bb_check);
+gsi_insert_seq_before_without_update (&chkpt, seq, GSI_SAME_

Re: hardcfr: support checking at abnormal edges [PR111943]

2023-10-31 Thread Alexandre Oliva

[adding list]

On Oct 27, 2023, rep.dot@gmail.com wrote:

> +   from E and FALES from other preds, split the whole block, add a

> s/FALES/FALSE/

Thanks, I've just installed the patch including the typo fix.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

follow SSA defs for asan base

2021-01-21 Thread Alexandre Oliva



Ada makes extensive use of nested functions, which turn all automatic
variables of the enclosing function that are used in nested ones into
members of an artificial FRAME record type.

The address of a local variable is usually passed to asan marking
functions without using a temporary.  Taking the address of a member
of FRAME within a nested function, however, is not regarded as a
gimple val: while introducing FRAME variables, current_function_decl
is always the outermost function, even while processing a nested
function, so decl_address_invariant_p returns false for such
ADDR_EXPRs.  So, as automatic variables are moved into FRAME, any asan
call that marks such a variable has its ADDR_EXPR replaced with a
SSA_NAME set to the ADDR_EXPR of the FRAME member.

asan_expand_mark_ifn was not prepared to deal with ADDR_EXPRs split
out into SSA_NAMEs.  This patch deals with such cases.

[It does NOT deal with PHI nodes and whatnot.  I'm not even sure it
should.  Maybe we want the ADDR_EXPR to be a gimple val instead, but
this more conservative fix felt more appropriate for this stage.]

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* asan.c (asan_expand_mark_ifn): Follow SSA_NAME defs for
an ADDR_EXPR base.

for  gcc/testsuite/ChangeLog

* gcc.dg/asan/nested-1.c: New.
---
 gcc/asan.c   |   21 +
 gcc/testsuite/gcc.dg/asan/nested-1.c |   24 
 2 files changed, 45 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/asan/nested-1.c

diff --git a/gcc/asan.c b/gcc/asan.c
index 89ecd99b18294..2d2fb97098b2f 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -3629,6 +3629,27 @@ asan_expand_mark_ifn (gimple_stmt_iterator *iter)
   bool is_poison = ((asan_mark_flags)flag) == ASAN_MARK_POISON;
 
   tree base = gimple_call_arg (g, 1);
+  while (TREE_CODE (base) == SSA_NAME)
+{
+  gimple *def = SSA_NAME_DEF_STMT (base);
+  if (!def)
+   break;
+
+  if (!is_gimple_assign (def))
+   break;
+
+  if (!SINGLE_SSA_TREE_OPERAND (def, SSA_OP_DEF))
+   break;
+
+  if (gimple_num_ops (def) != 2)
+   break;
+
+  if (gimple_expr_code (def) == ADDR_EXPR
+ || gimple_expr_code (def) == SSA_NAME)
+   base = gimple_assign_rhs1 (def);
+  else
+   break;
+}
   gcc_checking_assert (TREE_CODE (base) == ADDR_EXPR);
   tree decl = TREE_OPERAND (base, 0);
 
diff --git a/gcc/testsuite/gcc.dg/asan/nested-1.c 
b/gcc/testsuite/gcc.dg/asan/nested-1.c
new file mode 100644
index 0..87e842098077c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/nested-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=address" } */
+
+int f(int i) {
+  auto int h() {
+int r;
+int *p;
+
+{
+  int x[3];
+
+  auto int g() {
+   return x[i];
+  }
+
+  p = &r;
+  *p = g();
+}
+
+return *p;
+  }
+
+  return h();
+}


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

skip asan-poisoning of discarded vars

2021-01-21 Thread Alexandre Oliva



GNAT may create temporaries to hold return values of function calls.
If such a temporary is created as part of a dynamic initializer of a
variable in a unit other than the one being compiled, the initializer
is dropped, including the temporary and its binding block.

Don't issue asan mark calls for such variables, they are gone.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* gimplify.c (gimplify_decl_expr): Skip asan marking calls for
temporaries not seen in binding block, and not about to be
added as gimple variables.

for  gcc/testsuite/ChangeLog

* gnat.dg/asan1.adb: New test.
* gnat.dg/asan1_pkg.ads: New additional source.
---
 gcc/gimplify.c  |8 +++-
 gcc/testsuite/gnat.dg/asan1.adb |   15 +++
 gcc/testsuite/gnat.dg/asan1_pkg.ads |9 +
 3 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gnat.dg/asan1.adb
 create mode 100644 gcc/testsuite/gnat.dg/asan1_pkg.ads

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index d2ac5f913593f..95d55bb8ba4c7 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1795,7 +1795,13 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
  && !DECL_HAS_VALUE_EXPR_P (decl)
  && DECL_ALIGN (decl) <= MAX_SUPPORTED_STACK_ALIGNMENT
  && dbg_cnt (asan_use_after_scope)
- && !gimplify_omp_ctxp)
+ && !gimplify_omp_ctxp
+ /* GNAT introduces temporaries to hold return values of calls in
+initializers of variables defined in other units, so the
+declaration of the variable is discarded completely.  We do not
+want to issue poison calls for such dropped variables.  */
+ && (DECL_SEEN_IN_BIND_EXPR_P (decl)
+ || (DECL_ARTIFICIAL (decl) && DECL_NAME (decl) == NULL_TREE)))
{
  asan_poisoned_variables->add (decl);
  asan_poison_variable (decl, false, seq_p);
diff --git a/gcc/testsuite/gnat.dg/asan1.adb b/gcc/testsuite/gnat.dg/asan1.adb
new file mode 100644
index 0..a4bc59a9a2143
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/asan1.adb
@@ -0,0 +1,15 @@
+--  { dg-do compile }
+--  { dg-additional-sources asan1_pkg.ads }
+--  { dg-options "-fsanitize=address" }
+--  { dg-skip-if "" no_fsanitize_address }
+
+with Asan1_Pkg;
+
+procedure Asan1 is
+   use Asan1_Pkg;
+
+   X, Y : E;
+begin
+   X := C (N);
+   Y := V;
+end Asan1;
diff --git a/gcc/testsuite/gnat.dg/asan1_pkg.ads 
b/gcc/testsuite/gnat.dg/asan1_pkg.ads
new file mode 100644
index 0..fbbc1c5e7f5bd
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/asan1_pkg.ads
@@ -0,0 +1,9 @@
+package Asan1_Pkg is
+   subtype E is Integer;
+   type T is array (1..32) of E;
+
+   function N return T;
+   function C (P : T) return E;
+
+   V : constant E := C (N);
+end Asan1_Pkg;


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: skip asan-poisoning of discarded vars

2021-01-21 Thread Alexandre Oliva

On Jan 21, 2021, Jakub Jelinek  wrote:

> Does that affect only Ada and not other languages?

I haven't observed it on other languages, but I didn't try really hard.
Doing that now, with an assert that the newly-added condition doesn't
ever hit.  I'd already completed a bootstrap-asan the other day, but not
with all languages.

The kind of variable that triggers the problem is created within
gcc/ada/gcc-interface/trans.c:Call_to_gnu, specifically within
create_temporary in the same file.  In the provided testcase, it injects
the temporary created for the return value from N, passed as a parameter
to C.  The binding block containing that temporary ends up dropped when
the initializer of V is discarded, because it's dynamic and V is
imported from a different unit.


I figured if the added condition were to ever hit before, we would add a
poison call to a function that did not have gimple_add_tmp_var called on
it, and that would NOT have it called in the block right after the one I
proposed to modify:

  if (!DECL_SEEN_IN_BIND_EXPR_P (decl)
  && DECL_ARTIFICIAL (decl) && DECL_NAME (decl) == NULL_TREE)
gimple_add_tmp_var (decl);

Without gimple_add_tmp_var, we wouldn't allocate automatic storage to
the variable in expand, and then the attempt to take its address for the
poisoning call would explode in make_decl_rtl like this testcase does,
because make_decl_rtl is not to be called for automatic variables.

Since this didn't happen, I figured the new condition would not be hit
except for the failing case.  But I was wrong.  The bootstrap with the
added assert has just failed, as early as stage2 libiberty.  Looking
into it...

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: skip asan-poisoning of discarded vars

2021-01-21 Thread Alexandre Oliva

On Jan 21, 2021, Alexandre Oliva  wrote:

> But I was wrong.  The bootstrap with the added assert has just failed,
> as early as stage2 libiberty.  Looking into it...

Uhh, I take that back.  I just goofed in the assert, inverting the
condition.  Long day...

With the correct condition, it's got past the stage2 compilation of all
of the gcc deps and Ada sources, and then some.  Maybe my reasoning
wasn't wrong, after all ;-)

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: skip asan-poisoning of discarded vars

2021-01-21 Thread Alexandre Oliva

On Jan 21, 2021, Alexandre Oliva  wrote:

> On Jan 21, 2021, Alexandre Oliva  wrote:
>> But I was wrong.  The bootstrap with the added assert has just failed,
>> as early as stage2 libiberty.  Looking into it...

> Uhh, I take that back.  I just goofed in the assert, inverting the
> condition.  Long day...

> With the correct condition, it's got past the stage2 compilation of all
> of the gcc deps and Ada sources, and then some.  Maybe my reasoning
> wasn't wrong, after all ;-)

Yeah, confirmed, bootstrap-asan (and -ubsan) completed on
x86_64-linux-gnu, all languages, with the following patchlet
(cut&pasted, then retabified manually, so it may not apply
mechanically):

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index d2ac5f9..c0dcb39 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1797,6 +1797,9 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
  && dbg_cnt (asan_use_after_scope)
  && !gimplify_omp_ctxp)
{
+ gcc_assert (DECL_SEEN_IN_BIND_EXPR_P (decl)
+ || (DECL_ARTIFICIAL (decl) 
+ && DECL_NAME (decl) == NULL_TREE));
  asan_poisoned_variables->add (decl);
  asan_poison_variable (decl, false, seq_p);
  if (!DECL_ARTIFICIAL (decl) && gimplify_ctxp->live_switch_vars)


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: follow SSA defs for asan base

2021-01-26 Thread Alexandre Oliva

On Jan 22, 2021, Richard Biener via Gcc-patches  wrote:

> Yeah, I guess such addresses could be decl_address_invariant_p by changing
> the

>   || DECL_CONTEXT (op) == current_function_decl
>   || decl_function_context (op) == current_function_decl)

> to sth like

>|| auto_var_p (op)

> but I guess it won't help much since the access in the nested function
> will be through the static chain pointer and thus &chain->member
> which definitely isn't a gimple val.

I think we only get poison/unpoison calls in the scope of the automatic
variables, so it won't be the case that these calls will get such
indirect frame references: they will only get references to the own
function's frame, and those have invariant addresses, and thus can be
regarded as such, and as gimple vals.

So I gave this alternate change a spin, and both regstrap and asan+ubsan
bootstrap completed successfully.

Given the considerations of risk about the assert you pointed out, I'm
now inclined to regard this change as safer and superior.  Do you all
concur?  Ok to install if so?

(I see the code in tree.c was untabified, and one of my changes
introduced a tab that misaligned stuff.  Tabify, untabify, or leave it
inconsistent as in the tested patch below?

regard the address of auto vars and consts as invariant

From: Alexandre Oliva 

Ada makes extensive use of nested functions, which turn all automatic
variables of the enclosing function that are used in nested ones into
members of an artificial FRAME record type.

The address of a local variable is usually passed to asan marking
functions without using a temporary.  asan_expand_mark_ifn will reject
an ADDR_EXPRs if it's split out from the call into an SSA_NAMEs.

Taking the address of a member of FRAME within a nested function was
not regarded as a gimple val: while introducing FRAME variables,
current_function_decl pointed to the outermost function, even while
processing a nested function, so decl_address_invariant_p, checking
that the context of the variable is current_function_decl, returned
false for such ADDR_EXPRs.

This patch changes decl_address_invariant_p to disregard
current_function_decl, and regard all automatic variables as having
invariant addresses, regardless of nesting.  This may initially
include references to variables in other nesting levels, but once they
become references to enclosing frames, the indirection makes them
non-gimple_vals.  As long as poisoning and unpoisoning calls doesn't
kick in for variables in other frames, this shouldn't be a problem.

for  gcc/ChangeLog

* tree.c (decl_address_invariant_p): Accept auto variables and
constants.

for  gcc/testsuite/ChangeLog

* gcc.dg/asan/nested-1.c: New.
---
 gcc/testsuite/gcc.dg/asan/nested-1.c |   24 
 gcc/tree.c   |5 ++---
 2 files changed, 26 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/asan/nested-1.c

diff --git a/gcc/testsuite/gcc.dg/asan/nested-1.c 
b/gcc/testsuite/gcc.dg/asan/nested-1.c
new file mode 100644
index 0..87e842098077c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/nested-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=address" } */
+
+int f(int i) {
+  auto int h() {
+int r;
+int *p;
+
+{
+  int x[3];
+
+  auto int g() {
+   return x[i];
+  }
+
+  p = &r;
+  *p = g();
+}
+
+return *p;
+  }
+
+  return h();
+}
diff --git a/gcc/tree.c b/gcc/tree.c
index 287e5001dc3b3..3de3085f42c8a 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -3590,14 +3590,13 @@ decl_address_invariant_p (const_tree op)
 case VAR_DECL:
   if ((TREE_STATIC (op) || DECL_EXTERNAL (op))
   || DECL_THREAD_LOCAL_P (op)
-  || DECL_CONTEXT (op) == current_function_decl
-  || decl_function_context (op) == current_function_decl)
+ || auto_var_p (op))
 return true;
   break;

 case CONST_DECL:
   if ((TREE_STATIC (op) || DECL_EXTERNAL (op))
-  || decl_function_context (op) == current_function_decl)
+  || auto_var_p (op))
 return true;
   break;

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: follow SSA defs for asan base

2021-01-27 Thread Alexandre Oliva

On Jan 26, 2021, Richard Biener  wrote:

> So while I think it's safe let's look at if we can improve tree-nested.c,
> like I see (probably not the correct place):

*nod*, it's just not the *only* place.

> seeing how we adjust current_function_decl around the
> recompute_tree_invariant_for_addr_expr call but not the
> gsi_gimplify_val one (we already pass it a nesting_info,
> not sure if wi->info is the same as the 'info' used above though),
> so eventually we can fix it in one place?

There are pieces of nested function lowering for which we set cfun and
current_function_decl while walking each function, and there are other
pieces that just don't bother, and we only set up current_function_decl
temporarily for ADDR_EXPR handling.

This patch adjusts both of the ADDR_EXPR handlers that override
current_function_decl, so that the temporary overriding remains in
effect during the re-gimplification.  That is enough to avoid the
problem.  But I'm not very happy with this temporary overriding, it
seems fishy.  I'd rather we set things up for the entire duration of the
walking of each function.

But that's only relevant because we rely on current_function_decl for
address handling.  It's not clear to me that we should, as the other
patch demonstrated.  With it, we could probably even do away with these
overriders.

But, for this stage, this is probably as conservative a change as we
could possibly hope for.  I've regstrapped it on x86_64-linux-gnu, and
also bootstrapped it with asan and ubsan.  Ok to install?

restore current_function_decl after re-gimplifying nested ADDR_EXPRs

From: Alexandre Oliva 

Ada makes extensive use of nested functions, which turn all automatic
variables of the enclosing function that are used in nested ones into
members of an artificial FRAME record type.

The address of a local variable is usually passed to asan marking
functions without using a temporary.  asan_expand_mark_ifn will reject
an ADDR_EXPRs if it's split out from the call into an SSA_NAMEs.

Taking the address of a member of FRAME within a nested function was
not regarded as a gimple val: while introducing FRAME variables,
current_function_decl pointed to the outermost function, even while
processing a nested function, so decl_address_invariant_p, checking
that the context of the variable is current_function_decl, returned
false for such ADDR_EXPRs.

decl_address_invariant_p, called when determining whether an
expression is a legitimate gimple value, compares the context of
automatic variables with current_function_decl.  Some of the
tree-nested function processing doesn't set current_function_decl, but
ADDR_EXPR-processing bits temporarily override it.  However, they
restore it before re-gimplifying, which causes even ADDR_EXPRs
referencing automatic variables in the FRAME struct of a nested
function to not be regarded as address-invariant.

This patch moves the restores of current_function_decl in the
ADDR_EXPR-handling bits after the re-gimplification, so that the
correct current_function_decl is used when testing for address
invariance.

for  gcc/ChangeLog

* tree-nested.c (convert_nonlocal_reference_op): Move
current_function_decl restore after re-gimplification.
(convert_local_reference_op): Likewise.

for  gcc/testsuite/ChangeLog

* gcc.dg/asan/nested-1.c: New.
---
 gcc/testsuite/gcc.dg/asan/nested-1.c |   24 
 gcc/tree-nested.c|4 ++--
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/asan/nested-1.c

diff --git a/gcc/testsuite/gcc.dg/asan/nested-1.c 
b/gcc/testsuite/gcc.dg/asan/nested-1.c
new file mode 100644
index 0..87e842098077c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/nested-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=address" } */
+
+int f(int i) {
+  auto int h() {
+int r;
+int *p;
+
+{
+  int x[3];
+
+  auto int g() {
+   return x[i];
+  }
+
+  p = &r;
+  *p = g();
+}
+
+return *p;
+  }
+
+  return h();
+}
diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
index 1b52669b622aa..addd6eef9aba6 100644
--- a/gcc/tree-nested.c
+++ b/gcc/tree-nested.c
@@ -1214,7 +1214,6 @@ convert_nonlocal_reference_op (tree *tp, int 
*walk_subtrees, void *data)
save_context = current_function_decl;
current_function_decl = info->context;
recompute_tree_invariant_for_addr_expr (t);
-   current_function_decl = save_context;

/* If the callback converted the address argument in a context
   where we only accept variables (and min_invariant, presumably),
@@ -1222,6 +1221,7 @@ convert_nonlocal_reference_op (tree *tp, int 
*walk_subtrees, void *data)
if (save_val_only)
  *tp = gsi_gimplify_val ((struct nesting_info *) wi-&

[RFC] test builtin ratio for loop distribution

2021-01-27 Thread Alexandre Oliva



This patch attempts to fix a libgcc codegen regression introduced in
gcc-10, as -ftree-loop-distribute-patterns was enabled at -O2.


The ldist pass turns even very short loops into memset calls.  E.g.,
the TFmode emulation calls end with a loop of up to 3 iterations, to
zero out trailing words, and the loop distribution pass turns them
into calls of the memset builtin.

Though short constant-length memsets are usually dealt with
efficiently, for non-constant-length ones, the options are setmemM, or
a function calls.

RISC-V doesn't have any setmemM pattern, so the loops above end up
"optimized" into memset calls, incurring not only the overhead of an
explicit call, but also discarding the information the compiler has
about the alignment of the destination, and that the length is a
multiple of the word alignment.

This patch adds to the loop distribution pass some cost analysis based
on preexisting *_RATIO macros, so that we won't transform loops with
trip counts as low as the ratios we'd rather expand inline.


This patch is not finished; it needs adjustments to the testsuite, to
make up for the behavior changes it brings about.  Specifically, on a
x86_64-linux-gnu regstrap, it regresses:

> FAIL: gcc.dg/pr53265.c  (test for warnings, line 40)
> FAIL: gcc.dg/pr53265.c  (test for warnings, line 42)
> FAIL: gcc.dg/tree-ssa/ldist-38.c scan-tree-dump ldist "split to 0 loops and 1 
> library cal> FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++14  scan-tree-dump 
> ldist "split to 0 loops and 1 library calls"
> FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++17  scan-tree-dump ldist "split to 
> 0 loops and 1 library calls"
> FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++2a  scan-tree-dump ldist "split to 
> 0 loops and 1 library calls"

I suppose just lengthening the loops will take care of ldist-38 and
pr78847, but the loss of the warnings in pr53265 is more concerning, and
will require investigation.

Nevertheless, I seek feedback on whether this is an acceptable approach,
or whether we should use alternate tuning parameters for ldist, or
something entirely different.  Thanks in advance,


for  gcc/ChangeLog

* tree-loop-distribution.c (maybe_normalize_partition): New.
(loop_distribution::distribute_loop): Call it.

[requires testsuite adjustments and investigation of a warning regression]
---
 gcc/tree-loop-distribution.c |   54 ++
 1 file changed, 54 insertions(+)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index bb15fd3723fb6..b5198652817ee 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -2848,6 +2848,52 @@ fuse_memset_builtins (vec 
*partitions)
 }
 }
 
+/* Return false if it's profitable to turn the LOOP PARTITION into a builtin
+   call, and true if it wasn't, changing the PARTITION to PKIND_NORMAL.  */
+
+static bool
+maybe_normalize_partition (class loop *loop, struct partition *partition)
+{
+  unsigned HOST_WIDE_INT ratio;
+
+  switch (partition->kind)
+{
+case PKIND_NORMAL:
+case PKIND_PARTIAL_MEMSET:
+  return false;
+
+case PKIND_MEMSET:
+  if (integer_zerop (gimple_assign_rhs1 (DR_STMT
+(partition->builtin->dst_dr
+   ratio = CLEAR_RATIO (optimize_loop_for_speed_p (loop));
+  else
+   ratio = SET_RATIO (optimize_loop_for_speed_p (loop));
+  break;
+
+case PKIND_MEMCPY:
+case PKIND_MEMMOVE:
+  ratio = MOVE_RATIO (optimize_loop_for_speed_p (loop));
+  break;
+
+default:
+  gcc_unreachable ();
+}
+
+  tree niters = number_of_latch_executions (loop);
+  if (niters == NULL_TREE || niters == chrec_dont_know)
+return false;
+
+  wide_int minit, maxit;
+  value_range_kind vrk = determine_value_range (niters, &minit, &maxit);
+  if (vrk == VR_RANGE && wi::ltu_p (maxit, ratio))
+{
+  partition->kind = PKIND_NORMAL;
+  return true;
+}
+
+  return false;
+}
+
 void
 loop_distribution::finalize_partitions (class loop *loop,
vec *partitions,
@@ -3087,6 +3133,14 @@ loop_distribution::distribute_loop (class loop *loop, 
vec stmts,
 }
 
   finalize_partitions (loop, &partitions, &alias_ddrs);
+  {
+bool any_changes_p = false;
+for (i = 0; partitions.iterate (i, &partition); ++i)
+  if (maybe_normalize_partition (loop, partition))
+   any_changes_p = true;
+if (any_changes_p)
+  finalize_partitions (loop, &partitions, &alias_ddrs);
+  }
 
   /* If there is a reduction in all partitions make sure the last one
  is not classified for builtin code generation.  */

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

[RFC] mask out mult expr ctz bits from nonzero bits

2021-01-27 Thread Alexandre Oliva



While looking into the possibility of introducing setmemM patterns on
RISC-V to undo the transformation from loops of word writes into
memset, I was disappointed to find out that get_nonzero_bits would
take into account the range of the length passed to memset, but not
the trivially-available observation that this length was a multiple of
the word size.  This knowledge, if passed on to setmemM, could enable
setmemM to output more efficient code.

In the end, I did not introduce a setmemM pattern, nor the machinery
to pass the ctz of the length on to it along with other useful
information, but I figured this small improvement to nonzero_bits
could still improve code generation elsewhere.
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564341.html


Regstrapped on x86_64-linux-gnu.  No analysis of codegen impact yet.
Does this seem worth pursuing, presumably for stage1?


for  gcc/ChangeLog

* tree-ssanames.c (get_nonzero_bits): Zero out low bits of
integral types, when a MULT_EXPR INTEGER_CST operand ensures
the result will be a multiple of a power of two.
---
 gcc/tree-ssanames.c |   23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
index 51a26d2fce1c2..c4b5bf2a4999a 100644
--- a/gcc/tree-ssanames.c
+++ b/gcc/tree-ssanames.c
@@ -546,10 +546,29 @@ get_nonzero_bits (const_tree name)
 }
 
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
+  wide_int ret;
   if (!ri)
-return wi::shwi (-1, precision);
+ret = wi::shwi (-1, precision);
+  else
+ret = ri->get_nonzero_bits ();
+
+  /* If NAME is defined as a multiple of a constant C, we know the ctz(C) low
+ bits are zero.  ??? Should we handle LSHIFT_EXPR too?  Non-constants,
+ e.g. the minimum shift count, and ctz from both MULT_EXPR operands?  That
+ could make for deep recursion.  */
+  if (INTEGRAL_TYPE_P (TREE_TYPE (name))
+  && SSA_NAME_DEF_STMT (name)
+  && is_gimple_assign (SSA_NAME_DEF_STMT (name))
+  && gimple_assign_rhs_code (SSA_NAME_DEF_STMT (name)) == MULT_EXPR
+  && TREE_CODE (gimple_assign_rhs2 (SSA_NAME_DEF_STMT (name))) == 
INTEGER_CST)
+{
+  unsigned HOST_WIDE_INT bits
+   = tree_ctz (gimple_assign_rhs2 (SSA_NAME_DEF_STMT (name)));
+  wide_int mask = wi::shwi (-1, precision) << bits;
+  ret &= mask;
+}
 
-  return ri->get_nonzero_bits ();
+  return ret;
 }
 
 /* Return TRUE is OP, an SSA_NAME has a range of values [0..1], false


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: [RFC] test builtin ratio for loop distribution

2021-01-27 Thread Alexandre Oliva

On Jan 27, 2021, Richard Biener  wrote:

> That said, rather than not transforming the loop as you do I'd
> say we want to re-inline small copies more forcefully during
> loop distribution code-gen so we turn a loop that sets
> 3 'short int' to zero into a 'int' store and a 'short' store for example
> (we have more code-size leeway here because we formerly had
> a loop).

Ok, that makes sense to me, if there's a chance of growing the access
stride.

> Since you don't add a testcase

Uhh, sorry, I mentioned TFmode emulation calls, but I wasn't explicit I
meant the soft-fp ones from libgcc.

./xgcc -B./ -O2 $srcdir/libgcc/soft-fp/fixtfdi.c \
  -I$srcdir/libgcc/config/riscv -I$srcdir/include \
  -S -o - | grep memset

> I can't see whether the actual case would be fixed by setting SSA
> pointer alignment on the memset arguments

The dest pointer alignment is known at the builtin expansion time,
that's not a problem.  What isn't known (*) is that the length is a
multiple of the word size: what gets to the expander is that it's
between 4 and 12 bytes (targeting 32-bit risc-v), but not that it's
either 4, 8 or 12.  Coming up with an efficient inline expansion becomes
a bit of a challenge without that extra knowledge.

(*) at least before my related patch for get_nonzero_bits
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564344.html

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: [RFC] test builtin ratio for loop distribution

2021-02-02 Thread Alexandre Oliva

On Jan 28, 2021, Richard Biener  wrote:

> That would allow turning back the memset into the original loop (but
> with optimal IVs, etc.).

Is this sort of what you had in mind?

I haven't tested the inline expansion of memset much yet; and that of
memcpy, not at all; this really is mainly to check that I understood
what you had in mind before I spend further time polishing it.


test builtin ratio for loop distribution

From: Alexandre Oliva 

The ldist pass turns even very short loops into memset calls.  E.g.,
the TFmode emulation calls end with a loop of up to 3 iterations, to
zero out trailing words, and the loop distribution pass turns them
into calls of the memset builtin.

Though short constant-length memsets are usually dealt with
efficiently, for non-constant-length ones, the options are setmemM, or
a function calls.

RISC-V doesn't have any setmemM pattern, so the loops above end up
"optimized" into memset calls, incurring not only the overhead of an
explicit call, but also discarding the information the compiler has
about the alignment of the destination, and that the length is a
multiple of the word alignment.

This patch adds, to the loop distribution pass, the ability to perform
inline expansion of bounded variable-length memset and memcpy in ways
that take advantage of known alignments and size's factors, when
preexisting *_RATIO macros suggest the inline expansion is
advantageous.


for  gcc/ChangeLog

* tree-loop-distribution.c: Include builtins.h.
(generate_memset_builtin): Introduce optional inline expansion
of bounded variable-sized memset calls.
(generate_memcpy_builtin): Likewise for memcpy only.
(loop_distribution::execute): Fix loop structure afterwards.
---
 gcc/tree-loop-distribution.c |  280 ++
 1 file changed, 279 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index bb15fd3723fb6..3be7a4c1ac281 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -115,6 +115,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 #include "gimple-fold.h"
+#include "builtins.h"
 
 
 #define MAX_DATAREFS_NUM \
@@ -1148,6 +1149,23 @@ generate_memset_builtin (class loop *loop, partition 
*partition)
   /* The new statements will be placed before LOOP.  */
   gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
 
+  /* Compute builtin->size range and ctz before it's gimplified; 
sub-expressions
+ thereof are rewritten in place, so they end up referencing SSA_NAMEs for
+ which we don't have VR info.  */
+  unsigned align = get_pointer_alignment (builtin->dst_base) / BITS_PER_UNIT;
+  unsigned alctz, szctz, xbits;
+  wide_int szmin, szmax;
+  value_range_kind vrk;
+  if (align)
+{
+  alctz = wi::ctz (align);
+  szctz = MIN (tree_ctz (builtin->size), alctz);
+  xbits = alctz - szctz;
+  vrk = determine_value_range (builtin->size, &szmin, &szmax);
+  if (szmin == szmax)
+   align = 0;
+}
+
   nb_bytes = rewrite_to_non_trapping_overflow (builtin->size);
   nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE,
   false, GSI_CONTINUE_LINKING);
@@ -1172,6 +1190,127 @@ generate_memset_builtin (class loop *loop, partition 
*partition)
   val = tem;
 }
 
+  unsigned HOST_WIDE_INT ratio;
+  if (integer_zerop (val))
+ratio = CLEAR_RATIO (optimize_loop_for_speed_p (loop));
+  else
+ratio = SET_RATIO (optimize_loop_for_speed_p (loop));
+
+  /* Compare the ratio with the number of (most-aligned) required stores needed
+ for szmax bytes, with the ratio: a loop that iterates up to szmax >> 
alctz,
+ and then one conditional store for each extra bit of alignment that the
+ destination has over the size.  */
+  if (align && vrk == VR_RANGE
+  && wi::ltu_p (wi::rshift (szmax, alctz, UNSIGNED) + xbits, ratio))
+{
+  gimple *stmt;
+  tree bytes = create_tmp_var_raw (size_type_node, "ldistbytes");
+  tree ptr = create_tmp_var_raw (build_pointer_type (char_type_node),
+"ldistptr");
+  tree stval = force_gimple_operand_gsi (&gsi,
+fold_convert
+(unsigned_char_type_node, val),
+true, NULL_TREE, false,
+GSI_CONTINUE_LINKING);
+
+  for (unsigned i = 1; i != alctz; i *= 2)
+   {
+ unsigned bits = i * 2 * BITS_PER_UNIT;
+ tree type = build_nonstandard_integer_type (bits, true);
+ stval = fold_convert (type, stval);
+ tree op2 = fold_build2_loc (partition->loc, LSHIFT_EXPR,

Re: [RFC] test builtin ratio for loop distribution

2021-02-03 Thread Alexandre Oliva

On Feb  3, 2021, Richard Biener  wrote:

> So I think we should try to match what __builtin_memcpy/memset
> expansion would do here, taking advantage of extra alignment
> and size knowledge.  In particular,

>  a) if __builtin_memcpy/memset would use setmem/cpymem optabs
>  see if we can have variants of memcpy/memset transferring alignment
>  and size knowledge

We could add more optional parameters to them to convey the length's
known ctz.  However, the ctz can't be recovered reliably.  We can't even
recover it after gimplifying the length within ldist!

That said, my other patch already enables ctz calls to recover it, at
least in libgcc risc-v tfmode cases, and it's possible it's readily
available in other cases.  I'd rather leave that for someone dealing
with the machine-specific patterns to figure out whether a separate
argument would be useful.  RISC-V, which is what I'm dealing with,
doesn't have much to offer as far as these patterns are concerned.

>  b) if expansion would use BY_PIECES then expand to an unrolled loop

Why would that be better than keeping the constant-length memset call,
that would be turned into an unrolled loop during expand?

>  c) if expansion would emit a memset/memcpy call but we know
>  alignment and have a low bound on niters emit a loop (like your patch 
> does)

> a) might be difficult but adding the builtin variants may pay off in any case.

*nod*

> The patch itself could benefit from one or two helpers we already
> have, first of all there's create_empty_loop_on_edge (so you don't
> need the loop fixup)

Uhh, thanks, but...  you realize nearly all of the gimple-building code
is one and the same for the loop and for trailing count misalignment?
There doesn't seem to be a lot of benefit to using this primitive, aside
from its dealing with creating the loop data structure which, again, I'd
only want to do in the loop case.

(I guess I should add more comments as to the inline expansion
 strategy.  it's equivalent to:

 i = len, ptr = base, blksz = 1 << alctz;
 while (i >= blksz) { *(ub*)ptr = val; i -= blksz; ptr += blksz; }
 blksz >>= 1; if (i >= blksz) { *(ub*)ptr = val; i -= blksz; ptr += 
blksz; }
 blksz >>= 1; if (i >= blksz) { *(ub*)ptr = val; i -= blksz; ptr += 
blksz; }
 ... until blksz gets down to zero or to 1< Note that for memmove if we know the dependence direction, we
> can also emit a loop / unrolled code.

*nod*, but the logic would have to be quite different, using bit tests,
and odds are we won't know the direction and have to output a test and
code for both possibilities, which would be quite unlikely to be
beneficial.  Though the original code would quite likely make the
direction visible; perhaps if the size is small enough that we would
expand a memcpy inline, we should refrain from transforming the loop
into a memmove call.

In the case at hand, there's no benefit at all to these transformations:
we start from a loop with the known alignment and a small loop count (0
to 2 words copied), and if everything goes all right with the
transformation, we may be lucky to get back to that.  It's not like the
transformation could even increase the known alignment, so why bother,
and throw away debug info by rewriting the loop into the same code minus
debug?

> I think the builtins with alignment and calloc-style element count
> will be useful on its own.

Oh, I see, you're suggesting actual separate builtin functions.  Uhh...
I'm not sure I want to go there.  I'd much rather recover the ctz of the
length, and use it in existing code.

I'd also prefer if the generic memset (and memcpy and memmove?) builtin
expanders dealt with non-constant lengths in the way I implemented.
That feels like the right spot for it.  That deprives us of gimple loop
optimizations in the inlined loop generated by the current patch, but if
we expand an unrolled loop with compares and offsets with small
constants, loop optimizations might not even be relevant.

FWIW, the patch I posted yesterday is broken, the regstrap test did not
even build libstdc++-v3 successfully.  I'm not sure whether to pursue it
further, or to reimplement it in the expander.  Suggestions are welcome.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: [RFC] test builtin ratio for loop distribution

2021-02-04 Thread Alexandre Oliva

On Feb  4, 2021, Richard Biener  wrote:

>> >  b) if expansion would use BY_PIECES then expand to an unrolled loop
>> 
>> Why would that be better than keeping the constant-length memset call,
>> that would be turned into an unrolled loop during expand?

> Well, because of the possibly lost ctz and alignment info.

Funny you should mention that.  I got started with the expand-time
expansion yesterday, and found out that we're not using the alignment
information that is available.  Though the pointer is known to point to
an aligned object, we are going for 8-bit alignment for some reason.

The strategy I used there was to first check whether by_pieces would
expand inline a constant length near the max known length, then loop
over the bits in the variable length, expand in each iteration a
constant-length store-by-pieces for the fixed length corresponding to
that bit, and a test comparing the variable length with the fixed length
guarding the expansion of the store-by-pieces.  We may get larger code
this way (no loops), but only O(log(len)) compares.

I've also fixed some bugs in the ldist expander, so now it bootstraps,
but with a few regressions in the testsuite, that I'm yet to look into.

>> Uhh, thanks, but...  you realize nearly all of the gimple-building code
>> is one and the same for the loop and for trailing count misalignment?

> Sorry, the code lacked comments and so I didn't actually try decipering
> the code you generate ;)

Oh, come on, it was planly obscure ;-D

Sorry for posting an early-draft before polishing it up.

> The original motivation was really that esp. for small trip count loops
> the target knows best how to implement them.  Now, that completely
> fails of course in case the target doesn't implement any of this or
> the generic code fails because we lost ctz and alignment info.

In our case, generic code fails because it won't handle variable-sized
clear-by-pieces.  But then, I found out, when it's fixed-size, it also
makes the code worse, because it seems to expand to byte stores even
when the store-to object is known to have wider alignment:

union u {
  long long i;
  char c[8];
} x[8];
int s(union u *p, int k) {
  for (int i = k ? 0 : 3; i < 8; i++) {
for (int j = 0; j < 8; j++) {
  p[i].c[j] = 0;
} // becomes a memset to an 8-byte-aligned 8-byte object, then 8 byte stores
  }
}

>> > I think the builtins with alignment and calloc-style element count
>> > will be useful on its own.
>> 
>> Oh, I see, you're suggesting actual separate builtin functions.  Uhh...
>> I'm not sure I want to go there.  I'd much rather recover the ctz of the
>> length, and use it in existing code.

> Yeah, but when we generate memcpy there might not be a way to
> store the ctz info until RTL expansion where the magic should really happen 
> ...

True.  It can be recovered without much difficulty in the cases I've
looked at, but it could be lost in others.

> So I'd say go for improving RTL expansion.

'k, thanks

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: use -mfpu=auto for arm/simd/vmmla_1.c (was: use -mfpu=neon for arm/simd/vmmla_1.c)

2021-02-05 Thread Alexandre Oliva

On Jan 13, 2021, Alexandre Oliva  wrote:

> On Jan  5, 2021, Alexandre Oliva  wrote:
>> So this patch adds -mfpu=auto to the test, overriding any implicit
>> flags with the fpu implied by the arch.

>> * gcc.target/arm/simd/vmmla_1.c: Pass -mfpu=auto.

> Ping?
> https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562798.html

Ping?

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: use -mfpu=neon for arm/simd/vmmla_1.c

2021-02-10 Thread Alexandre Oliva

On Feb  9, 2021, Kyrylo Tkachov  wrote:

> Ok. Aren't there more tests that have this problem?

Thanks.  This was the only test that exhibited this problem.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: [RFC] test builtin ratio for loop distribution

2021-02-11 Thread Alexandre Oliva

On Feb  4, 2021, Jim Wilson  wrote:

> FYI we have a bug report for this for a coremark regression which sounds
> like the same problem.
>  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092

Indeed, thanks!

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: [RFC] test builtin ratio for loop distribution

2021-02-11 Thread Alexandre Oliva

On Feb  4, 2021, Alexandre Oliva  wrote:

> On Feb  4, 2021, Richard Biener  wrote:
>>> >  b) if expansion would use BY_PIECES then expand to an unrolled loop
>>> 
>>> Why would that be better than keeping the constant-length memset call,
>>> that would be turned into an unrolled loop during expand?

>> Well, because of the possibly lost ctz and alignment info.

> Funny you should mention that.  I got started with the expand-time
> expansion yesterday, and found out that we're not using the alignment
> information that is available.  Though the pointer is known to point to
> an aligned object, we are going for 8-bit alignment for some reason.

> The strategy I used there was to first check whether by_pieces would
> expand inline a constant length near the max known length, then loop
> over the bits in the variable length, expand in each iteration a
> constant-length store-by-pieces for the fixed length corresponding to
> that bit, and a test comparing the variable length with the fixed length
> guarding the expansion of the store-by-pieces.  We may get larger code
> this way (no loops), but only O(log(len)) compares.

> I've also fixed some bugs in the ldist expander, so now it bootstraps,
> but with a few regressions in the testsuite, that I'm yet to look into.

A few more fixes later, this seems to work.

It encompasses the patch to recover tree_ctz from a mult_expr def, it
adds code to set up the alignment data for the ldist-generated memset
dest, and then it expands memset as described above if length is not
constant, setmem is not available, but the by-pieces machinery would
still be used for nearby constants.

How does this look?

[PR94092] introduce try store by multiple pieces

From: Alexandre Oliva 

The ldist pass turns even very short loops into memset calls.  E.g.,
the TFmode emulation calls end with a loop of up to 3 iterations, to
zero out trailing words, and the loop distribution pass turns them
into calls of the memset builtin.

Though short constant-length memsets are usually dealt with
efficiently, for non-constant-length ones, the options are setmemM, or
a function calls.

RISC-V doesn't have any setmemM pattern, so the loops above end up
"optimized" into memset calls, incurring not only the overhead of an
explicit call, but also discarding the information the compiler has
about the alignment of the destination, and that the length is a
multiple of the word alignment.

This patch handles variable lengths with multiple conditional
power-of-2-constant-sized stores-by-pieces, so as to reduce the
overhead of length compares.

It also propagates the alignment of the memset dest, introduced by
loop-distribution, to the SSA pointer used to hold it, so that it is
not discarded right away, and may be recovered by the memset builtin
expander.

Finally, it computes the ctz of the length, and uses it to omit blocks
for stores of small byte counts when we're storing whole words.
tree_ctz is improved so as to look at the length DEF stmt, and zero
out the least-significant bits if it's a multiply by a power of two.

for  gcc/ChangeLog

PR tree-optimization/94092
* builtins.c (try_store_by_multiple_pieces): New.
(expand_builtin_memset_args): Use it.  If target_char_cast
fails, proceed as for non-constant val.  Pass len's ctz to...
* expr.c (clear_storage_hints): ... this.  Try store by
multiple pieces after setmem.
(clear_storage): Adjust.
* expr.h (clear_storage_hints): Likewise.
(try_store_by_multiple_pieces): Declare.
* tree-loop-distribution.c: Include builtins.h.
(generate_memset_builtin): Propagate dst_base alignmen to mem.
* tree-ssanames.c (get_nonzero_bits): Zero out low bits of
integral types, when a MULT_EXPR INTEGER_CST operand ensures
the result will be a multiple of a power of two.
---
 gcc/builtins.c   |  113 +++---
 gcc/expr.c   |9 +++
 gcc/expr.h   |   12 
 gcc/tree-loop-distribution.c |9 +++
 gcc/tree-ssanames.c  |   23 -
 5 files changed, 154 insertions(+), 12 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 0aed008687ccc..44f3af92536a5 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6600,6 +6600,97 @@ expand_builtin_memset (tree exp, rtx target, 
machine_mode mode)
   return expand_builtin_memset_args (dest, val, len, target, mode, exp);
 }

+/* Try to store VAL (or, if NULL_RTX, VALC) in LEN bytes starting at TO.
+   Return TRUE if successful, FALSE otherwise.  TO is assumed to be
+   aligned at an ALIGN boundary.  LEN is assumed to be a multiple of
+   1<= ctz_len; i--)
+{
+  unsigned HOST_WIDE_INT blksize = HOST_WIDE_INT_1U << i;
+  rtx_code_label *label = NULL;

Re: [RFC] test builtin ratio for loop distribution

2021-02-11 Thread Alexandre Oliva

On Feb 11, 2021, Alexandre Oliva  wrote:

> How does this look?

> for  gcc/ChangeLog

>   PR tree-optimization/94092
>   * builtins.c (try_store_by_multiple_pieces): New.
>   (expand_builtin_memset_args): Use it.  If target_char_cast
>   fails, proceed as for non-constant val.  Pass len's ctz to...
>   * expr.c (clear_storage_hints): ... this.  Try store by
>   multiple pieces after setmem.
>   (clear_storage): Adjust.
>   * expr.h (clear_storage_hints): Likewise.
>   (try_store_by_multiple_pieces): Declare.
>   * tree-loop-distribution.c: Include builtins.h.
>   (generate_memset_builtin): Propagate dst_base alignmen to mem.
>   * tree-ssanames.c (get_nonzero_bits): Zero out low bits of
>   integral types, when a MULT_EXPR INTEGER_CST operand ensures
>   the result will be a multiple of a power of two.

I forgot to mention this passed regstrap on x86_64-linux-gnu, as well as
some cross testing of riscv32-elf.

I've also regstrapped it on x86_64-linux-gnu along with a patch for
testing purposes, that tried try_store_by_multiple_pieces before setmem
in all 3 locations where they are called, which gives me some confidence
that the implementation is reasonably robust.

Is this ok to install?  (if not right now, perhaps in stage1)

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: [RFC] test builtin ratio for loop distribution

2021-02-15 Thread Alexandre Oliva

On Feb 12, 2021, Richard Biener  wrote:

>> +  if (TREE_CODE (mem) == SSA_NAME)
>> +if (ptr_info_def *pi = get_ptr_info (mem))
>> +  {
>> +   unsigned al = get_pointer_alignment (builtin->dst_base);
>> +   if (al > pi->align || pi->misalign)

> We still might prefer pi->align == 64 and pi->misalign == 32 over al == 16
> so maybe factor that in, too.

Ugh, apologies, I somehow posted an incorrect and outdated version of
the patch.  The improved (propagates both alignments) and fixed (divides
by BITS_PER_UNIT, fixing a regression in gfortran.dg/sms-2.f90) had
this alternate hunk as the only difference:

@@ -1155,6 +1156,16 @@ generate_memset_builtin (class loop *loop, partition 
*partition)
   mem = force_gimple_operand_gsi (&gsi, mem, true, NULL_TREE,
  false, GSI_CONTINUE_LINKING);
 
+  if (TREE_CODE (mem) == SSA_NAME)
+if (ptr_info_def *pi = get_ptr_info (mem))
+  {
+   unsigned al;
+   unsigned HOST_WIDE_INT misal;
+   if (get_pointer_alignment_1 (builtin->dst_base, &al, &misal))
+ set_ptr_info_alignment (pi, al / BITS_PER_UNIT,
+ misal / BITS_PER_UNIT);
+  }
+
   /* This exactly matches the pattern recognition in classify_partition.  */
   val = gimple_assign_rhs1 (DR_STMT (builtin->dst_dr));
   /* Handle constants like 0x15151515 and similarly



> So I wonder whether we should instead re-run CCP after loop opts which
> computes nonzero bits as well instead of the above "hack".  Would
> nonzero bits be ready to compute in the above way from loop distribution?
> That is, you can do set_nonzero_bits just like you did
> set_ptr_info_alignment ...

> Since CCP also performs copy propagation an obvious candidate would be
> to replace the last pass_copy_prop with pass_ccp (with a comment noting
> to compute up-to-date nonzero bits info).

I'll look into these possibilities.


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: [RFC] test builtin ratio for loop distribution

2021-02-16 Thread Alexandre Oliva

On Feb 16, 2021, Alexandre Oliva  wrote:

>> So I wonder whether we should instead re-run CCP after loop opts which
>> computes nonzero bits as well instead of the above "hack".

That works.  It takes care of both the dest alignment and the len ctz.

Explicitly masking out the len tz from nonzero bits also works, but I
suppose the ccp pass does a better job, and it takes care of memcpy and
memmove as well.

I noticed that ccp releases CDI_DOMINATORS information.  It looks like
the last copy_prop pass, now replaced with ccp in my tree, was late
enough that it doesn't seem to matter, but at first I thought you meant
an earlier copy_prop, that runs before graphite, and dropping dominator
info at that point was a problem for e.g. cunroll.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

[PR94092] Re: [RFC] test builtin ratio for loop distribution

2021-02-19 Thread Alexandre Oliva

Here's an improved version of the patch.  Regstrapped on
x86_64-linux-gnu, with and without a patchlet that moved multi-pieces
ahead of setmem, and also tested with riscv32-elf.

Is it ok to install?  Or should it wait for stage1?


[PR94092] introduce try store by multiple pieces

From: Alexandre Oliva 

The ldist pass turns even very short loops into memset calls.  E.g.,
the TFmode emulation calls end with a loop of up to 3 iterations, to
zero out trailing words, and the loop distribution pass turns them
into calls of the memset builtin.

Though short constant-length clearing memsets are usually dealt with
efficiently, for non-constant-length ones, the options are setmemM, or
a function calls.

RISC-V doesn't have any setmemM pattern, so the loops above end up
"optimized" into memset calls, incurring not only the overhead of an
explicit call, but also discarding the information the compiler has
about the alignment of the destination, and that the length is a
multiple of the word alignment.

This patch handles variable lengths with multiple conditional
power-of-2-constant-sized stores-by-pieces, so as to reduce the
overhead of length compares.

It also changes the last copy-prop pass into ccp, so that pointer
alignment and length's nonzero bits are detected and made available
for the expander, even for ldist-introduced SSA_NAMEs.


for  gcc/ChangeLog

PR tree-optimization/94092
* builtins.c (try_store_by_multiple_pieces): New.
(expand_builtin_memset_args): Use it.  If target_char_cast
fails, proceed as for non-constant val.  Pass len's ctz to...
* expr.c (clear_storage_hints): ... this.  Try store by
multiple pieces after setmem.
(clear_storage): Adjust.
* expr.h (clear_storage_hints): Likewise.
(try_store_by_multiple_pieces): Declare.
* passes.def: Replace the last copy_prop to ccp.
---
 gcc/builtins.c |  182 ++--
 gcc/expr.c |9 ++-
 gcc/expr.h |   13 
 gcc/passes.def |5 +-
 4 files changed, 197 insertions(+), 12 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 0aed008687ccc..05b14f2a3f6d3 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6600,6 +6600,166 @@ expand_builtin_memset (tree exp, rtx target, 
machine_mode mode)
   return expand_builtin_memset_args (dest, val, len, target, mode, exp);
 }
 
+/* Try to store VAL (or, if NULL_RTX, VALC) in LEN bytes starting at TO.
+   Return TRUE if successful, FALSE otherwise.  TO is assumed to be
+   aligned at an ALIGN-bits boundary.  LEN must be a multiple of
+   1<= 0);
+
+  if (val)
+valc = 1;
+
+  /* Bits more significant than TST_BITS are part of the shared prefix
+ in the binary representation of both min_len and max_len.  Since
+ they're identical, we don't need to test them in the loop.  */
+  int tst_bits = (max_bits != min_bits ? max_bits
+ : floor_log2 (max_len ^ min_len));
+
+  /* Check whether it's profitable to start by storing a fixed BLKSIZE
+ bytes, to lower max_bits.  In the unlikely case of a constant LEN
+ (implied by identical MAX_LEN and MIN_LEN), we want to issue a
+ single store_by_pieces, but otherwise, select the minimum multiple
+ of the ALIGN (in bytes) and of the MCD of the possible LENs, that
+ brings MAX_LEN below TST_BITS, if that's lower than min_len.  */
+  unsigned HOST_WIDE_INT blksize;
+  if (max_len > min_len)
+{
+  unsigned HOST_WIDE_INT alrng = MAX (HOST_WIDE_INT_1U << ctz_len,
+ align / BITS_PER_UNIT);
+  blksize = max_len- (HOST_WIDE_INT_1U << tst_bits) + alrng;
+  blksize &= ~(alrng - 1);
+}
+  else if (max_len == min_len)
+blksize = max_len;
+  else
+gcc_unreachable ();
+  if (min_len >= blksize)
+{
+  min_len -= blksize;
+  min_bits = floor_log2 (min_len);
+  max_len -= blksize;
+  max_bits = floor_log2 (max_len);
+
+  tst_bits = (max_bits != min_bits ? max_bits
+: floor_log2 (max_len ^ min_len));
+}
+  else
+blksize = 0;
+
+  /* Check that we can use store by pieces for the maximum store count
+ we may issue (initial fixed-size block, plus conditional
+ power-of-two-sized from max_bits to ctz_len.  */
+  unsigned HOST_WIDE_INT xlenest = blksize;
+  if (max_bits >= 0)
+xlenest += ((HOST_WIDE_INT_1U << max_bits) * 2
+   - (HOST_WIDE_INT_1U << ctz_len));
+  if (!can_store_by_pieces (xlenest, builtin_memset_read_str,
+   &valc, align, true))
+return false;
+
+  rtx (*constfun) (void *, HOST_WIDE_INT, scalar_int_mode);
+  void *constfundata;
+  if (val)
+{
+  constfun = builtin_memset_gen_str;
+  constfundata = val = force_reg (TYPE_MODE (unsigned_char_type_node),
+ val);
+}
+  else
+

add rv64im{,c,fc} multilibs

2021-02-23 Thread Alexandre Oliva



We've had customer demand for these multilibs.  We'd be happy to
maintain this change locally, but I thought I'd contribute the patch,
just in case there's wider interest in them.  WDYT?


for  gcc/ChangeLog

* config/riscv/t-elf-multilib: Add multilibs for rv64im,
rv64imc, and rv64imfc/lp64f.
---
 gcc/config/riscv/t-elf-multilib |   11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/t-elf-multilib b/gcc/config/riscv/t-elf-multilib
index 19f9434616c2d..b268e26c954c8 100644
--- a/gcc/config/riscv/t-elf-multilib
+++ b/gcc/config/riscv/t-elf-multilib
@@ -1,6 +1,6 @@
 # This file was generated by multilib-generator with the command:
-#  ./multilib-generator rv32i-ilp32--c rv32im-ilp32--c rv32iac-ilp32-- 
rv32imac-ilp32-- rv32imafc-ilp32f-rv32imafdc- rv64imac-lp64-- rv64imafdc-lp64d--
-MULTILIB_OPTIONS = 
march=rv32i/march=rv32ic/march=rv32im/march=rv32imc/march=rv32iac/march=rv32imac/march=rv32imafc/march=rv32imafdc/march=rv32gc/march=rv64imac/march=rv64imafdc/march=rv64gc
 mabi=ilp32/mabi=ilp32f/mabi=lp64/mabi=lp64d
+#  ./multilib-generator rv32i-ilp32--c rv32im-ilp32--c rv32iac-ilp32-- 
rv32imac-ilp32-- rv32imafc-ilp32f-rv32imafdc- rv64im-lp64-- rv64imc-lp64-- 
rv64imfc-lp64f-- rv64imac-lp64-- rv64imafdc-lp64d--
+MULTILIB_OPTIONS = 
march=rv32i/march=rv32ic/march=rv32im/march=rv32imc/march=rv32iac/march=rv32imac/march=rv32imafc/march=rv32imafdc/march=rv32gc/march=rv64im/march=rv64imc/march=rv64imfc/march=rv64imac/march=rv64imafdc/march=rv64gc
 mabi=ilp32/mabi=ilp32f/mabi=lp64/mabi=lp64f/mabi=lp64d
 MULTILIB_DIRNAMES = rv32i \
 rv32ic \
 rv32im \
@@ -10,17 +10,24 @@ rv32imac \
 rv32imafc \
 rv32imafdc \
 rv32gc \
+rv64im \
+rv64imc \
+rv64imfc \
 rv64imac \
 rv64imafdc \
 rv64gc ilp32 \
 ilp32f \
 lp64 \
+lp64f \
 lp64d
 MULTILIB_REQUIRED = march=rv32i/mabi=ilp32 \
 march=rv32im/mabi=ilp32 \
 march=rv32iac/mabi=ilp32 \
 march=rv32imac/mabi=ilp32 \
 march=rv32imafc/mabi=ilp32f \
+march=rv64im/mabi=lp64 \
+march=rv64imc/mabi=lp64 \
+march=rv64imfc/mabi=lp64f \
 march=rv64imac/mabi=lp64 \
 march=rv64imafdc/mabi=lp64d
 MULTILIB_REUSE = march.rv32i/mabi.ilp32=march.rv32ic/mabi.ilp32 \


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: add rv64im{,c,fc} multilibs

2021-02-23 Thread Alexandre Oliva

On Feb 23, 2021, Kito Cheng  wrote:

> We've added a new configure option to allow you to override that
> without changing source code.

Ah, nice, thanks!  I'll add a note on our internal patch to switch to
that when we switch to GCC 11.

I take your response as confirming my expectation that the defaults are
to remain unchanged for now, and I will thus proceed under this
assumption.

Thanks for the prompt response!

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: add rv64im{,c,fc} multilibs

2021-02-24 Thread Alexandre Oliva

On Feb 23, 2021, Jim Wilson  wrote:

> If we add default multilibs for you, then to be fair, we need to add
> default multilibs for other people that ask, and before long we are trying
> to build hundreds or maybe even thousands of multilibs by default which is
> unworkable.

*nod*, it's a very familiar issue to me, I know where that's coming
from, no worries.  I expected the change would be turned down, and for
good reason, unless there was say an emerging growth of adoption on
those multilibs, which I wasn't aware of.  The expectation didn't stop
me from offering the patch just in case, that's all.

> People that want a different set can define their own, and we have
> made it easy for people to define their own sets of multilibs as Kito
> pointed out.

*nod*, thanks,

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

[FYI] Add missing dg-require-effective-target fpic directives to powerpc tests

2021-02-26 Thread Alexandre Oliva

t { powerpc-*-linux* && ilp32 } } } */
 /* { dg-options "-O2 -fpic -fno-reorder-blocks" } */
+/* { dg-require-effective-target fpic } */
 
 /* Analog of pr79439-1.c for 32-bit Linux.  */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr83629.c 
b/gcc/testsuite/gcc.target/powerpc/pr83629.c
index 976b564e927d7..8900010fb1d0a 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr83629.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr83629.c
@@ -1,6 +1,7 @@
 /* { dg-require-effective-target ilp32 } */
 /* { dg-require-effective-target fpic } */
 /* { dg-options "-O2 -fPIC -frename-registers 
--param=sched-autopref-queue-depth=0 -mdejagnu-cpu=603" } */
+/* { dg-require-effective-target fpic } */
 
 extern void bar (void *);
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr84112.c 
b/gcc/testsuite/gcc.target/powerpc/pr84112.c
index 8fbafa1b1ca59..82a7d2a60f055 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr84112.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr84112.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target fpic } */
 /* { dg-options "-mdejagnu-cpu=power8 -O3 -fstack-protector-strong -fpic" } */
+/* { dg-require-effective-target fpic } */
 
 char *b;
 int c, d, e, f;


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: [FYI] Add missing dg-require-effective-target fpic directives to powerpc tests

2021-02-26 Thread Alexandre Oliva

revert just-added duplicate fpic target requirement

A moment after pushing the previous patch, I noticed the fpic target
requirement markers had already been added to some of the files in the
patch from long ago that I've just contributed.  This patch reverts
the duplicates.


for  gcc/testsuite/ChangeLog

* gcc.target/powerpc/pr67789.c: Revert fpic target requirement
duplication.
* gcc.target/powerpc/pr83629.c: Likewise.
* gcc.target/powerpc/pr84112.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/pr67789.c |1 -
 gcc/testsuite/gcc.target/powerpc/pr83629.c |1 -
 gcc/testsuite/gcc.target/powerpc/pr84112.c |1 -
 3 files changed, 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr67789.c 
b/gcc/testsuite/gcc.target/powerpc/pr67789.c
index ea77ec979cd5a..05d01ef20d777 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr67789.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr67789.c
@@ -1,7 +1,6 @@
 /* { dg-do assemble } */
 /* { dg-require-effective-target fpic } */
 /* { dg-options "-O2 -msecure-plt -fPIC" } */
-/* { dg-require-effective-target fpic } */
 /* { dg-skip-if "" { powerpc*-*-darwin* powerpc-ibm-aix* } } */
 
 #define FE_TONEAREST 0
diff --git a/gcc/testsuite/gcc.target/powerpc/pr83629.c 
b/gcc/testsuite/gcc.target/powerpc/pr83629.c
index 8900010fb1d0a..976b564e927d7 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr83629.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr83629.c
@@ -1,7 +1,6 @@
 /* { dg-require-effective-target ilp32 } */
 /* { dg-require-effective-target fpic } */
 /* { dg-options "-O2 -fPIC -frename-registers 
--param=sched-autopref-queue-depth=0 -mdejagnu-cpu=603" } */
-/* { dg-require-effective-target fpic } */
 
 extern void bar (void *);
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr84112.c 
b/gcc/testsuite/gcc.target/powerpc/pr84112.c
index 82a7d2a60f055..8fbafa1b1ca59 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr84112.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr84112.c
@@ -1,7 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target fpic } */
 /* { dg-options "-mdejagnu-cpu=power8 -O3 -fstack-protector-strong -fpic" } */
-/* { dg-require-effective-target fpic } */
 
 char *b;
 int c, d, e, f;

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

add -mpowerpc-gpopt to options for sqrt insn on PowerPC

2021-02-26 Thread Alexandre Oliva

This patch avoids an ICE in gimplefe-28.c, in our ppc64-vxworks7r2
tests.  Tested on x86_64-linux-gnu, and on the affected platform.  Ok to
install?


From: Eric Botcazou 
for  gcc/testsuite/ChangeLog

* lib/target-supports.exp (add_options_for_sqrt_insn): For
PowerPC targets, add -mpowerpc-gpopt option.
---
 gcc/testsuite/lib/target-supports.exp |3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index af46c77921482..29faeeba67945 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7598,6 +7598,9 @@ proc add_options_for_sqrt_insn { flags } {
 if { [istarget arm*-*-*] } {
return [add_options_for_arm_vfp "$flags"]
 }
+if { [istarget powerpc*-*-*] } {
+   return "$flags -mpowerpc-gpopt"
+}
 return $flags
 }
 


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

add powerpc_vsx_ok requirement to undef-bool tests

2021-02-26 Thread Alexandre Oliva



These tests use -mvsx in their dg-options list, so they are only
applicable if the -mvsx option is supported by the compiler.

Tested with target ppc64-vx7r2, configured to force altivec disabled,
and thus to reject vsx.  Ok to install?


From: Joel Brobecker 
gcc/testsuite/ChangeLog:

* gcc.target/powerpc/undef-bool-2.c: Add
dg-require-effective-target powerpc_vsx_ok directive.
* g++.dg/ext/undef-bool-1.C: Add dg-require-effective-target
powerpc_vsx_ok directive.
---
 gcc/testsuite/g++.dg/ext/undef-bool-1.C |1 +
 gcc/testsuite/gcc.target/powerpc/undef-bool-2.c |1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/testsuite/g++.dg/ext/undef-bool-1.C 
b/gcc/testsuite/g++.dg/ext/undef-bool-1.C
index 716e06c1ce413..9cc1cd8726098 100644
--- a/gcc/testsuite/g++.dg/ext/undef-bool-1.C
+++ b/gcc/testsuite/g++.dg/ext/undef-bool-1.C
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-options "-O2 -DNO_WARN_X86_INTRINSICS -mvsx" } */
 /* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
 
 /* Test to ensure that "bool" gets undef'd in xmmintrin.h when
we require strict ANSI.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/undef-bool-2.c 
b/gcc/testsuite/gcc.target/powerpc/undef-bool-2.c
index d4944ab1ca6c7..7bc5d18484039 100644
--- a/gcc/testsuite/gcc.target/powerpc/undef-bool-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/undef-bool-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -std=c11 -DNO_WARN_X86_INTRINSICS -mvsx" } */
 /* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
 
 /* Test to ensure that "bool" gets undef'd in xmmintrin.h when
we require strict ANSI.  Subsequent use of bool needs stdbool.h.


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: add -mpowerpc-gpopt to options for sqrt insn on PowerPC

2021-03-02 Thread Alexandre Oliva

On Feb 26, 2021, Segher Boessenkool  wrote:

> On Fri, Feb 26, 2021 at 12:31:16PM -0500, David Edelsohn wrote:
>> On Fri, Feb 26, 2021 at 11:09 AM Alexandre Oliva  wrote:
>> >
>> > This patch avoids an ICE in gimplefe-28.c, in our ppc64-vxworks7r2
>> > tests.  Tested on x86_64-linux-gnu, and on the affected platform.  Ok to
>> > install?
>> 
>> I'm sort of surprised that sqrt instruction would be available for the
>> target but not enabled by default.  Is this really the method that
>> customers would use?  It seems that it might be more appropriate to
>> xfail or skip the test for PPC64 VxWorks.

> You should essentially never use -mpowerpc-gpopt, but instead use a
> -mcpu= that has it.  You also of course whould not do that in run tests
> if the cpu does not support those insns.

I think the feedback is missing the point of the obvious bug that Eric's
patch fixes.

We have a dejagnu proc that should return any target-specific options
necessary for a sqrt insn to be available:

# Return any additional options to enable square root intructions.

powerpc has an optional sqrt insn, but the option that enables it is not
returned by that proc.  That's a blatang bug in that proc.  Do you see
that, or do you have any sensible reasoning to share to support the
position that the proc is NOT buggy?

This proc is presumably only used in compile tests; it wouldn't make
sense to assume an optional sqrt insn to be available on whatever
hardware variant you happen to be using for testing.

But the bug fixed by Eric's patch is pretty blatant, and I don't think
it makes sense to reject this fix, and insist on a fix of another bug
instead.  That would just leave *this* bug in the dejagnu proc unfixed.

Now, maybe a different flag is to be returned, but -mpowerpc-gpopt is
the flag that enables sqrt with minimal disturbance of any other cpu
selections in effect, so I believe that's the option that best serves
the purpose of making the hardware sqrt insn available, regardless of
whatever would be recommended to end users.

> But, Alex, you say this avoids an ICE?  You shouldn't avoid the ICE in
> that testcase then -- instead, fix it!  (Or report it).

Orthogonal issue, IMHO.  If you restate the response as "the proposed
patch is fine as long as a bug report is on file for the error
encountered when attempting to expand . when a sqrt intrinsic is
not available", I can go along with it.  But saying "we don't want to
fix this bug, we want to fix another vaguely related bug *instead*",
will make our stances mutually incompatible, and would likely result in
my pursuing neither bug.

While at that, if -mpowerpc-gpopt is not to be used, maybe you'll want
to fix, or file a bug report, about the multiple tests
gcc.target/powerpc/pr46728-*.c, that use it explicitly, and for the very
purpose of enabling the fsqrt insn.  Several of these are execution
tests, so they fail on systems that don't offer fsqrt.  I suppose these
should mention *_sqrt_insn target requirements and add options.  I
haven't looked into how they interact, if at all.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: [PATCH] gcc.misc-tests/outputs.exp: assert unique test-names

2021-03-03 Thread Alexandre Oliva

On Feb 25, 2021, Hans-Peter Nilsson  wrote:

> Navigating and debugging causes for failing tests here isn't
> helped by the existence of tests with duplicate names.

Aah, I guess I see what happened: some test sets were copied to cover
additional cases I hadn't covered (cool :-), but the test names were not
changed.

I wonder if we wouldn't be better off using say $bp instead of $b in
test name strings, so that bp could be set to "$b whatever" or "$b"
before a group of tests.

This would make it easier to copy blocks of tests, but it would make it
harder to search for a failing test.

I suppose the changes you proposed, along with code that rejects
duplicates, is preferrable:

> gcc/testsuite:
>   * gcc.misc-tests/outputs.exp: Append discriminating
>   suffixes to tests with duplicate names.
>   (outest): Assert that each running test has a unique
>   name.

LGTM, thanks!

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: [PATCH] outputs.exp: skip @file -save-temps if target has -L or -I

2021-03-03 Thread Alexandre Oliva

On Feb 25, 2021, Hans-Peter Nilsson  wrote:

> Additional files are created in presence of @file option.

Oh, wow.  I hope nobody uses @file in target board files ;-)

> gcc/testsuite:
>   * gcc.misc-tests/outputs.exp: Skip @file -save-temps
>   tests if target test-framework has -L or -I options.

Thanks, this looks reasonable.

I suppose it might be possible to record the presence of -I and -L flags
separately, and adjust the @file tests so as to conditionally expect
extra files where they'd appear, but I don't see that this would
increase test coverage, so ISTM that would be a pointless effort.  So
we're good.  Thanks!

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: add -mpowerpc-gpopt to options for sqrt insn on PowerPC

2021-03-03 Thread Alexandre Oliva

On Mar  2, 2021, David Edelsohn  wrote:

> The procs are used in more than that one test.

Err, are you looking at the trunk?  In my tree, there are only two tests
that mention sqrt_insn, and it's two rather than one just because I
added the flag to another test myself, in a patch yet to be contributed.
Here's the patch that introduced the only use of the proc that is known
to me: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-12/msg01204.html

As for the to-be-contributed patch, it adds the sqrt_insn feature option
to cdce3.c.  Like gimplefe-28.c, a compile-only front-end test that
depends on the existence of a sqrt insn to do its job, cdce3.c is a
compile-only test for a middle-end shrink-wrap optimization, that is
only performed when there is a sqrt insn.


While we could hide the bug/missing feature in add_options_for_sqrt_insn
by constraining check_effective_target_sqrt_insn, the result would be
just reduced test coverage for powerpc builds that defaulted to
-mno-powerpc-gpopt.  A downside without any upside.

Whereas if we fix the former proc to perform its documented function on
powerpc, namely return the options required to enable the fsqrt insn,
the end result is that, if the options do the job, we get those two
tests performed, whereas if they happen to be incompatible with other
settings to the point of raising an error, we (should?) skip the tests.
In my book, that's upsides without downsides.


Now, if the grounds for rejecting the patch was based on the
understanding that the proc was used elsewhere, with a different
function, I hope this demonstrates that this understanding was based on
incorrect information (that I may have hinted at myself; sorry if so),
and now that it's been corrected, I request the rejection of this
approach to be reconsidered.

I suppose the patch as-is might still be rejected, because it introduces
only -mpowerpc-gpopt, without -mno-soft-float.  Since in some
configurations it may take both of them to enable the fsqrt insn, would
an alternate version that returned both be deemed acceptable instead?

Maybe we also want to document in the proc that these added options can
only be used in compile tests, because there's no expectation or
guarantee that the so-enabled feature is available in the target under
test.  But AFAIK this has always been the case, and now I see and
confirm that a feature-option-adding call is always followed by a
feature-available-check call.


Thanks,

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: add -mpowerpc-gpopt to options for sqrt insn on PowerPC

2021-03-03 Thread Alexandre Oliva

On Mar  2, 2021, Segher Boessenkool  wrote:

> This is PR99352 now.

Thanks.  I've filed PR99371 for the add_options_for_sqrt_insn
incompleteness, and PR99372 for the gimplefe-28.c ICE.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: add -mpowerpc-gpopt to options for sqrt insn on PowerPC

2021-03-03 Thread Alexandre Oliva

On Mar  3, 2021, Segher Boessenkool  wrote:

> On Wed, Mar 03, 2021 at 04:45:23PM -0300, Alexandre Oliva wrote:
>> On Mar  2, 2021, Segher Boessenkool  wrote:
>> 
>> > This is PR99352 now.
>> 
>> Thanks.  I've filed PR99371 for the add_options_for_sqrt_insn
>> incompleteness,

> As I said before, I don't agree with that.

Maybe you'll change of mind if you try to make sense of why the proc
has, since its inception, added vfp options to enable sqrt on arm,
regardless of whether vfp is available with the processor being tested,
and realize this is not different from the case of powerpc.

> If a user disabled it, we should *not* reenable it, that reduces
> testing surface.

It's skipping the test, as the change you propose, that reduces testing
surface, when testing only a configuration that ends up skipping it.
Now, if you're testing multiple combinations, skipping or running does
*not* change the test surface.

So rejecting Eric's patch makes for a no-op in one case, and a reduction
in another.

Whereas installing it makes for a no-op in one case, and an increase in
another.  Please explain how you came to the conclusion that this
amounts to reducing hte test surface.  Something appears to be amiss in
that reasoning.

>> and PR99372 for the gimplefe-28.c ICE.

> This is fixed trivially by the PR99352 patch as far as I can see.

If your patch also deals with the ICE that appears with the options
named in PR99372, great.

> Please verify (I'll post it later today).

Please Cc me so I don't miss it, thanks,

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: add powerpc_vsx_ok requirement to undef-bool tests

2021-03-08 Thread Alexandre Oliva

On Feb 26, 2021, David Edelsohn  wrote:

> This patch is okay.

Thanks, I've finally checked it in.

>> From: Joel Brobecker 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.target/powerpc/undef-bool-2.c: Add
>> dg-require-effective-target powerpc_vsx_ok directive.
>> * g++.dg/ext/undef-bool-1.C: Add dg-require-effective-target
>> powerpc_vsx_ok directive.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

Re: retain debug stmt order when moving to successors

2021-08-17 Thread Alexandre Oliva

On Jul 28, 2021, Richard Biener  wrote:

> OK.

Thanks, I've finally put this in.  Sorry about the delay.

> On Wed, Jul 28, 2021 at 10:12 AM Alexandre Oliva  wrote:
>> * tree-inline.c (maybe_move_debug_stmts_to_successors): Don't
>> reverse debug stmts.



-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>

Re: don't access cfun in dump_function_to_file

2021-08-17 Thread Alexandre Oliva

On Jul 28, 2021, Richard Biener  wrote:

> OK.

Thanks, I've finally put this in as well.

>> * tree-cfg.c (dump_function_to_file): Use fun, not cfun.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>

Re: move unreachable user labels to entry point

2021-08-17 Thread Alexandre Oliva

On Jul 13, 2021, Richard Biener  wrote:

> The right OMP region suggests something wrt correctness

Yeah, as Jakub wrote, we have to choose a block that's in the same
region the label belongs to.  The proposed patch doesn't change that, it
just uses the entry block instead of the previous block, if it satisfies
the requirement.

I found it made the logic simpler, slightly more efficient, and more
predictable, but I'm not attached to the change.  Since Jakub objected
to it, let's leave it alone.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>

Re: ipa-modref: merge flags when adding escape

2021-08-17 Thread Alexandre Oliva

On Aug 11, 2021, Jan Hubicka  wrote:

> This is improved patch

Thanks for the proper fix!

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>

fix latent bootstrap-debug issue (modref, tree-inline, tree jump-threading)

2021-08-20 Thread Alexandre Oliva



I've hit a bootstrap-debug error involving large subprograms in
gcc/ada/sem_ch12.adb.  I'm afraid I couldn't narrow it down to a
reasonable testcase.

thread1 made different decisions about a block containing a
builtin_eh_filter call because in one compilation, estimate_num_insns
found a cgraph_node for the builtin and could thus get to the
is_simple_builtin test, but in the other it didn't.  With different
insn counts, one stage jump-threaded and the other didn't, and the
resulting code diverged quite a bit.

The reason the builtin had a cgraph_node in one case but not the other
was that modref got a chance to analyze the builtin call when it was
the first stmt in the block, and that created the cgraph_node.
However, when it was preceded by debug stmts, the loop in
analyze_function was cut short after the first debug stmt, because the
summary so far was not useful.

This patch fixes both issues: skip debug stmts in the analyze_function
loop, so as to prevent them from affecting any decisions in the loop,
and enable the insn count estimator to get to the is_simple_builtin
test when a cgraph_node has not been created for the builtin.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* ipa-modref.c (analyze_function): Skip debug stmts.
* tree-inline.c (estimate_num_insn): Consider builtins even
without a cgraph_node.
---
 gcc/ipa-modref.c  |3 ++-
 gcc/tree-inline.c |4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index fafd804d4bae4..f0cddbb077aaa 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -2108,7 +2108,8 @@ analyze_function (function *f, bool ipa)
   FOR_EACH_BB_FN (bb, f)
 {
   gimple_stmt_iterator si;
-  for (si = gsi_after_labels (bb); !gsi_end_p (si); gsi_next (&si))
+  for (si = gsi_start_nondebug_after_labels_bb (bb);
+  !gsi_end_p (si); gsi_next_nondebug (&si))
{
  if (!analyze_stmt (summary, summary_lto,
 gsi_stmt (si), ipa, &recursive_calls)
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index d0e9f52d5f138..636130fe0019e 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4436,8 +4436,8 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)
/* Do not special case builtins where we see the body.
   This just confuse inliner.  */
struct cgraph_node *node;
-   if (!(node = cgraph_node::get (decl))
-   || node->definition)
+   if ((node = cgraph_node::get (decl))
+   && node->definition)
  ;
/* For buitins that are likely expanded to nothing or
   inlined do not account operand costs.  */


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>

Re: fix latent bootstrap-debug issue (modref, tree-inline, tree jump-threading)

2021-08-27 Thread Alexandre Oliva

On Aug 22, 2021, Jan Hubicka  wrote:

> OK, thanks for looking into this issue!

Thanks, I've finally installed it in the trunk.

> It seems that analye_stmt indeed does not skip debug stmts.  It is very
> odd we got so far without hitting build difference.

Indeed.  That got me thinking...  The comments state:

 If the statement cannot be analyzed (for any reason), the entire
 function cannot be analyzed by modref.

but the implementation also tests, for every statement:

  || ((!summary || !summary->useful_p (ecf_flags, false))
  && (!summary_lto
  || !summary_lto->useful_p (ecf_flags, false

which means that, if the first stmt of a block doesn't add useful
information to the summary, we give up.  Was this really the intent?

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>

Re: [PATCH 2/4, revised patch applied] PowerPC: Rename functions for min, max, cmove

2020-09-15 Thread Alexandre Oliva

Hello, Mike,

On Sep 11, 2020, Michael Meissner via Gcc-patches  
wrote:

> +case SFmode:
> +case DFmode:

gcc110 (ppc64) in the build farm didn't like this.  The bootstrap
compiler barfs on these expressions, because of some constexpr issue I
haven't really looked into.

I'm testing this patch.  I'll check it in when I'm done.

use E_*mode instead of just *mode

From: Alexandre Oliva 

g++ 4.8.5 rejected cases with SFmode and DFmode, presumably due to
some bug in the constexpr implementation.

for  gcc/ChangeLog

* config/rs6000/rs6000.c (have_compare_and_set_mask): Use
E_*mode in cases.
---
 gcc/config/rs6000/rs6000.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6d0c550..b32fe91 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15190,8 +15190,8 @@ have_compare_and_set_mask (machine_mode mode)
 {
   switch (mode)
 {
-case SFmode:
-case DFmode:
+case E_SFmode:
+case E_DFmode:
   return TARGET_P9_MINMAX;

 default:

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

[PATCH] assorted improvements for fold_truth_andor_1

2020-09-25 Thread Alexandre Oliva



This patch introduces various improvements to the logic that merges
field compares.

Before the patch, we could merge:

  (a.x1 EQNE b.x1)  ANDOR  (a.y1 EQNE b.y1)

into something like:

  (((type *)&a)[Na] & MASK) EQNE (((type *)&b)[Nb] & MASK)

if both of A's fields live within the same alignment boundaries, and
so do B's, at the same relative positions.  Constants may be used
instead of the object B.

The initial goal of this patch was to enable such combinations when a
field crossed alignment boundaries, e.g. for packed types.  We can't
generally access such fields with a single memory access, so when we
come across such a compare, we will attempt to combine each access
separately.

Some merging opportunities were missed because of right-shifts,
compares expressed as e.g. ((a.x1 ^ b.x1) & MASK) EQNE 0, and
narrowing conversions, especially after earlier merges.  This patch
introduces handlers for several cases involving these.

Other merging opportunities were missed because of association.  The
existing logic would only succeed in merging a pair of consecutive
compares, or e.g. B with C in (A ANDOR B) ANDOR C, not even trying
e.g. C and D in (A ANDOR (B ANDOR C)) ANDOR D.  I've generalized the
handling of the rightmost compare in the left-hand operand, going for
the leftmost compare in the right-hand operand, and then onto trying
to merge compares pairwise, one from each operand, even if they are
not consecutive, taking care to avoid merging operations with
intervening side effects, including volatile accesses.

When it is the second of a non-consecutive pair of compares that first
accesses a word, we may merge the first compare with part of the
second compare that refers to the same word, keeping the compare of
the remaining bits at the spot where the second compare used to be.

Handling compares with non-constant fields was somewhat generalized,
now handling non-adjacent fields.  When a field of one object crosses
an alignment boundary but the other doesn't, we issue the same load in
both compares; gimple optimizers will later turn it into a single
load, without our having to handle SAVE_EXPRs at this point.

The logic for issuing split loads and compares, and ordering them, is
now shared between all cases of compares with constants and with
another object.

The -Wno-error for toplev.o on rs6000 is because of toplev.c's:

  if ((flag_sanitize & SANITIZE_ADDRESS)
  && !FRAME_GROWS_DOWNWARD)

and rs6000.h's:

#define FRAME_GROWS_DOWNWARD (flag_stack_protect != 0   \
  || (flag_sanitize & SANITIZE_ADDRESS) != 0)

The mutually exclusive conditions involving flag_sanitize are now
noticed and reported by fold-const.c's:

  warning (0,
   "% of mutually exclusive equal-tests"
   " is always 0");

This patch enables over 12k compare-merging opportunities that we used
to miss in a GCC bootstrap.

Regstrapped on x86_64-linux-gnu and ppc64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* fold-const.c (prepare_xor): New.
(decode_field_reference): Handle xor, shift, and narrowing
conversions.
(all_ones_mask_p): Remove.
(compute_split_boundary_from_align): New.
(build_split_load, reuse_split_load): New.
(fold_truth_andor_1): Add recursion to combine pairs of
non-neighboring compares.  Handle xor compared with zero.
Handle fields straddling across alignment boundaries.
Generalize handling of non-constant rhs.
(fold_truth_andor): Leave sub-expression handling to the
recursion above.
* config/rs6000/t-rs6000 (toplev.o-warn): Disable errors.

for  gcc/testsuite/ChangeLog

* gcc.dg/field-merge-1.c: New.
* gcc.dg/field-merge-2.c: New.
* gcc.dg/field-merge-3.c: New.
* gcc.dg/field-merge-4.c: New.
* gcc.dg/field-merge-5.c: New.
---
 gcc/config/rs6000/t-rs6000   |4 
 gcc/fold-const.c |  818 --
 gcc/testsuite/gcc.dg/field-merge-1.c |   64 +++
 gcc/testsuite/gcc.dg/field-merge-2.c |   31 +
 gcc/testsuite/gcc.dg/field-merge-3.c |   36 +
 gcc/testsuite/gcc.dg/field-merge-4.c |   40 ++
 gcc/testsuite/gcc.dg/field-merge-5.c |   40 ++
 7 files changed, 882 insertions(+), 151 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-1.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-2.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-3.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-4.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-5.c

diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index 1ddb572..516486d 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -52,6 +52,10 @@ $(srcdir)/config/rs6000/rs6000-tables.opt: 
$(srcdir)/config/rs6000/genopt.sh \
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
$(

Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-09-29 Thread Alexandre Oliva

On Sep 28, 2020, Richard Biener  wrote:

> On Fri, Sep 25, 2020 at 3:39 PM Alexandre Oliva  wrote:

>> This patch introduces various improvements to the logic that merges
>> field compares.

> Sorry for throwing a wrench in here but doing this kind of transform
> during GENERIC folding is considered a bad thing.

Ugh.  Even tree-ifcombine itself relies on tree folding logic to perform
this and various similar sorts of optimizations.

Is the issue just that we don't want to perform this kind of
transformation while still in GENERIC, or that the logic must not even
be there?  I ask because it wouldn't be too hard to disable unwanted
cases of folding while in GENERIC, and extending it to follow SSA defs
so that ifcombine would those optimizations back at an acceptable spot.

Please help me understand what it is that makes this change as it is
undesirable, so that I can figure out how to do it in an acceptable way,
and justify the additional time and effort it will take.

I *think* ifcombine could even be extended so as to reuse the
separate-test logic I put in, by looking for non-immediate dominating
outer conditions for the inner condition.  A further modified version of
fold_truth_andor_1 could then be used to combine the separate tests.

As for reassoc...  It doesn't seem a good fit at all for reassociating
short-circuiting boolean operations, that normally get translated as
multiple basic blocks.

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-09-29 Thread Alexandre Oliva

On Sep 29, 2020, Richard Biener  wrote:

> On Tue, Sep 29, 2020 at 9:23 AM Alexandre Oliva  wrote:

>> On Sep 28, 2020, Richard Biener  wrote:

> ifcombine should stop using fold*, yeah

Wow, that's quite a lot of work for no expected improvement in codegen.
I don't expect to be able to justify such an undertaking :-(

> I also think it will not end up using the simplifications using loads.

Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in
the inner block.  that won't do when the whole point is to merge loads
from memory.

That seems excessive.  Since we rule out any memory-changing side
effects, I suppose we could get away with checking for volatile operands
there.  Then, adding just a little SSA_DEF chasing, I believe I could
bring all of the fold_truth_andor_1 logic I've worked on into ifcombine
without much difficulty, and then we could do away with at least that
part of fold_truth_andor.

> Specifically your patch seems to introduce splitting of loads
> at alignment boundaries

... when there's another compare involving a load from either side of
the crossed alignment boundary.  Even on arches that can do unaligned
loads, the result is no worse, and if there are multiple fields crossing
consecutive alignment boundaries, the codegen and performance difference
can be pretty significant.

>> I *think* ifcombine could even be extended so as to reuse the
>> separate-test logic I put in, by looking for non-immediate dominating
>> outer conditions for the inner condition.  A further modified version of
>> fold_truth_andor_1 could then be used to combine the separate tests.

> I think the structure of ifcombine doesn't exactly match what
> fold_truth_andor does

How so?  AFAICT ifcombine_ifandif deals exactly with the (gimplified
version of the) structure I described in the patch that started the
thread:

  (a.x1 EQNE b.x1)  ANDOR  (a.y1 EQNE b.y1)

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-09-29 Thread Alexandre Oliva

On Sep 29, 2020, Alexandre Oliva  wrote:

> Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in
> the inner block.  that won't do when the whole point is to merge loads
> from memory.

> That seems excessive.  Since we rule out any memory-changing side
> effects, I suppose we could get away with checking for volatile operands
> there.  Then, adding just a little SSA_DEF chasing, I believe I could
> bring all of the fold_truth_andor_1 logic I've worked on into ifcombine
> without much difficulty, and then we could do away with at least that
> part of fold_truth_andor.

Confirmed, a very ugly prototype seems to work!

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-10-02 Thread Alexandre Oliva

Here's what I got so far, passing regstrap except for field-merge-1.c,
that relies on combining non-adjacent compares, which I haven't
implemented yet.

I had to retain some parts of fold_truth_andor_1 to avoid regressions in
gcc.c-torture/execute/ieee/compare-fp-3.c and gcc.dg/pr81228.c.
gcc.target/i386/pr37248-2.c and gcc.target/i386/pr37248-3.c required
turning sign tests into masks.

I moved the field-compare-merging bits of fold_truth_andor_1, and
auxiliary functions for it, into fold_truth_andor_maybe_separate, though
the ability to deal with non-adjacent compares is no longer used, at
least for the time being.

I put fold_truth_andor_maybe_separate in gimple-fold.c, and called it in
maybe_fold_and_comparisons, but...  the fact that it may return a tree
that isn't a gimple condexpr, e.g. when there is masking or shifting,
and that requires re-gimplification, suggests moving it to
tree-ssa-ifcombine.c where this can be dealt with, possibly even for
non-adjacent compares.  Right now, there is code in ifcombine to
re-gimplify results from the regular folder, which I've extended, for
the time being, to maybe_fold_and_comparisons results as well.

I've reworked decode_field_reference to seek and "substitute" SSA DEFs
in a way that avoids building new trees, but the test for
substitutability feels incomplete: there's nothing to ensure that vuses
aren't brought from much-earlier blocks, and that there couldn't
possibly be an intervening store.  I suspect I will have to give these
pieces a little more info for it to be able to tell whether the memory
accesses, if moved, would still get the same value.  Is there anything
easily reusable for this sort of test?

As for fields crossing alignment boundaries, the two-tests condition
currently returned by fold_truth_andor_maybe_separate, that ends up
gimplified into a new block, causes the chain of ifcombine optimizations
to stop, something that I'm yet to investigate.  My plan is to rearrange
ifcombine_ifandif to call fold_truth_andor_maybe_separate directly, and
to handle such resplit conditions by inserting one in each of the
preexisting blocks, to simplify the gimplification and in preparation
for combining non-adjacent compares, if we wish to do that.  Do we?  I
was convinced that it was a safe optimization, because of the absence of
intervening side effects and the undefinedness of intervening trapping
memory accesses, but combining later tests that access a word with
earlier tests that access the same word may cause intervening trapping
accesses to be skipped.  Is this a possibility we should enable or
disable on e.g. a per-language basis, avoid altogether, or ...  any
other suggestions?

Richi, is this sort of what you had in mind, or were you thinking of
something for match.pd or so?  Is match.pd even able to perform
cross-block multi-condition matches?

Any further advice as to my plans above?

Thanks,


diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 0cc80ad..47b5419 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -125,7 +125,6 @@ static tree range_predecessor (tree);
 static tree range_successor (tree);
 static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
 static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, 
tree);
-static tree unextend (tree, int, int, tree);
 static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
 static tree extract_muldiv_1 (tree, tree, enum tree_code, tree, bool *);
 static tree fold_binary_op_with_conditional_arg (location_t,
@@ -4353,7 +4352,7 @@ invert_truthvalue_loc (location_t loc, tree arg)
is the original memory reference used to preserve the alias set of
the access.  */
 
-static tree
+tree
 make_bit_field_ref (location_t loc, tree inner, tree orig_inner, tree type,
HOST_WIDE_INT bitsize, poly_int64 bitpos,
int unsignedp, int reversep)
@@ -4603,132 +4602,6 @@ optimize_bit_field_compare (location_t loc, enum 
tree_code code,
   return lhs;
 }
 
-/* Subroutine for fold_truth_andor_1: decode a field reference.
-
-   If EXP is a comparison reference, we return the innermost reference.
-
-   *PBITSIZE is set to the number of bits in the reference, *PBITPOS is
-   set to the starting bit number.
-
-   If the innermost field can be completely contained in a mode-sized
-   unit, *PMODE is set to that mode.  Otherwise, it is set to VOIDmode.
-
-   *PVOLATILEP is set to 1 if the any expression encountered is volatile;
-   otherwise it is not changed.
-
-   *PUNSIGNEDP is set to the signedness of the field.
-
-   *PREVERSEP is set to the storage order of the field.
-
-   *PMASK is set to the mask used.  This is either contained in a
-   BIT_AND_EXPR or derived from the width of the field.
-
-   *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
-
-   Return 0 if this is not a component reference or is one that we can't
-   do anything with.  */
-
-static tree
-decode_field_reference (location_t loc,

make sincos take type from intrinsic formal, not from result assignment

2020-10-05 Thread Alexandre Oliva



This is a first step towards enabling the sincos optimization in Ada.

The issue this patch solves is that sincos takes the type to be looked
up with mathfn_built_in from variables or temporaries in which results
of sin and cos are stored.  In Ada, sin and cos are declared in an
internal aux package, with uses thereof in a standard generic package,
which ensures that the types are not what mathfn_built_in expects.

Taking the type from the intrinsic's formal parameter, as in the
patch, ensures we get the type associated with the intrinsics,
regardless of the types used to declare and import them, so the lookup
of the CEXPI intrinsic for the same type finds it.


for  gcc/ChangeLog

* tree-ssa-math-opts.c (execute_cse_sincos_1): Take the type
for the cexpi/sincos intrinsic interface from formals of other
intrinsics.
---
 gcc/tree-ssa-math-opts.c |   26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 8423caa..31fd241 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1139,7 +1139,7 @@ execute_cse_sincos_1 (tree name)
 {
   gimple_stmt_iterator gsi;
   imm_use_iterator use_iter;
-  tree fndecl, res, type;
+  tree fndecl = NULL_TREE, res, type = NULL_TREE;
   gimple *def_stmt, *use_stmt, *stmt;
   int seen_cos = 0, seen_sin = 0, seen_cexpi = 0;
   auto_vec stmts;
@@ -1147,7 +1147,6 @@ execute_cse_sincos_1 (tree name)
   int i;
   bool cfg_changed = false;
 
-  type = TREE_TYPE (name);
   FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, name)
 {
   if (gimple_code (use_stmt) != GIMPLE_CALL
@@ -1169,15 +1168,34 @@ execute_cse_sincos_1 (tree name)
  break;
 
default:;
+ continue;
}
-}
 
+  tree t = TREE_VALUE (TYPE_ARG_TYPES (gimple_call_fntype (use_stmt)));
+  if (!type)
+   type = t;
+  else if (t != type)
+   {
+ if (!tree_nop_conversion_p (type, t))
+   return false;
+ /* If there is more than one type to choose from, prefer one
+that has a CEXPI builtin.  */
+ else if (!fndecl
+  && (fndecl = mathfn_built_in (t, BUILT_IN_CEXPI)))
+   type = t;
+   }
+}
   if (seen_cos + seen_sin + seen_cexpi <= 1)
 return false;
 
+  if (type != TREE_TYPE (name)
+  && !tree_nop_conversion_p (type, TREE_TYPE (name)))
+return false;
+
   /* Simply insert cexpi at the beginning of top_bb but not earlier than
  the name def statement.  */
-  fndecl = mathfn_built_in (type, BUILT_IN_CEXPI);
+  if (!fndecl)
+fndecl = mathfn_built_in (type, BUILT_IN_CEXPI);
   if (!fndecl)
 return false;
   stmt = gimple_build_call (fndecl, 1, name);


-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

Re: make sincos take type from intrinsic formal, not from result assignment

2020-10-06 Thread Alexandre Oliva

On Oct  6, 2020, Richard Biener  wrote:

> On October 6, 2020 3:15:02 AM GMT+02:00, Alexandre Oliva  
> wrote:
>> 
>> This is a first step towards enabling the sincos optimization in Ada.
>> 
>> The issue this patch solves is that sincos takes the type to be looked
>> up with mathfn_built_in from variables or temporaries in which results
>> of sin and cos are stored.  In Ada, sin and cos are declared in an
>> internal aux package, with uses thereof in a standard generic package,
>> which ensures that the types are not what mathfn_built_in expects.

> But are they not compatible? 

They are, in that they use the same underlying representation, but
they're distinct types, not associated with the same TYPE_MAIN_VARIANT.

In Ada it's not unusual to have multiple floating-point types unrelated
to each other, even if they share identical underlying representation.
Each such type is a distinct type, in a similar sense that in C++ each
struct type holding a single double field is a distinct type.

Each such distinct FP type gets a different instantiation of
Ada.Numerics.Generic_Elementary_Functions, just as a C++ template taking
a parameter type would get different instantiations for such different
struct types.

Overall, it's a very confusing situation.  We use these alternate types
to declare the Sin and Cos functions imported from libm as intrinsics
(separate patch I've written very recently, yet to be contributed), and
they get matched to the libm intrinsics despite the distinct types, we
issue calls to them, passing variables of the alternate types without
explicit conversions, but when the sincos pass looks up the sincos/cexpi
intrinsic, it uses the alternate type taken from the variable and fails,
rather than the types declared as taken by the builtins.

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

Re: make sincos take type from intrinsic formal, not from result assignment

2020-10-06 Thread Alexandre Oliva

On Oct  6, 2020, Richard Biener  wrote:

> OK, I see.  mathfn_built_in expects a type inter-operating with
> the C ABI types (float_type_node, double_type_node, etc.) where
> "inter-operating" means having the same main variant.

Yup.

> Now, I guess for the sincos pass we want to combine sinl + cosl
> to sincosl, independent on the case where the result would be
> assigned to a 'double' when 'double == long double'?

Sorry, I goofed in the patch description and misled you.

When looking at

  _d = sin (_s);

the sincos didn't take the type of _d, but that of _s.

I changed it so it takes the not from the actual passed to the
intrinsic, but from the formal in the intrinsic declaration.

If we had conversions of _s to different precisions, the optimization
wouldn't kick in: we'd have different actuals passed to sin and cos.
I'm not sure it makes much sense to try to turn e.g.

  _d1 = sin (_s);
  _t = (float) _s;
  _d2 = cosf (_t);

into:

  sincos (_s, &D1, &T);
  _d1 = D1;
  _td2 = T;
  _d2 = (float) _td2;

If someone goes through the trouble of computing sin and cos for the
same angle at different precisions, you might as well leave it alone.

> Now what about sinl + cos when 'double == long double'?

Now that might make more sense to optimize, but if we're going to do
such transformations, we might as well canonicalize the *l intrinsics to
the equivalent double versions (assuming long double and double have the
same precision), and then sincos will get it right.

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

Re: make sincos take type from intrinsic formal, not from result assignment

2020-10-07 Thread Alexandre Oliva

On Oct  6, 2020, Richard Biener  wrote:

> So how about that mathfn_type helper instead of hard-wring this logic
> in sincos()?

Like this?

Regstrapped on x86_64-linux-gnu.  Ok to install?

I'm a little unhappy with the duplication of the CASE_MATHFN* sequence,
that ought to be kept in sync, , and considered turning that whole
sequence into a #define used in both places, but that would bloat the
patch further and make it less readable, so I figured I'd propose this
one first.  Please let me know if you agree this additional change would
make it better.

take type from intrinsic in sincos pass

From: Alexandre Oliva 

This is a first step towards enabling the sincos optimization in Ada.

The issue this patch solves is that sincos takes the type to be looked
up with mathfn_built_in from variables or temporaries passed as
arguments to SIN and COS intrinsics.  In Ada, different float types
may be used but, despite their representation equivalence, their
distinctness causes the optimization to be skipped, because they are
not the types that mathfn_built_in expects.

This patch introduces a function that maps intrinsics to the type
they're associated with, and uses that type, obtained from the
intrinsics used in calls to be optimized, to look up the correspoding
CEXPI intrinsic.

For the sake of defensive programming, when using the type obtained
from the intrinsic, it now checks that, if different types are found
for the used argument, or for other calls that use it, that the types
are interchangeable.

for  gcc/ChangeLog

* builtins.c (mathfn_built_in_type): New.
* builtins.h (mathfn_built_in_type): Declare.
* tree-ssa-math-opts.c (execute_cse_sincos_1): Use it to
obtain the type expected by the intrinsic.
---
 gcc/builtins.c   |  147 ++
 gcc/builtins.h   |1 
 gcc/tree-ssa-math-opts.c |   17 -
 3 files changed, 162 insertions(+), 3 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index f91266e4..5649242 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2160,6 +2160,7 @@ mathfn_built_in_2 (tree type, combined_fn fn)

   switch (fn)
 {
+  /* Copied to mathfn_built_in_type, please keep in sync.  */
 CASE_MATHFN (ACOS)
 CASE_MATHFN (ACOSH)
 CASE_MATHFN (ASIN)
@@ -2278,6 +2279,10 @@ mathfn_built_in_2 (tree type, combined_fn fn)
 return END_BUILTINS;
 }

+#undef CASE_MATHFN
+#undef CASE_MATHFN_FLOATN
+#undef CASE_MATHFN_REENT
+
 /* Return mathematic function equivalent to FN but operating directly on TYPE,
if available.  If IMPLICIT_P is true use the implicit builtin declaration,
otherwise use the explicit declaration.  If we can't do the conversion,
@@ -2313,6 +2318,148 @@ mathfn_built_in (tree type, enum built_in_function fn)
   return mathfn_built_in_1 (type, as_combined_fn (fn), /*implicit=*/ 1);
 }

+/* Return the type associated with a built in function, i.e., the one
+   to be passed to mathfn_built_in to get the type-specific
+   function.  */
+
+tree
+mathfn_built_in_type (combined_fn fn)
+{
+#define CASE_MATHFN(MATHFN)\
+  case BUILT_IN_##MATHFN:  \
+return double_type_node;   \
+  case BUILT_IN_##MATHFN##F:   \
+return float_type_node;\
+  case BUILT_IN_##MATHFN##L:   \
+return long_double_type_node;
+
+#define CASE_MATHFN_FLOATN(MATHFN) \
+  CASE_MATHFN(MATHFN)  \
+  case BUILT_IN_##MATHFN##F16: \
+return float16_type_node;  \
+  case BUILT_IN_##MATHFN##F32: \
+return float32_type_node;  \
+  case BUILT_IN_##MATHFN##F64: \
+return float64_type_node;  \
+  case BUILT_IN_##MATHFN##F128:\
+return float128_type_node; \
+  case BUILT_IN_##MATHFN##F32X:\
+return float32x_type_node; \
+  case BUILT_IN_##MATHFN##F64X:\
+return float64x_type_node; \
+  case BUILT_IN_##MATHFN##F128X:   \
+return float128x_type_node;
+
+/* Similar to above, but appends _R after any F/L suffix.  */
+#define CASE_MATHFN_REENT(MATHFN) \
+  case BUILT_IN_##MATHFN##_R:  \
+return double_type_node;   \
+  case BUILT_IN_##MATHFN##F_R: \
+return float_type_node;\
+  case BUILT_IN_##MATHFN##L_R: \
+return long_double_type_node;
+
+  switch (fn)
+{
+  /* Copied from mathfn_built_in_2, please keep in sync.  */
+CASE_MATHFN (ACOS)
+CASE_MATHFN (ACOSH)
+CASE_MATHFN (ASIN)
+CASE_MATHFN (ASINH)
+CASE_MATHFN (ATAN)
+CASE_MATHFN (ATAN2)
+CASE_MATHFN (ATANH)
+CASE_MATHFN (CBRT)
+CASE_MATHFN_FLOATN (CEIL)
+CASE_MATHFN (CEXP

Re: make sincos take type from intrinsic formal, not from result assignment

2020-10-08 Thread Alexandre Oliva

On Oct  8, 2020, Richard Biener  wrote:

> OK with a minor nit, see below

>> I'm a little unhappy with the duplication of the CASE_MATHFN* sequence,
>> that ought to be kept in sync, , and considered turning that whole
>> sequence into a #define used in both places, but that would bloat the
>> patch further and make it less readable, so I figured I'd propose this
>> one first.  Please let me know if you agree this additional change would
>> make it better.

> Yeah, I guess so.

Thanks, I folded that change into the patch.  Bloated version below :-)

> use !types_compatible_p (type, t) here if we don't want to go
> with the stricter type == t (both yours and types_compatible_p
> would treat long double and double the same in case they
> are mapped to the same FP mode).  types_compatible_p
> is the appropriate GIMPLE predicate here.

Thanks, fixed, regstrapped.

Here's what I'm installing.


take type from intrinsic in sincos pass

From: Alexandre Oliva 

This is a first step towards enabling the sincos optimization in Ada.

The issue this patch solves is that sincos takes the type to be looked
up with mathfn_built_in from variables or temporaries passed as
arguments to SIN and COS intrinsics.  In Ada, different float types
may be used but, despite their representation equivalence, their
distinctness causes the optimization to be skipped, because they are
not the types that mathfn_built_in expects.

This patch introduces a function that maps intrinsics to the type
they're associated with, and uses that type, obtained from the
intrinsics used in calls to be optimized, to look up the correspoding
CEXPI intrinsic.

For the sake of defensive programming, when using the type obtained
from the intrinsic, it now checks that, if different types are found
for the used argument, or for other calls that use it, that the types
are interchangeable.


for  gcc/ChangeLog

* builtins.c (mathfn_built_in_type): New.
* builtins.h (mathfn_built_in_type): Declare.
* tree-ssa-math-opts.c (execute_cse_sincos_1): Use it to
obtain the type expected by the intrinsic.
---
 gcc/builtins.c   |  236 +-
 gcc/builtins.h   |1 
 gcc/tree-ssa-math-opts.c |   17 +++
 3 files changed, 164 insertions(+), 90 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index f91266e4..284926f 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2160,95 +2160,98 @@ mathfn_built_in_2 (tree type, combined_fn fn)
 
   switch (fn)
 {
-CASE_MATHFN (ACOS)
-CASE_MATHFN (ACOSH)
-CASE_MATHFN (ASIN)
-CASE_MATHFN (ASINH)
-CASE_MATHFN (ATAN)
-CASE_MATHFN (ATAN2)
-CASE_MATHFN (ATANH)
-CASE_MATHFN (CBRT)
-CASE_MATHFN_FLOATN (CEIL)
-CASE_MATHFN (CEXPI)
-CASE_MATHFN_FLOATN (COPYSIGN)
-CASE_MATHFN (COS)
-CASE_MATHFN (COSH)
-CASE_MATHFN (DREM)
-CASE_MATHFN (ERF)
-CASE_MATHFN (ERFC)
-CASE_MATHFN (EXP)
-CASE_MATHFN (EXP10)
-CASE_MATHFN (EXP2)
-CASE_MATHFN (EXPM1)
-CASE_MATHFN (FABS)
-CASE_MATHFN (FDIM)
-CASE_MATHFN_FLOATN (FLOOR)
-CASE_MATHFN_FLOATN (FMA)
-CASE_MATHFN_FLOATN (FMAX)
-CASE_MATHFN_FLOATN (FMIN)
-CASE_MATHFN (FMOD)
-CASE_MATHFN (FREXP)
-CASE_MATHFN (GAMMA)
-CASE_MATHFN_REENT (GAMMA) /* GAMMA_R */
-CASE_MATHFN (HUGE_VAL)
-CASE_MATHFN (HYPOT)
-CASE_MATHFN (ILOGB)
-CASE_MATHFN (ICEIL)
-CASE_MATHFN (IFLOOR)
-CASE_MATHFN (INF)
-CASE_MATHFN (IRINT)
-CASE_MATHFN (IROUND)
-CASE_MATHFN (ISINF)
-CASE_MATHFN (J0)
-CASE_MATHFN (J1)
-CASE_MATHFN (JN)
-CASE_MATHFN (LCEIL)
-CASE_MATHFN (LDEXP)
-CASE_MATHFN (LFLOOR)
-CASE_MATHFN (LGAMMA)
-CASE_MATHFN_REENT (LGAMMA) /* LGAMMA_R */
-CASE_MATHFN (LLCEIL)
-CASE_MATHFN (LLFLOOR)
-CASE_MATHFN (LLRINT)
-CASE_MATHFN (LLROUND)
-CASE_MATHFN (LOG)
-CASE_MATHFN (LOG10)
-CASE_MATHFN (LOG1P)
-CASE_MATHFN (LOG2)
-CASE_MATHFN (LOGB)
-CASE_MATHFN (LRINT)
-CASE_MATHFN (LROUND)
-CASE_MATHFN (MODF)
-CASE_MATHFN (NAN)
-CASE_MATHFN (NANS)
-CASE_MATHFN_FLOATN (NEARBYINT)
-CASE_MATHFN (NEXTAFTER)
-CASE_MATHFN (NEXTTOWARD)
-CASE_MATHFN (POW)
-CASE_MATHFN (POWI)
-CASE_MATHFN (POW10)
-CASE_MATHFN (REMAINDER)
-CASE_MATHFN (REMQUO)
-CASE_MATHFN_FLOATN (RINT)
-CASE_MATHFN_FLOATN (ROUND)
-CASE_MATHFN_FLOATN (ROUNDEVEN)
-CASE_MATHFN (SCALB)
-CASE_MATHFN (SCALBLN)
-CASE_MATHFN (SCALBN)
-CASE_MATHFN (SIGNBIT)
-CASE_MATHFN (SIGNIFICAND)
-CASE_MATHFN (SIN)
-CASE_MATHFN (SINCOS)
-CASE_MATHFN (SINH)
-CASE_MATHFN_FLOATN (SQRT)
-CASE_MATHFN (TAN)
-CASE_MATHFN (TANH)
-CASE_MATHFN (TGAMMA)
-CASE_MATHFN_FLOATN (TRUNC)
-CASE_MATHFN (Y0)
-CASE_MATHFN (Y1)
+#define SEQ_OF_CASE_MATHFN

import elementary functions as intrinsics

2020-10-11 Thread Alexandre Oliva

Intrinsic, Sqrt, "sqrt");
pragma Pure_Function (Sqrt);
 
function Log (X : Double) return Double;
-   pragma Import (C, Log, "log");
+   pragma Import (Intrinsic, Log, "log");
pragma Pure_Function (Log);
 
function Acos (X : Double) return Double;
-   pragma Import (C, Acos, "acos");
+   pragma Import (Intrinsic, Acos, "acos");
pragma Pure_Function (Acos);
 
function Asin (X : Double) return Double;
-   pragma Import (C, Asin, "asin");
+   pragma Import (Intrinsic, Asin, "asin");
pragma Pure_Function (Asin);
 
function Atan (X : Double) return Double;
-   pragma Import (C, Atan, "atan");
+   pragma Import (Intrinsic, Atan, "atan");
pragma Pure_Function (Atan);
 
function Sinh (X : Double) return Double;
-   pragma Import (C, Sinh, "sinh");
+   pragma Import (Intrinsic, Sinh, "sinh");
pragma Pure_Function (Sinh);
 
function Cosh (X : Double) return Double;
-   pragma Import (C, Cosh, "cosh");
+   pragma Import (Intrinsic, Cosh, "cosh");
pragma Pure_Function (Cosh);
 
function Tanh (X : Double) return Double;
-   pragma Import (C, Tanh, "tanh");
+   pragma Import (Intrinsic, Tanh, "tanh");
pragma Pure_Function (Tanh);
 
function Pow (X, Y : Double) return Double;
-   pragma Import (C, Pow, "pow");
+   pragma Import (Intrinsic, Pow, "pow");
pragma Pure_Function (Pow);
 
 end Ada.Numerics.Aux;


-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

[FYI] fix cgraph comment

2019-10-25 Thread Alexandre Oliva

This comment cut&pasto fix was split out of another patch I'm about to
contribute, as the current version of the patch no longer touches cgraph
data structures.

I'm checking it in as obvious.


From: Eric Botcazou 

for  gcc/ChangeLog

* cgraph.c (cgraph_node::rtl_info): Fix cut&pasto in comment.
* cgraph.h (cgraph_node::rtl_info): Likewise.
---
 gcc/cgraph.c |2 +-
 gcc/cgraph.h |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 8b752d8380981..e83bf2001e257 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -1846,7 +1846,7 @@ cgraph_node::local_info (tree decl)
   return &node->ultimate_alias_target ()->local;
 }
 
-/* Return local info for the compiled function.  */
+/* Return RTL info for the compiled function.  */
 
 cgraph_rtl_info *
 cgraph_node::rtl_info (const_tree decl)
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 733d616fb8c3a..a7f357f0b5c07 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1381,7 +1381,7 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : 
public symtab_node
   /* Return local info for the compiled function.  */
   static cgraph_local_info *local_info (tree decl);
 
-  /* Return local info for the compiled function.  */
+  /* Return RTL info for the compiled function.  */
   static struct cgraph_rtl_info *rtl_info (const_tree);
 
   /* Return the cgraph node that has ASMNAME for its DECL_ASSEMBLER_NAME.

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!FSF VP & FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara

introduce -fcallgraph-info option

2019-10-25 Thread Alexandre Oliva

This was first submitted many years ago
https://gcc.gnu.org/ml/gcc-patches/2010-10/msg02468.html

The command line option -fcallgraph-info is added and makes the
compiler generate another output file (xxx.ci) for each compilation
unit, which is a valid VCG file (you can launch your favorite VCG
viewer on it unmodified) and contains the "final" callgraph of the
unit.  "final" is a bit of a misnomer as this is actually the
callgraph at RTL expansion time, but since most high-level
optimizations are done at the Tree level and RTL doesn't usually
fiddle with calls, it's final in almost all cases.  Moreover, the
nodes can be decorated with additional info: -fcallgraph-info=su adds
stack usage info and -fcallgraph-info=da dynamic allocation info.

Compared with the earlier version, this patch does not modify cgraph,
and instead adds the required information next to the stage usage
function data structure, so we only hold one of those at at time.  I've
switched to vecs from linked lists, for more compact edges and dynamic
allocation annotations, and arranged for them to be released as soon as
we've printed out the information.  I have NOT changed the file format,
because existing tools such as gnatstack consume the current format.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog
>From  Eric Botcazou  , Alexandre Oliva  
>

* common.opt (-fcallgraph-info[=]): New option.
* doc/invoke.texi (Debugging options): Document it.
* opts.c (common_handle_option): Handle it.
* builtins.c (expand_builtin_alloca): Record allocation if
-fcallgraph-info=da.
* calls.c (expand_call): If -fcallgraph-info, record the call.
(emit_library_call_value_1): Likewise.
* flag-types.h (enum callgraph_info_type): New type.
* explow.c: Include stringpool.h.
(set_stack_check_libfunc): Set SET_SYMBOL_REF_DECL on the symbol.
* function.c (allocate_stack_usage_info): New.
(allocate_struct_function): Call it for -fcallgraph-info.
(prepare_function_start): Call it otherwise.
(rest_of_handle_thread_prologue_and_epilogue): Release callees
and dallocs after output_stack_usage.
(record_final_call, record_dynamic_alloc): New.
* function.h (struct callee, struct dalloc): New.
(struct stack_usage): Add callees and dallocs.
(record_final_call, record_dynamic_alloc): Declare.
* gimplify.c (gimplify_decl_expr): Record dynamically-allocated
object if -fcallgraph-info=da.
* optabs-libfuncs.c (build_libfunc_function): Keep SYMBOL_REF_DECL.
* print-tree.h (print_decl_identifier): Declare.
(PRINT_DECL_ORIGIN, PRINT_DECL_NAME, PRINT_DECL_UNIQUE_NAME): New.
* print-tree.c: Include print-tree.h.
(print_decl_identifier): New function.
* toplev.c: Include print-tree.h.
(callgraph_info_file): New global variable.
(callgraph_info_indirect_emitted): Likewise.
(output_stack_usage): Rename to...
(output_stack_usage_1): ... this.  Make it static, add cf
parameter.  If -fcallgraph-info=su, print stack usage to cf.
If -fstack-usage, use print_decl_identifier for
pretty-printing.
(INDIRECT_CALL_NAME): New.
(dump_final_indirect_call_node_vcg): New.
(dump_final_callee_vcg, dump_final_node_vcg): New.
(output_stack_usage): New.
(lang_dependent_init): Open and start file if
-fcallgraph-info.
(finalize): If callgraph_info_file is not null, finish it,
close it, and reset callgraph info state.

for  gcc/ada/ChangeLog

* gcc-interface/misc.c (callgraph_info_file): Delete.
---
 gcc/ada/gcc-interface/misc.c |3 -
 gcc/builtins.c   |4 +
 gcc/calls.c  |6 +
 gcc/common.opt   |8 ++
 gcc/doc/invoke.texi  |   17 
 gcc/explow.c |5 +
 gcc/flag-types.h |   16 
 gcc/function.c   |   63 ++--
 gcc/function.h   |   25 ++
 gcc/gimplify.c   |4 +
 gcc/optabs-libfuncs.c|4 -
 gcc/opts.c   |   26 ++
 gcc/output.h |2 
 gcc/print-tree.c |   76 +++
 gcc/print-tree.h |4 +
 gcc/toplev.c |  169 ++
 16 files changed, 381 insertions(+), 51 deletions(-)

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index 4abd4d5708a54..d68b37384ff7f 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -54,9 +54,6 @@
 #include "ada-tree.h"
 #include "gigi.h"
 
-/* This symbol needs to be defined for the front-end.  */
-void *callgraph_info_file = NULL;
-
 /* Command-line argc and argv.  These variables are global since they are
imported in back

Re: introduce -fcallgraph-info option

2019-10-26 Thread Alexandre Oliva

Hi, Richi,

On Oct 26, 2019, Richard Biener  wrote:

> How does it relate to the LTO-dump utility we have now which can in
> theory provide a more complete view?

Erhm...  Not at all, AFAICT.  The only vague resemblance I see is in the
presence of the word "callgraph" in the description of what both can do,
but even that's not quite the same callgraph, besides the different
format.

E.g., the reason we gather expanded calls rather than just use
cgraph_edges is that the latter would dump several "calls" that are
builtins expanded internally by the compiler, and would NOT dump other
ops that are expanded as (lib)calls.  In order to get an accurate
assessment of stack size requirements, the presence of the builtins
could be confusing but not so much trouble, but the absence of libcalls
would underestimate the potential max total stack use by a symbol.

> Maybe some infrastructure can be
> shared here (the actual dumping of the cgraph?)

You mean the one-liner loop in cgraph_node::dump_graphviz, called by
lto-dump to dump the cgraph?  That doesn't really seem worth sharing;
more so considering we dump edges in a quite different format, and not
just the edges.  In this different format expected by gnatstack, we also
dump the nodes, and include information in the labels of the nodes, such
as their original spelling and location, as well as stack use:

node: { title: "_ada_p" label: "P\np.adb:1:1\n48 bytes (static)" }

and dynamic stack allocations (alloca and vlas):

node: { title: "p.adb:p__u" label: "u\np.adb:21:3\n2 dynamic objects\n rt 
p.adb:23:5\n ri p.adb:24:5" }

and though edges to libcalls may carry just as little info as a graphviz
"from" -> "to" edge:

edge: { sourcename: "add" targetname: "__addvsi3" }

those between user symbols carry location info as well:

edge: { sourcename: "p__s" targetname: "p.adb:p__v" label: "p.adb:46:5" }

So I'm afraid I don't see anything that could be usefully factored out.
Do you?

Thanks,

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!FSF VP & FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara

Re: introduce -fcallgraph-info option

2019-10-27 Thread Alexandre Oliva

On Oct 26, 2019, Alexandre Oliva  wrote:

> E.g., the reason we gather expanded calls rather than just use
> cgraph_edges is that the latter would dump several "calls" that are
> builtins expanded internally by the compiler, and would NOT dump other
> ops that are expanded as (lib)calls.

It occurred to me that we could reuse most cgraph edges and avoid
duplicating them in the callees vec, namely those that aren't builtins
or libcalls.  Those that are builtins might or might not become actual
calls, so we disregard the cgraph_edges and record them in the vec
instead.  Those that are libcalls (builtins in a different sense)
introduced during expand are not even present in the cgraph edges, so we
record them in the vec as well.  Here's the patch that implements this.

Regstrapped on x86_64-linux-gnu.  Ok to install?

for  gcc/ChangeLog
>From  Eric Botcazou  , Alexandre Oliva  
>

* common.opt (-fcallgraph-info[=]): New option.
* doc/invoke.texi (Debugging options): Document it.
* opts.c (common_handle_option): Handle it.
* builtins.c (expand_builtin_alloca): Record allocation if
-fcallgraph-info=da.
* calls.c (expand_call): If -fcallgraph-info, record the call.
(emit_library_call_value_1): Likewise.
* flag-types.h (enum callgraph_info_type): New type.
* explow.c: Include stringpool.h.
(set_stack_check_libfunc): Set SET_SYMBOL_REF_DECL on the symbol.
* function.c (allocate_stack_usage_info): New.
(allocate_struct_function): Call it for -fcallgraph-info.
(prepare_function_start): Call it otherwise.
(record_final_call, record_dynamic_alloc): New.
* function.h (struct callinfo_callee): New.
(CALLEE_FROM_CGRAPH_P): New.
(struct callinfo_dalloc): New.
(struct stack_usage): Add callees and dallocs.
(record_final_call, record_dynamic_alloc): Declare.
* gimplify.c (gimplify_decl_expr): Record dynamically-allocated
object if -fcallgraph-info=da.
* optabs-libfuncs.c (build_libfunc_function): Keep SYMBOL_REF_DECL.
* print-tree.h (print_decl_identifier): Declare.
(PRINT_DECL_ORIGIN, PRINT_DECL_NAME, PRINT_DECL_UNIQUE_NAME): New.
* print-tree.c: Include print-tree.h.
(print_decl_identifier): New function.
* toplev.c: Include print-tree.h.
(callgraph_info_file): New global variable.
(callgraph_info_indirect_emitted): Likewise.
(output_stack_usage): Rename to...
(output_stack_usage_1): ... this.  Make it static, add cf
parameter.  If -fcallgraph-info=su, print stack usage to cf.
If -fstack-usage, use print_decl_identifier for
pretty-printing.
(INDIRECT_CALL_NAME): New.
(dump_final_indirect_call_node_vcg): New.
(dump_final_callee_vcg, dump_final_node_vcg): New.
(output_stack_usage): New.
(lang_dependent_init): Open and start file if
-fcallgraph-info.
(finalize): If callgraph_info_file is not null, finish it,
close it, and reset callgraph info state.

for  gcc/ada/ChangeLog

* gcc-interface/misc.c (callgraph_info_file): Delete.
---
 gcc/ada/gcc-interface/misc.c |3 -
 gcc/builtins.c   |4 +
 gcc/calls.c  |6 +
 gcc/common.opt   |8 ++
 gcc/doc/invoke.texi  |   17 
 gcc/explow.c |5 +
 gcc/flag-types.h |   16 
 gcc/function.c   |   59 --
 gcc/function.h   |   30 +++
 gcc/gimplify.c   |4 +
 gcc/optabs-libfuncs.c|4 -
 gcc/opts.c   |   26 ++
 gcc/output.h |2 
 gcc/print-tree.c |   76 ++
 gcc/print-tree.h |4 +
 gcc/toplev.c |  179 ++
 16 files changed, 393 insertions(+), 50 deletions(-)

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index 4abd4d5708a54..d68b37384ff7f 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -54,9 +54,6 @@
 #include "ada-tree.h"
 #include "gigi.h"
 
-/* This symbol needs to be defined for the front-end.  */
-void *callgraph_info_file = NULL;
-
 /* Command-line argc and argv.  These variables are global since they are
imported in back_end.adb.  */
 unsigned int save_argc;
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 5d811f113c907..bd302383377ba 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5406,6 +5406,10 @@ expand_builtin_alloca (tree exp)
 = allocate_dynamic_stack_space (op0, 0, align, max_size, alloca_for_var);
   result = convert_memory_address (ptr_mode, result);
 
+  /* Dynamic allocations for variables are recorded during gimplification.  */
+  if (!alloca_for_var && (flag_callgraph_info & CALLGRAPH_INFO_DYNAMIC_

Re: introduce -fcallgraph-info option

2019-10-30 Thread Alexandre Oliva

On Oct 28, 2019, Joseph Myers  wrote:

> We have a test in the testsuite that all option help text consistently 
> ends with '.' (see gcc.misc-tests/help.exp).  I'd have expected these 
> options, lacking that '.', to cause that test to fail.

Thanks.  I've fixed the common.opt changes, and I'll submit momentarily
a patch for help.exp to extend it to cover --help=common output as well.

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!FSF VP & FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara

Re: introduce -fcallgraph-info option

2019-10-30 Thread Alexandre Oliva

On Oct 28, 2019, Joseph Myers  wrote:

> We have a test in the testsuite that all option help text consistently 
> ends with '.' (see gcc.misc-tests/help.exp).  I'd have expected these 
> options, lacking that '.', to cause that test to fail.

Here's the patch.  Tested on x86_64-linux-gnu.  Ok to install?


Test --help=common for full sentences

The portion of help.exp that checks that help output contains full
sentences failed to cover common options.


for  gcc/testsuite/ChangeLog

* gcc.misc-tests/help.exp: Test --help=common for full sentences.
---
 gcc/testsuite/gcc.misc-tests/help.exp |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.misc-tests/help.exp 
b/gcc/testsuite/gcc.misc-tests/help.exp
index b8a09fc..4bb359f 100644
--- a/gcc/testsuite/gcc.misc-tests/help.exp
+++ b/gcc/testsuite/gcc.misc-tests/help.exp
@@ -146,8 +146,7 @@ check_for_options c "--help=joined,undocumented" "" "" ""
 # find the source a failure.
 
 foreach cls { "ada" "c" "c++" "d" "fortran" "go" \
-   "optimizers" "param" "target" "warnings" } {
-
+   "common" "optimizers" "param" "target" "warnings" } {
 check_for_options c "--help=$cls" "" "^ +-.*\[^:.\]$" ""
 }
 


-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!FSF VP & FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara

Re: introduce -fcallgraph-info option

2019-10-30 Thread Alexandre Oliva

On Oct 28, 2019, Richard Biener  wrote:

> I guess you need to elaborate on 'per-file'.  With LTO as far as I
> understand you'll get the graph per LTRANS unit (did you check
> where the output is generated?).

Yeah, I guess this was not designed with LTO in mind; it probably even
predates LTO.  We get per-LTRANS unit indeed, and the output is
generated in the temporary dir, which is not desirable behavior for
sure.  The outputs seem to be usable if you can figure out what they
are, but I'm not sure how to go about combining the multiple .ci files,
or how to name the combined output, since it's not generally expected
that these files will be created at link time, rather than at compile
time.  I'll bring this up internally and come back with some
improvement.

> Is this mainly a debugging tool or does it serve a different purpose?

It feeds gnatstack, that's a tool to compute max stack depth and perform
other call graph analyzes.  I don't think of it as a debugging tool.

https://www.adacore.com/gnatpro/toolsuite/gnatstack
http://docs.adacore.com/live/wave/gnatstack/html/gnatstack_ug/index.html

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!FSF VP & FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara

Re: introduce -fcallgraph-info option

2019-11-02 Thread Alexandre Oliva

On Oct 30, 2019, Richard Biener  wrote:

> One way of operation would be to
> generate the graph during the compilation step

Stack usage is only computed during prologue/epilogue generation in RTL.

> Additional arguments to -fcallgraph-info might be used to direct the
> output to a specific directory as well.

I've adjusted open_auxiliary_file for LTO recompilation auxiliary files
to be placed in the same location as the specified executable output
name, and noted that in the documentation.

I've also added a bitmap to output nodes for externals, accidentally
dropped in the transition to incremental generation of the .ci file.

Regstrapped on x86_64-linux-gnu.  Ok to install?


introduce -fcallgraph-info option

This was first submitted many years ago
https://gcc.gnu.org/ml/gcc-patches/2010-10/msg02468.html

The command line option -fcallgraph-info is added and makes the
compiler generate another output file (xxx.ci) for each compilation
unit (or LTO partitoin), which is a valid VCG file (you can launch
your favorite VCG viewer on it unmodified) and contains the "final"
callgraph of the unit.  "final" is a bit of a misnomer as this is
actually the callgraph at RTL expansion time, but since most
high-level optimizations are done at the Tree level and RTL doesn't
usually fiddle with calls, it's final in almost all cases.  Moreover,
the nodes can be decorated with additional info: -fcallgraph-info=su
adds stack usage info and -fcallgraph-info=da dynamic allocation info.


for  gcc/ChangeLog
>From  Eric Botcazou  , Alexandre Oliva  
>

* common.opt (-fcallgraph-info[=]): New option.
* doc/invoke.texi (Developer options): Document it.
* opts.c (common_handle_option): Handle it.
* builtins.c (expand_builtin_alloca): Record allocation if
-fcallgraph-info=da.
* calls.c (expand_call): If -fcallgraph-info, record the call.
(emit_library_call_value_1): Likewise.
* flag-types.h (enum callgraph_info_type): New type.
* explow.c: Include stringpool.h.
(set_stack_check_libfunc): Set SET_SYMBOL_REF_DECL on the symbol.
* function.c (allocate_stack_usage_info): New.
(allocate_struct_function): Call it for -fcallgraph-info.
(prepare_function_start): Call it otherwise.
(record_final_call, record_dynamic_alloc): New.
* function.h (struct callinfo_callee): New.
(CALLEE_FROM_CGRAPH_P): New.
(struct callinfo_dalloc): New.
(struct stack_usage): Add callees and dallocs.
(record_final_call, record_dynamic_alloc): Declare.
* gimplify.c (gimplify_decl_expr): Record dynamically-allocated
object if -fcallgraph-info=da.
* optabs-libfuncs.c (build_libfunc_function): Keep SYMBOL_REF_DECL.
* print-tree.h (print_decl_identifier): Declare.
(PRINT_DECL_ORIGIN, PRINT_DECL_NAME, PRINT_DECL_UNIQUE_NAME): New.
* print-tree.c: Include print-tree.h.
(print_decl_identifier): New function.
* toplev.c: Include print-tree.h.
(callgraph_info_file): New global variable.
(callgraph_info_external_printed): Likewise.
(open_auxiliary_file): Use dump_base_name for LTO partitions.
(output_stack_usage): Rename to...
(output_stack_usage_1): ... this.  Make it static, add cf
parameter.  If -fcallgraph-info=su, print stack usage to cf.
If -fstack-usage, use print_decl_identifier for
pretty-printing.
(INDIRECT_CALL_NAME): New.
(dump_final_node_vcg_start): New.
(dump_final_callee_vcg, dump_final_node_vcg): New.
(output_stack_usage): New.
(lang_dependent_init): Open and start file if
-fcallgraph-info.  Allocated callgraph_info_external_printed.
(finalize): If callgraph_info_file is not null, finish it,
close it, and release callgraph_info_external_printed.

for  gcc/ada/ChangeLog

* gcc-interface/misc.c (callgraph_info_file): Delete.
---
 gcc/ada/gcc-interface/misc.c |3 -
 gcc/builtins.c   |4 +
 gcc/calls.c  |6 +
 gcc/common.opt   |8 ++
 gcc/doc/invoke.texi  |   22 +
 gcc/explow.c |5 +
 gcc/flag-types.h |   16 
 gcc/function.c   |   59 -
 gcc/function.h   |   30 +++
 gcc/gimplify.c   |4 +
 gcc/optabs-libfuncs.c|4 -
 gcc/opts.c   |   26 ++
 gcc/output.h |2 
 gcc/print-tree.c |   76 +
 gcc/print-tree.h |4 +
 gcc/toplev.c |  186 ++
 16 files changed, 402 insertions(+), 53 deletions(-)

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index 4abd4d5..d68b373 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c

[FYI] pass --enable-obsolete to build configure

2019-11-02 Thread Alexandre Oliva

e} tm-preds.h"
 tm_d_file="${tm_d_file} defaults.h"
@@ -18868,7 +18887,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18871 "configure"
+#line 18890 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18974,7 +18993,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18977 "configure"
+#line 18996 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 8df849f4..71ae18e0 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1830,6 +1830,7 @@ AC_SUBST(extra_opt_files)
 if test x$host = x$build
 then
build_auto=auto-host.h
+   HAVE_AUTO_BUILD='# '
 else
# We create a subdir, then run autoconf in the subdir.
# To prevent recursion we set host and build for the new
@@ -1852,7 +1853,10 @@ else
GMPINC="" CPPFLAGS="${CPPFLAGS} -DGENERATOR_FILE" \
${realsrcdir}/configure \
--enable-languages=${enable_languages-all} \
-   --target=$target_alias --host=$build_alias --build=$build_alias
+   ${enable_obsolete+--enable-obsolete="$enable_obsolete"} \
+   
${enable_option_checking+--enable-option-checking="$enable_option_checking"} \
+   --target=$target_alias --host=$build_alias \
+   --build=$build_alias || exit # retaining $tempdir
 
# We just finished tests for the build machine, so rename
# the file auto-build.h in the gcc directory.
@@ -1860,8 +1864,10 @@ else
cd ..
rm -rf $tempdir
build_auto=auto-build.h
+   HAVE_AUTO_BUILD=
 fi
 AC_SUBST(build_subdir)
+AC_SUBST(HAVE_AUTO_BUILD)
 
 tm_file="${tm_file} defaults.h"
 tm_p_file="${tm_p_file} tm-preds.h"

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!FSF VP & FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara

Re: [FYI] pass --enable-obsolete to build configure

2019-11-04 Thread Alexandre Oliva

\
+   --build=$build_alias || exit # retaining $tempdir
 
# We just finished tests for the build machine, so rename
# the file auto-build.h in the gcc directory.
@@ -12238,9 +12243,11 @@ else
cd ..
rm -rf $tempdir
build_auto=auto-build.h
+   HAVE_AUTO_BUILD=
 fi
 
 
+
 tm_file="${tm_file} defaults.h"
 tm_p_file="${tm_p_file} tm-preds.h"
 tm_d_file="${tm_d_file} defaults.h"
@@ -18926,7 +18933,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18929 "configure"
+#line 18936 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -19032,7 +19039,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19035 "configure"
+#line 19042 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 0fe4e54..1a0d682 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1838,6 +1838,7 @@ AC_SUBST(extra_opt_files)
 if test x$host = x$build
 then
build_auto=auto-host.h
+   HAVE_AUTO_BUILD='# '
 else
# We create a subdir, then run autoconf in the subdir.
# To prevent recursion we set host and build for the new
@@ -1860,7 +1861,10 @@ else
GMPINC="" CPPFLAGS="${CPPFLAGS} -DGENERATOR_FILE" \
${realsrcdir}/configure \
--enable-languages=${enable_languages-all} \
-   --target=$target_alias --host=$build_alias --build=$build_alias
+   ${enable_obsolete+--enable-obsolete="$enable_obsolete"} \
+   
${enable_option_checking+--enable-option-checking="$enable_option_checking"} \
+   --target=$target_alias --host=$build_alias \
+   --build=$build_alias || exit # retaining $tempdir
 
# We just finished tests for the build machine, so rename
# the file auto-build.h in the gcc directory.
@@ -1868,8 +1872,10 @@ else
cd ..
rm -rf $tempdir
build_auto=auto-build.h
+   HAVE_AUTO_BUILD=
 fi
 AC_SUBST(build_subdir)
+AC_SUBST(HAVE_AUTO_BUILD)
 
 tm_file="${tm_file} defaults.h"
 tm_p_file="${tm_p_file} tm-preds.h"


-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);

Re: introduce -fcallgraph-info option

2019-11-06 Thread Alexandre Oliva

On Nov  4, 2019, Richard Biener  wrote:

> Please leave that part out for now, I'd rather discuss this separately
> from the bulk of the patch.  That is, I wonder why we shouldn't
> simply adjust aux_base_name to something else for -flto [in the driver].

*nod*, that makes sense to me.  After seeing your suggestion, I started
looking into how to do that, but didn't get very far yet.  For now, I've
split that bit out of the main patch.  So I'm installing the first, big
one, and not installing the latter, posted mainly so that the
documentation bit can be picked up.  Thanks!


introduce -fcallgraph-info option

This was first submitted many years ago
https://gcc.gnu.org/ml/gcc-patches/2010-10/msg02468.html

The command line option -fcallgraph-info is added and makes the
compiler generate another output file (xxx.ci) for each compilation
unit (or LTO partitoin), which is a valid VCG file (you can launch
your favorite VCG viewer on it unmodified) and contains the "final"
callgraph of the unit.  "final" is a bit of a misnomer as this is
actually the callgraph at RTL expansion time, but since most
high-level optimizations are done at the Tree level and RTL doesn't
usually fiddle with calls, it's final in almost all cases.  Moreover,
the nodes can be decorated with additional info: -fcallgraph-info=su
adds stack usage info and -fcallgraph-info=da dynamic allocation info.


for  gcc/ChangeLog
>From  Eric Botcazou  , Alexandre Oliva  
>

* common.opt (-fcallgraph-info[=]): New option.
* doc/invoke.texi (Developer options): Document it.
* opts.c (common_handle_option): Handle it.
* builtins.c (expand_builtin_alloca): Record allocation if
-fcallgraph-info=da.
* calls.c (expand_call): If -fcallgraph-info, record the call.
(emit_library_call_value_1): Likewise.
* flag-types.h (enum callgraph_info_type): New type.
* explow.c: Include stringpool.h.
(set_stack_check_libfunc): Set SET_SYMBOL_REF_DECL on the symbol.
* function.c (allocate_stack_usage_info): New.
(allocate_struct_function): Call it for -fcallgraph-info.
(prepare_function_start): Call it otherwise.
(record_final_call, record_dynamic_alloc): New.
* function.h (struct callinfo_callee): New.
(CALLEE_FROM_CGRAPH_P): New.
(struct callinfo_dalloc): New.
(struct stack_usage): Add callees and dallocs.
(record_final_call, record_dynamic_alloc): Declare.
* gimplify.c (gimplify_decl_expr): Record dynamically-allocated
object if -fcallgraph-info=da.
* optabs-libfuncs.c (build_libfunc_function): Keep SYMBOL_REF_DECL.
* print-tree.h (print_decl_identifier): Declare.
(PRINT_DECL_ORIGIN, PRINT_DECL_NAME, PRINT_DECL_UNIQUE_NAME): New.
* print-tree.c: Include print-tree.h.
(print_decl_identifier): New function.
* toplev.c: Include print-tree.h.
(callgraph_info_file): New global variable.
(callgraph_info_external_printed): Likewise.
(output_stack_usage): Rename to...
(output_stack_usage_1): ... this.  Make it static, add cf
parameter.  If -fcallgraph-info=su, print stack usage to cf.
If -fstack-usage, use print_decl_identifier for
pretty-printing.
(INDIRECT_CALL_NAME): New.
(dump_final_node_vcg_start): New.
(dump_final_callee_vcg, dump_final_node_vcg): New.
(output_stack_usage): New.
(lang_dependent_init): Open and start file if
-fcallgraph-info.  Allocated callgraph_info_external_printed.
(finalize): If callgraph_info_file is not null, finish it,
close it, and release callgraph_info_external_printed.

for  gcc/ada/ChangeLog

* gcc-interface/misc.c (callgraph_info_file): Delete.
---
 gcc/ada/gcc-interface/misc.c |3 -
 gcc/builtins.c   |4 +
 gcc/calls.c  |6 +
 gcc/common.opt   |8 ++
 gcc/doc/invoke.texi  |   23 +
 gcc/explow.c |5 +
 gcc/flag-types.h |   16 
 gcc/function.c   |   59 +-
 gcc/function.h   |   30 +++
 gcc/gimplify.c   |4 +
 gcc/optabs-libfuncs.c|4 -
 gcc/opts.c   |   26 ++
 gcc/output.h |2 
 gcc/print-tree.c |   76 ++
 gcc/print-tree.h |4 +
 gcc/toplev.c |  178 ++
 16 files changed, 397 insertions(+), 51 deletions(-)

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index 4abd4d5..d68b373 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -54,9 +54,6 @@
 #include "ada-tree.h"
 #include "gigi.h"
 
-/* This symbol needs to be defined for the front-end.  */
-void *callgraph_info

Re: introduce -fcallgraph-info option

2019-11-06 Thread Alexandre Oliva

On Nov  4, 2019, Richard Biener  wrote:

> I wonder why we shouldn't simply adjust aux_base_name to something
> else for -flto [in the driver].

About that, having tried to make sense of the current uses of
aux_base_name and of lto-wrapper, three main possibilities occur to me:

a) adjust the driver code to accept -auxbase, and have lto-wrapper
explicitly pass -aux-base ${output_dir-.}/$(lbasename ${output_name}) or
somesuch for each -fltrans command;

b) introduce -auxdir and get lto-wrapper to pass -auxdir ${output_dir-.}
for each -fltrans (and offload) command; or

c) get -fltrans to implicitly adjust aux_base_name with the directory
passed to -dumpdir, if any, or . otherwise

Any preferences?

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);

Re: introduce -fcallgraph-info option

2019-11-07 Thread Alexandre Oliva

On Nov  7, 2019, Richard Biener  wrote:

> A simple test shows we currently only pass -auxbase-strip /tmp/cc...o
> to the LTRANS lto1 invocation plus -dumpbase cc...o

This -auxbase-strip argument is introduced by the gcc driver that runs
lto1, based on the -o (tmp ltrans .o) argument, but this driver has no
clue as to the executable name or location, and there's nothing
whatsoever in the driver interface that enables aux_base_name to be
located separately from the output name.  Thus the possibilities I
brought up of introducing means for it to be told so, explicitly or by
convention.

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);

Re: introduce -fcallgraph-info option

2019-11-07 Thread Alexandre Oliva

On Nov  6, 2019, Thomas Schwinge  wrote:

>> which is a valid VCG file (you can launch your favorite VCG
>> viewer on it unmodified)

> What should be my "favorite VCG viewer"?

-ENOCLUE, I'm afraid.  I honestly don't even know which one Eric used
back when he first attempted to contribute this feature, almost 10 years
ago.

What I do know is that visualization is not the primary goal.  There are
indeed newer and more elaborate and modern graph file formats for that.
The primary intended consumer of this output is gnatstack, that's long
used only this simple format.  It's hard to justify rewriting it and
creating an incompatibility when the simple format does the job well.

Plus, it's simple enough and regular enough that it should be quite easy
to parse it with a few lines of awk and post-process the .ci file into
any other graph format of interest, when visualization of the graph is
the aim.  If you show me examples of graph formats that you'd like, that
can represent all the data encoded in .ci files, it wouldn't take much
effort to persuade me to write the few lines of awk, or perhaps even
sed, to convert .ci files output by GCC to the other format ;-)

> I tried that, but 'xvcg' didn't render anything useful for a
> '-fcallgraph-info=su,da' dump, hmm.

Did xvcg fail to display the node information added by su and da?  As
in, do you see the difference the options make to the graph text file,
but not in the visualization?  Or is it something else?

> Also, I found that many years ago, in 2012, Steven Bosscher did "Rework
> RTL CFG graph dumping to dump DOT format" (that's Graphviz), and then did
> "remove vcg CFG dumper".

gnatstack and -fcallgraph-info have been available since long before
that move indeed.

> Note that I'm not actively objecting VCG

Good, thanks for pointing that out :-)

> unmaintained mid-90s software, containing obfuscated layout/rendering
> source code

Since gnatstack is the primary consumer, I think that objection doesn't
apply.

As a Free Software activist, however, I am a little concerned about the
claim about obfuscated source code.  I haven't been able to find any
substantiation of that in your message.  I think that would be OT for
this list, so would you please send me what you got about it at
ol...@fsf.org?  TIA,

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);

Re: introduce -fcallgraph-info option

2019-11-07 Thread Alexandre Oliva

On Nov  7, 2019, Richard Biener  wrote:

> So how's -dumpbase handled?

It's part of the gcc driver interface, and lto-wrapper passes it to gcc
for -fltrans compilations.  -auxbase isn't, so one of the alternatives I
suggested involved our taking auxbase from dumpdir & dumpbase for
-fltrans compilations.  The other alternatives amounted to exposing
auxdir or auxbase in the driver interface, so that lto-wrapper could
specify them explicitly.

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);

Re: introduce -fcallgraph-info option

2019-11-07 Thread Alexandre Oliva

linking without -o: (a.out.ltrans#, a.out.ltrans#)
ex $CC ltobjdir/foo.o ltobjdir/bar.o -fdump-rtl-expand
-> /tmp/temp().ltrans0.ltrans.o a.out.ltrans0.su 
 + a.out.ltrans0.#r.expand
 + a.out


I'm a little hesitant, this amounts to quite significant behavior
changes.  Do these seem acceptable and desirable nevertheless?


-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);

introduce overridable clear_cache emitter

2020-11-10 Thread Alexandre Oliva

fault_ira_change_pseudo_allocno_class (int, reg_class_t,
reg_class_t);
diff --git a/gcc/tree.h b/gcc/tree.h
index 684be10b440a7..d69f8bad1d015 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -5579,6 +5579,13 @@ is_lang_specific (const_tree t)
 #define BUILTIN_VALID_P(FNCODE) \
   (IN_RANGE ((int)FNCODE, ((int)BUILT_IN_NONE) + 1, ((int) END_BUILTINS) - 1))
 
+/* Obtain a pointer to the identifier string holding the asm name for
+   BUILTIN, a BUILT_IN code.  This is handy if the target
+   mangles/overrides the function name that implements the
+   builtin.  */
+#define BUILTIN_ASM_NAME_PTR(BUILTIN) \
+  (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (builtin_decl_explicit (BUILTIN
+
 /* Return the tree node for an explicit standard builtin function or NULL.  */
 static inline tree
 builtin_decl_explicit (enum built_in_function fncode)
diff --git a/libgcc/config/t-vxworks b/libgcc/config/t-vxworks
index 02e2efa0eb7d7..b4bb85bff08be 100644
--- a/libgcc/config/t-vxworks
+++ b/libgcc/config/t-vxworks
@@ -4,7 +4,6 @@ LIBGCC2_DEBUG_CFLAGS =
 # We provide our own implementation for __clear_cache, using a
 # VxWorks specific entry point.
 LIB2FUNCS_EXCLUDE += _clear_cache
-LIB2ADD += $(srcdir)/config/vxcache.c
 
 # This ensures that the correct target headers are used; some VxWorks
 # system headers have names that collide with GCC's internal (host)
diff --git a/libgcc/config/t-vxworks7 b/libgcc/config/t-vxworks7
index 20c72f490dd3c..6ddd3e84f3309 100644
--- a/libgcc/config/t-vxworks7
+++ b/libgcc/config/t-vxworks7
@@ -4,7 +4,6 @@ LIBGCC2_DEBUG_CFLAGS =
 # We provide our own implementation for __clear_cache, using a
 # VxWorks specific entry point.
 LIB2FUNCS_EXCLUDE += _clear_cache
-LIB2ADD += $(srcdir)/config/vxcache.c
 
 # This ensures that the correct target headers are used; some VxWorks
 # system headers have names that collide with GCC's internal (host)
diff --git a/libgcc/config/vxcache.c b/libgcc/config/vxcache.c
deleted file mode 100644
index e25e0cce0a4c2..0
--- a/libgcc/config/vxcache.c
+++ /dev/null
@@ -1,35 +0,0 @@
-/* Copyright (C) 2018-2020 Free Software Foundation, Inc.
-   Contributed by Alexandre Oliva 
-
-This file is part of GCC.
-
-GCC is free software; you can redistribute it and/or modify it under
-the terms of the GNU General Public License as published by the Free
-Software Foundation; either version 3, or (at your option) any later
-version.
-
-GCC is distributed in the hope that it will be useful, but WITHOUT ANY
-WARRANTY; without even the implied warranty of MERCHANTABILITY or
-FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
-for more details.
-
-Under Section 7 of GPL version 3, you are granted additional
-permissions described in the GCC Runtime Library Exception, version
-3.1, as published by the Free Software Foundation.
-
-You should have received a copy of the GNU General Public License and
-a copy of the GCC Runtime Library Exception along with this program;
-see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-<http://www.gnu.org/licenses/>.  */
-
-/* Instruction cache invalidation routine using VxWorks' cacheLib.  */
-
-#include 
-#include 
-
-void
-__clear_cache (char *beg __attribute__((__unused__)),
-  char *end __attribute__((__unused__)))
-{
-  cacheTextUpdate (beg, end - beg);
-}


-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer

Re: drop -aux{dir,base}, revamp -dump{dir,base}

2020-01-09 Thread Alexandre Oliva

On Jan  9, 2020, Richard Biener  wrote:

> Did I miss the actual (non-documentation) patch?

No, I didn't post it.  It's kind of big, and only yesterday did I get it
to work as expected and now extensively documented, passing all of the
extensive testsuite I wrote for it.

Alas, some of the latest tweaks to driver and lto-wrapper to update lto
dumps to the new semantics ended up regressing a handful of tests in the
testsuite.  I'm still looking for some simple work-around, and I'm not
very happy with the naming of wpa dumps.

commit c34c1f54a8e4c92a0fca101a0f732c5e29c44643 currently in
aoliva/testme in the git repo is the latest I tested; I pushed other
minor cleanups over that one, but other changes I might make and push
later today might be more disruptive in terms of test results.

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);

Re: drop -aux{dir,base}, revamp -dump{dir,base}

2020-01-16 Thread Alexandre Oliva

On Jan 16, 2020, Alexandre Oliva  wrote:

> On Jan  9, 2020, Alexandre Oliva  wrote:
>> On Jan  9, 2020, Richard Biener  wrote:
>>> Did I miss the actual (non-documentation) patch?

>> No, I didn't post it.  It's kind of big, and only yesterday did I get it
>> to work as expected and now extensively documented, passing all of the
>> extensive testsuite I wrote for it.

> Here it is, at last, regstrapped on x86_64-linux-gnu.  Ok to install?

And here's a followup that fixes a limitation (bug?) in libiberty that
was hit when I attempted a last-minute simplification in lto-wrapper.

Regstrapped separately on x86_64-linux-gnu.  Ok to install?


[libiberty] output empty args as a pair of quotes

From: Alexandre Oliva 

writeargv writes out empty arguments in a way that expandargv skips
them instead of preserving them.  Fixed by writing out a pair of
quotes for them.

This enables lto-wrapper to pass down an empty string, as desired.


for  libiberty/ChangeLog

* argv.c (writeargv): Output empty args as "".

for  gcc/ChangeLog

* lto-wrapper.c (run_gcc): Use an empty string for -dumpdir.
---
 gcc/lto-wrapper.c |5 +
 libiberty/argv.c  |8 
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index ed076e3..aa71f1e 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1482,10 +1482,7 @@ run_gcc (unsigned argc, char *argv[])
   if (!incoming_dumppfx)
 {
   obstack_ptr_grow (&argv_obstack, "-dumpdir");
-  /* An empty string would do, if only writeargv would write it
-out in a way that would not be skipped by expandargv and
-buildargv.  */
-  obstack_ptr_grow (&argv_obstack, current_dir);
+  obstack_ptr_grow (&argv_obstack, "");
 }
   obstack_ptr_grow (&argv_obstack, "-dumpbase");
 
diff --git a/libiberty/argv.c b/libiberty/argv.c
index 8c9794db..6a72208 100644
--- a/libiberty/argv.c
+++ b/libiberty/argv.c
@@ -327,6 +327,14 @@ writeargv (char * const *argv, FILE *f)
   arg++;
 }
 
+  /* Write out a pair of quotes for an empty argument.  */
+  if (arg == *argv)
+   if (EOF == fputs ("\"\"", f))
+ {
+   status = 1;
+   goto done;
+ }
+
   if (EOF == fputc ('\n', f))
 {
   status = 1;


-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);

tolerate padding in mbstate_t

2020-01-21 Thread Alexandre Oliva



Padding in mbstate_t objects may get the memcmp to fail.
Attempt to avoid the failure with zero initialization.


Regstrapped on x86_64-linux-gnu, and also tested on a platform that used
to fail because of padding in std::mbstate_t.  Ok to install?


for  libstdc++-v3/ChangeLog

* testsuite/27_io/fpos/mbstate_t/1.cc: Zero-init mbstate_t.
---
 testsuite/27_io/fpos/mbstate_t/1.cc |   28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git libstdc++-v3/testsuite/27_io/fpos/mbstate_t/1.cc 
libstdc++-v3/testsuite/27_io/fpos/mbstate_t/1.cc
index f92d68f..559bd8d 100644
--- libstdc++-v3/testsuite/27_io/fpos/mbstate_t/1.cc
+++ libstdc++-v3/testsuite/27_io/fpos/mbstate_t/1.cc
@@ -28,8 +28,24 @@
 void test01()
 {
   typedef std::mbstate_t state_type;
-  state_type state01 = state_type();
-  state_type state02 = state_type();
+  // Use zero-initialization of the underlying memory so that padding
+  // bytes, if any, stand a better chance of comparing the same.
+  // Zero-initialized memory is guaranteed to be a valid initial
+  // state.  This doesn't quite guarantee that any padding bits won't
+  // be overwritten when copying from other instances that haven't
+  // been fully initialized: this data type is compatible with C, so
+  // it is likely plain old data, but it could have a default ctor
+  // that initializes only the relevant fields, whereas copy-ctor and
+  // operator= could be implemented as a full-object memcpy, including
+  // padding bits, rather than fieldwise copying.  However, since
+  // we're comparing two values copied from the same state_type
+  // instance (or was this meant to take one of them from pos02 rather
+  // than both from pos01?), if padding bits are copied, we'll get the
+  // same for both of them, and if they aren't, we'll keep the values
+  // we initialized them with, so this should be good.
+  state_type state[2];
+  std::memset(state, 0, sizeof (state));
+
 
   std::streampos pos01(0);
   std::streampos pos02(0);
@@ -39,13 +55,13 @@ void test01()
   // state_type state();
 
   // XXX Need to have better sanity checking for the mbstate_t type,
-  // or whatever the insantiating type for class fpos happens to be
+  // or whatever the instantiating type for class fpos happens to be
   // for streampos, as things like equality operators and assignment
   // operators, increment and deincrement operators need to be in
   // place.
-  pos01.state(state02);
-  state01 = pos01.state();
-  VERIFY( std::memcmp(&state01, &state02, sizeof(state_type)) == 0 );
+  pos01.state(state[1]);
+  state[0] = pos01.state();
+  VERIFY( std::memcmp(&state[0], &state[1], sizeof(state_type)) == 0 );
 }
 
 int main() 

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1938 matches

Mail list logo