[PATCH] return edge in make_eh_edges
The need to initialize edge probabilities has made make_eh_edges undesirably hard to use. I suppose we don't want make_eh_edges to initialize the probability of the newly-added edge itself, so that the caller takes care of it, but identifying the added edge in need of adjustments is inefficient and cumbersome. Change make_eh_edges so that it returns the added edge. Regstrapped on x86_64-linux-gnu, and (along with various hardening patches) on ppc64el-linux-gnu. Also tested on multiple other targets, on older versions of GCC. The returned value is unused in code already in the compiler. This is a preparatory patch for uses to be introduced along with stack scrubbing and control flow redundancy. Ok to install? for gcc/ChangeLog * tree-eh.cc (make_eh_edges): Return the new edge. * tree-eh.h (make_eh_edges): Likewise. --- gcc/tree-eh.cc |6 +++--- gcc/tree-eh.h |2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc index e8ceff36cc6e7..1cb8e08652909 100644 --- a/gcc/tree-eh.cc +++ b/gcc/tree-eh.cc @@ -2274,7 +2274,7 @@ make_eh_dispatch_edges (geh_dispatch *stmt) /* Create the single EH edge from STMT to its nearest landing pad, if there is such a landing pad within the current function. */ -void +edge make_eh_edges (gimple *stmt) { basic_block src, dst; @@ -2283,14 +2283,14 @@ make_eh_edges (gimple *stmt) lp_nr = lookup_stmt_eh_lp (stmt); if (lp_nr <= 0) -return; +return NULL; lp = get_eh_landing_pad_from_number (lp_nr); gcc_assert (lp != NULL); src = gimple_bb (stmt); dst = label_to_block (cfun, lp->post_landing_pad); - make_edge (src, dst, EDGE_EH); + return make_edge (src, dst, EDGE_EH); } /* Do the work in redirecting EDGE_IN to NEW_BB within the EH region tree; diff --git a/gcc/tree-eh.h b/gcc/tree-eh.h index 771be50fe9a1d..1382568b7c919 100644 --- a/gcc/tree-eh.h +++ b/gcc/tree-eh.h @@ -30,7 +30,7 @@ extern bool remove_stmt_from_eh_lp (gimple *); extern int lookup_stmt_eh_lp_fn (struct function *, const gimple *); extern int lookup_stmt_eh_lp (const gimple *); extern bool make_eh_dispatch_edges (geh_dispatch *); -extern void make_eh_edges (gimple *); +extern edge make_eh_edges (gimple *); extern edge redirect_eh_edge (edge, basic_block); extern void redirect_eh_dispatch_edge (geh_dispatch *, edge, basic_block); extern bool operation_could_trap_helper_p (enum tree_code, bool, bool, bool, -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
Re: [RFC][debug] Add -fadd-debug-nops
On Jul 16, 2018, Tom de Vries wrote: > On 07/16/2018 03:34 PM, Jakub Jelinek wrote: >> So is this essentially a workaround for GDB not supporting the statement >> frontiers? > AFAIU now, the concept of location views addresses this problem, so yes. Nice! A preview for what can be obtained with LVu support in the debugger! > Right, but in the mean time I don't mind having an option that lets me > filter out noise in guality test results. FWIW, I'm a bit concerned about working around legitimate problems, as in modifying testcases so that they pass. This hides actual problems, that we'd like to fix eventually by adjusting the compiler, not the testcases. That said, thank you for the attention you've given to the guality testsuite recently. It's appreciated. -- Alexandre Oliva, freedom fighter https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist
[PATCH] Introduce instance discriminators
This patch is a rewrite of an earlier patch submitted at https://gcc.gnu.org/ml/gcc-patches/2012-11/msg02340.html With -gnateS, the Ada compiler sets itself up to output discriminators for different instantiations of generics, but the middle and back ends have lacked support for that. This patch introduces the missing bits, translating the GNAT-internal representation of the instance map to an instance_table that maps ordinary line-map indices to instance discriminators. Instance discriminators are not compatible with LTO, in that the instance mapping is not preserved in LTO dumps. There are no plans to preserve discriminators in them. This patch (minus whitespace changes and tests) was regstrapped on x86_64-linux-gnu. The final form of the patch was tested with a non-bootstrap build, and a single-test check-gnat run. Ok to install? From: Olivier Hainque for libcpp/ChangeLog * include/line-map.h (ORDINARY_MAP_INDEX): New. for gcc/ChangeLog * einput.c: New file. Allow associating "line context" extension data to instruction location info, for sets of locations covered by an ordinary line_map structure. * einput.h: Likewise. * Makefile.in (OBJS): Add einput.o. * input.c (expand_location_1): On request, provide pointer to the line map that was used to resolve the input location. (map_expand_location): New function. Same as expand_location, also providing the map from which the input location was resolved. (expand_location, expand_location_to_spelling_point): Adjust calls to expand_location_1. (linemap_client_expand_location_to_spelling_point): Likewise. * input.h (map_expand_location): Declare. * emit-rtl.c (insn_location): Handle a location_lc* argument. * rtl.h (insn_location): Adjust prototype. * print-rtl.c (print_rtx): Adjust call to insn_location. * modulo-sched.c (dump_insn_location): Likewise. * tree-inline.c (copy_bb): Copy discriminator field as well. * flag-types.h (loc_discriminator_type): New enum, allowing BB or INSTANCE_ID discriminators. * common.opt (loc_discriminator_kind): New variable, conveying the kinf of discriminator we want to see emited with source locations. * final.c (bb_discriminator, last_bb_discriminator): New statics, to track basic block discriminators. (final_start_function_1): Initialize them. (final_scan_insn_1): On NOTE_INSN_BASIC_BLOCK, track bb_discriminator. (notice_source_line): If INSN_HAS_LOCATION, update current discriminator from BB or INSTANCE_ID depending on the kind we're requested to convey. When deciding to emit, account for both possible kinds of discriminators. for gcc/ada * trans.c (gigi): When requested so, allocate and populate the gcc table controlling the emission of per-instance debug info. From: Alexandre Oliva , Olivier Hainque for gcc/testsuite/ChangeLog * gnat.dg/dinst.adb: New. * gnat.dg/dinst_pkg.ads, gnat.dg/dinst_pkg.adb: New. --- gcc/Makefile.in |1 + gcc/ada/gcc-interface/trans.c | 10 ++ gcc/common.opt | 12 gcc/einput.c| 55 +++ gcc/einput.h| 50 gcc/emit-rtl.c | 11 +-- gcc/final.c | 29 +++--- gcc/flag-types.h| 14 + gcc/input.c | 32 +--- gcc/input.h |2 + gcc/modulo-sched.c |2 + gcc/print-rtl.c |2 + gcc/rtl.h |3 +- gcc/testsuite/gnat.dg/dinst.adb | 20 + gcc/testsuite/gnat.dg/dinst_pkg.adb |7 gcc/testsuite/gnat.dg/dinst_pkg.ads |4 +++ gcc/tree-inline.c |2 + libcpp/include/line-map.h |8 + 18 files changed, 247 insertions(+), 17 deletions(-) create mode 100644 gcc/einput.c create mode 100644 gcc/einput.h create mode 100644 gcc/testsuite/gnat.dg/dinst.adb create mode 100644 gcc/testsuite/gnat.dg/dinst_pkg.adb create mode 100644 gcc/testsuite/gnat.dg/dinst_pkg.ads diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 2a05a66ea9b87..f9a9fe8726b18 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1285,6 +1285,7 @@ OBJS = \ dwarf2cfi.o \ dwarf2out.o \ early-remat.o \ + einput.o \ emit-rtl.o \ et-forest.o \ except.o \ diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c index 31e098a0c707a..3ad3f83fd60f5 100644 --- a/gcc/ada/gcc-interface/trans.c +++ b/gcc/ada/gcc-interface/trans.c @@ -45,6 +45,7 @@ #include "tree-i
Re: [PATCH] Introduce instance discriminators
On Jul 18, 2018, Richard Biener wrote: > On Wed, Jul 18, 2018 at 8:53 AM Alexandre Oliva wrote: >> Instance discriminators are not compatible with LTO, in that the >> instance mapping is not preserved in LTO dumps. There are no plans to >> preserve discriminators in them. > Because... ... it follows existing practice. BB discriminators are not saved for LTO either. They could be saved along with the CFG, but AFAICT they aren't. As for instance discriminators, we might be able to save them along with LOC information, but that would be quite wasteful, and because of the way ordinary maps are reconstructed when reading in the LTO data, we'd end up with yet another internal representation for line_maps. I was told there was no interest from our customers in using the converage and monitoring aspects of instance discriminators when performing link-time optimizations, and thus that it made sense to follow existing practice. I suspect there might be a way to assign instance discriminator numbers to individual function DECLs, and then walk up the lexical block tree to identify the DECL containing the block so as to obtain the discriminator number. This would be a lot less efficient, algorithmically speaking, but, provided that LTO dumps discriminator numbers as part of the decls, and enough info to reconstruct the lexical block trees, including the inlined-function enclosing blocks, that should work even for LTO, at least as long as different decls are maintained for each instance. Indeed, if this is the case, code ranges of lexical blocks in inlined subroutines would suffice to identify each instantiation, without the need for discriminators. It would be a lot more expensive to gather the information from that debug info than from discriminators, though. All this said, there doesn't seem to be much interest in that from Ada users to justify by itself the pursuit of yet another internal representation. I wonder if this sort of discriminator might be of interest for users of C++ templates as well. -- Alexandre Oliva, freedom fighter https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist
Re: [PATCH] Introduce instance discriminators
On Jul 19, 2018, Richard Biener wrote: > Oh, that probably wasn't omitted on purpose. Cary said it was used > for profiling but I can't see any such use. > Is the instance discriminator stuff also used for profiling? Not that I know, but... I probably wouldn't know yet ;-) Anyway, it was easy enough to implement this: >> I suspect there might be a way to assign instance discriminator numbers >> to individual function DECLs, and then walk up the lexical block tree to >> identify the DECL containing the block so as to obtain the discriminator >> number. and then, in a subsequent patch, I went ahead and added support for LTO, saving and recovering discriminator info for instances and, while at that, for basic blocks too. Besides sucessfully regstrapping the first two patches on x86_64-linux-gnu, I have tested this patchset with an additional bootstrap with the third (throw-away) patch, that adds -gnateS to Ada compilations in gcc/ada, libada and gnattools. I also tested the saving and restoring of discriminators for LTO by manually inspecting the line number tables in LTO-recompiled executables, to check that they retained the instance or BB discriminator numbers that went into the non-LTO object files. Ok to install the first two patches? (the third is just for reference) Introduce instance discriminators From: Alexandre Oliva With -gnateS, the Ada compiler sets itself up to output discriminators for different instantiations of generics, but the middle and back ends have lacked support for that. This patch introduces the missing bits, translating the GNAT-internal representation of the per-file instance map to an instance_table that maps decls to instance discriminators. From: Alexandre Oliva , Olivier Hainque for gcc/ChangeLog * debug.h (decl_to_instance_map_t): New type. (decl_to_instance_map): Declare. (maybe_create_decl_to_instance_map): New inline function. * final.c (bb_discriminator, last_bb_discriminator): New statics, to track basic block discriminators. (final_start_function_1): Initialize them. (final_scan_insn_1): On NOTE_INSN_BASIC_BLOCK, track bb_discriminator. (decl_to_instance_map): New variable. (map_decl_to_instance, maybe_set_discriminator): New functions. (notice_source_line): Set discriminator. for gcc/ada * trans.c: Include debug.h. (file_map): New static variable. (gigi): Set it. Create decl_to_instance_map when needed. (Subprogram_Body_to_gnu): Pass gnu_subprog_decl to... (Sloc_to_locus): ... this. Add decl parm, map it to instance. * gigi.h (Sloc_to_locus): Adjust declaration. for gcc/testsuite/ChangeLog * gnat.dg/dinst.adb: New. * gnat.dg/dinst_pkg.ads, gnat.dg/dinst_pkg.adb: New. --- gcc/ada/gcc-interface/gigi.h|2 + gcc/ada/gcc-interface/trans.c | 29 --- gcc/debug.h | 15 gcc/final.c | 70 +-- gcc/testsuite/gnat.dg/dinst.adb | 20 ++ gcc/testsuite/gnat.dg/dinst_pkg.adb |7 gcc/testsuite/gnat.dg/dinst_pkg.ads |4 ++ 7 files changed, 137 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gnat.dg/dinst.adb create mode 100644 gcc/testsuite/gnat.dg/dinst_pkg.adb create mode 100644 gcc/testsuite/gnat.dg/dinst_pkg.ads diff --git a/gcc/ada/gcc-interface/gigi.h b/gcc/ada/gcc-interface/gigi.h index a75cb9094491..b890195cefc3 100644 --- a/gcc/ada/gcc-interface/gigi.h +++ b/gcc/ada/gcc-interface/gigi.h @@ -285,7 +285,7 @@ extern void process_type (Entity_Id gnat_entity); location and false if it doesn't. If CLEAR_COLUMN is true, set the column information to 0. */ extern bool Sloc_to_locus (Source_Ptr Sloc, location_t *locus, - bool clear_column = false); + bool clear_column = false, const_tree decl = 0); /* Post an error message. MSG is the error message, properly annotated. NODE is the node at which to post the error and the node to use for the diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c index 31e098a0c707..0371d00fce18 100644 --- a/gcc/ada/gcc-interface/trans.c +++ b/gcc/ada/gcc-interface/trans.c @@ -41,6 +41,7 @@ #include "stmt.h" #include "varasm.h" #include "output.h" +#include "debug.h" #include "libfuncs.h" /* For set_stack_check_libfunc. */ #include "tree-iterator.h" #include "gimplify.h" @@ -255,6 +256,12 @@ static tree create_init_temporary (const char *, tree, tree *, Node_Id); static const char *extract_encoding (const char *) ATTRIBUTE_UNUSED; static const char *decode_name (const char *) ATTRIBUTE_UNUSED; +/* This makes gigi's file_info_ptr visible in this translation unit, + so that
Re: [RFC 1/3, debug] Add fdebug-nops
On Jul 24, 2018, Tom de Vries wrote: > There's a design principle in GCC that code generation and debug generation > are independent. This guarantees that if you're encountering a problem in an > application without debug info, you can recompile it with -g and be certain > that you can reproduce the same problem, and use the debug info to debug the > problem. This invariant is enforced by bootstrap-debug. The fdebug-nops > breaks this invariant I thought of a way to not break it: enable the debug info generation machinery, including VTA and SFN, but discard those only at the very end if -g is not enabled. The downside is that it would likely slow -Og down significantly, but who uses it without -g anyway? -- Alexandre Oliva, freedom fighter https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist
Re: [RFC 2/3, debug] Add fkeep-vars-live
On Jul 24, 2018, Tom de Vries wrote: > This patch adds fake uses of user variables at the point where they go out of > scope, to keep user variables inspectable throughout the application. I suggest also adding such uses before sets, so that variables aren't regarded as dead and get optimized out in ranges between the end of a live range and a subsequent assignment. -- Alexandre Oliva, freedom fighter https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist
Re: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]
Hello, Christina, On Jul 24, 2018, Tamar Christina wrote: > gcc/ > 2018-07-24 Tamar Christina > PR target/86486 > * configure.ac: Add stack-clash-protection-guard-size. > * doc/install.texi: Document it. > * config.in (DEFAULT_STK_CLASH_GUARD_SIZE): New. > * params.def: Update comment for guard-size. > * configure: Regenerate. The configury bits look almost good to me. I wish the help message, comments and docs expressed somehow that the given power of two expresses a size in bytes, rather than in kilobytes, bits or any other unit that might be reasonably assumed to express stack sizes. I'm afraid I don't know the best way to accomplish that in a few words. > +stk_clash_default=12 This seems to be left-over from an earlier patch, as it is now unused AFAICT. Thanks, -- Alexandre Oliva, freedom fighter https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist
Re: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]
On Jul 25, 2018, Tamar Christina wrote: > gcc/ > 2018-07-25 Tamar Christina > PR target/86486 > * configure.ac: Add stack-clash-protection-guard-size. > * doc/install.texi: Document it. > * config.in (DEFAULT_STK_CLASH_GUARD_SIZE): New. > * params.def: Update comment for guard-size. > (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE, > PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL): Update description. > * configure: Regenerate. Thanks. No objections from me. I don't see any use of the new config knob, though; assuming it's in a subsequent patch, I guess this one is fine, but I'm not sure I'm entitled to approve it. -- Alexandre Oliva, freedom fighter https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist
Re: [RFC 2/3, debug] Add fkeep-vars-live
On Jul 25, 2018, Jakub Jelinek wrote: > On Tue, Jul 24, 2018 at 04:11:11PM -0300, Alexandre Oliva wrote: >> On Jul 24, 2018, Tom de Vries wrote: >> >> > This patch adds fake uses of user variables at the point where they go out >> > of >> > scope, to keep user variables inspectable throughout the application. >> >> I suggest also adding such uses before sets, so that variables aren't >> regarded as dead and get optimized out in ranges between the end of a >> live range and a subsequent assignment. > But that can be done incrementally, right Sure > the optimizers could still move it appart *nod* -- Alexandre Oliva, freedom fighter https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist
Re: [RFC 1/3, debug] Add fdebug-nops
On Jul 24, 2018, Tom de Vries wrote: >> I thought of a way to not break it: enable the debug info generation >> machinery, including VTA and SFN, but discard those only at the very end >> if -g is not enabled. The downside is that it would likely slow -Og >> down significantly, but who uses it without -g anyway? > I thought of the same. I've submitted a patch here that uses SFN: > https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01391.html . Nice! > VTA is not needed AFAIU. Yes, indeed. It could avoid inserting some nops, if you were to refrain from emitting them if there aren't any binds between neighbor SFNs, but I like it better your way: it's even more like SFN support in the debugger :-) -- Alexandre Oliva, freedom fighter https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist
Re: [PATCH] Introduce instance discriminators
On Jul 24, 2018, Alexandre Oliva wrote: > Ok to install the first two patches? (the third is just for reference) Ping? https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01419.html > Introduce instance discriminators > From: Alexandre Oliva > With -gnateS, the Ada compiler sets itself up to output discriminators > for different instantiations of generics, but the middle and back ends > have lacked support for that. This patch introduces the missing bits, > translating the GNAT-internal representation of the per-file instance > map to an instance_table that maps decls to instance discriminators. > From: Alexandre Oliva , Olivier Hainque > > for gcc/ChangeLog > * debug.h (decl_to_instance_map_t): New type. > (decl_to_instance_map): Declare. > (maybe_create_decl_to_instance_map): New inline function. > * final.c (bb_discriminator, last_bb_discriminator): New statics, > to track basic block discriminators. > (final_start_function_1): Initialize them. > (final_scan_insn_1): On NOTE_INSN_BASIC_BLOCK, track > bb_discriminator. > (decl_to_instance_map): New variable. > (map_decl_to_instance, maybe_set_discriminator): New functions. > (notice_source_line): Set discriminator. > for gcc/ada > * trans.c: Include debug.h. > (file_map): New static variable. > (gigi): Set it. Create decl_to_instance_map when needed. > (Subprogram_Body_to_gnu): Pass gnu_subprog_decl to... > (Sloc_to_locus): ... this. Add decl parm, map it to instance. > * gigi.h (Sloc_to_locus): Adjust declaration. > for gcc/testsuite/ChangeLog > * gnat.dg/dinst.adb: New. > * gnat.dg/dinst_pkg.ads, gnat.dg/dinst_pkg.adb: New. [...] > Save discriminator info for LTO > From: Alexandre Oliva > for gcc/ChangeLog > * gimple-streamer-in.c (input_bb): Restore BB discriminator. > * gimple-streamer-out.c (output_bb): Save it. > * lto-streamer-in.c (input_struct_function_base): Restore > instance discriminator if available. Create map on demand. > * lto-streamer-out.c (output_struct_function_base): Save it if > available. > * final.c (decl_to_instance_map): Document LTO strategy. -- Alexandre Oliva, freedom fighter https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist
[PATCH] testsuite: check for and use -mno-strict-align where needed
On Mar 10, 2021, Alexandre Oliva wrote: > ppc configurations that have -mstrict-align enabled by default fail > gcc.dg/strlenopt-80.c, because some memcpy calls don't get turned into > MEM_REFs, which defeats the tested-for strlen optimization. I've combined this patch with other patches that added -mno-strict-align to tests that needed it on targets configured with -mstrict-align enabled by default, and conditioned the use of the flag to targets that support it. Regstrapped on x86_64-linux-gnu, ppc64le-linux-gnu, also tested on a ppc-vx7r2 configured with -mstrict-align. Ok to install? Various tests fail on powerpc if the toolchain is configured to enable -mstrict-align by default. This patch introduces -mno-strict-align on tests found to fail that way, when the target supports this option. I suppose !non_strict_align could be used to skip tests, instead of or in addition to this tweak, and that might be desirable if they still fail on targets that do no support -mno-strict-align, but I haven't observed such scenarios. The p9-vec-length tests expect vectorization on loop bodies and epilogues that reference arrays that are not known to be more aligned than their small element types. Though VSX vectors work best with 32- or 64-bit alignment, unaligned vector loads and stores are expected by the tests. However, with -mstrict-align by default, vector loads and stores not known to be aligned end up open coded, which doesn't match the asm output expectations coded in the tests. for gcc/ChangeLog * doc/sourcebuild.texi (opt_mstrict_align): New target. for gcc/testsuite/ChangeLog * lib/target-supports.exp (check_effective_target_opt_mstrict_align): New. * gcc.dg/strlenopt-80.c: Add -mno-strict-align if supported. * gcc.target/powerpc/prefix-ds-dq.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-1.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-2.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-3.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-4.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-5.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-6.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-run-1.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-run-2.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-run-3.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-run-4.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-run-5.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-run-6.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-run-7.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-run-8.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-1.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-2.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-3.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-4.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-5.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-6.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-7.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-8.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-run-1.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-run-2.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-run-3.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-run-4.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-run-5.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-run-6.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-run-7.c: Likewise. * gcc.target/powerpc/p9-vec-length-full-run-8.c: Likewise. --- gcc/doc/sourcebuild.texi |3 +++ gcc/testsuite/gcc.dg/strlenopt-80.c|4 .../gcc.target/powerpc/p9-vec-length-epil-1.c |2 ++ .../gcc.target/powerpc/p9-vec-length-epil-2.c |2 ++ .../gcc.target/powerpc/p9-vec-length-epil-3.c |2 ++ .../gcc.target/powerpc/p9-vec-length-epil-4.c |2 ++ .../gcc.target/powerpc/p9-vec-length-epil-5.c |2 ++ .../gcc.target/powerpc/p9-vec-length-epil-6.c |2 ++ .../gcc.target/powerpc/p9-vec-length-epil-7.c |2 ++ .../gcc.target/powerpc/p9-vec-length-epil-8.c |2 ++ .../gcc.target/powerpc/p9-vec-length-epil-run-1.c |2 ++ .../gcc.target/powerpc/p9-vec-length-epil-run-2.c |2 ++ .../gcc.target/powerpc/p9-vec-length-epil-run-3.c |2 ++ .../gcc.target/powerpc/p9-vec-length-epil-run-4.c |2 ++ .../gcc.target/powerpc/p9-vec-length-epil-run-5.c |2 ++ .../gcc.target/powerpc/p9-vec-length-epil-run-6.c |2 ++ .../gcc.target/powerpc/p9-ve
Re: [PATCH] return edge in make_eh_edges
On Oct 19, 2023, Richard Biener wrote: > OK. Maybe time to do s/make_eh_edges/make_eh_edge/ though. Thanks, will do, ideally on top of the already-tested refreshed patches that I'm going to post shortly. -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
Re: Enable top-level recursive 'autoreconf'
On Oct 19, 2023, Thomas Schwinge wrote: > On 2023-10-18T15:42:18+0100, R jd <3246251196r...@gmail.com> wrote: >> I guess I can ask, why there is not a recursive approach for configuring >> GCC. e.g. AC_SUBDIRS in the top level? > ('AC_CONFIG_SUBDIRS' you mean.) You know, often it just takes someone to > ask the right questions... ;-) > What do people think about the attached > "Enable top-level recursive 'autoreconf'"? Only lightly tested, so far. Interesting idea! It is a little hackish, in that it seems to exploit an implementation detail in AC_CONFIG_SUBDIRS rather than a documented feature. I like it! The autoconf documentation suggests that optional directories can be tested for: if test -d "$srcdir/foo"; then AC_CONFIG_SUBDIRS([foo]) fi We could use a macro that takes a list and iterates over the list (untested): dnl Handle a list of optional subdirs. dnl After AC_OUTPUT, affects autoreconf runs, but not configure runs. AC_DEFUN([AC_CONFIG_SUBDIRS_OPT], [ m4_foreach_w([dir], [$1], [ if test -d "$srcdir/dir"; then AC_CONFIG_SUBDIRS(dir) fi ]) ]) Thanks, -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
[PATCH v4] Introduce hardbool attribute for C
g/torture/hardbool-ll-5a.c new file mode 100644 index 0..14438c5104f07 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/hardbool-ll-5a.c @@ -0,0 +1,6 @@ +/* { dg-do run } */ +/* { dg-options "-w" } */ + +#define falseval 0x781ee187a53cc35all + +#include "hardbool-ll.c" diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-ll.c b/gcc/testsuite/gcc.dg/torture/hardbool-ll.c new file mode 100644 index 0..d4d498c6f2af1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/hardbool-ll.c @@ -0,0 +1,5 @@ +/* { dg-do run } */ + +#define basetype long long + +#include "hardbool.c" diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c b/gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c new file mode 100644 index 0..e38a56b5deb05 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c @@ -0,0 +1,6 @@ +/* { dg-do run } */ +/* { dg-options "-w" } */ + +#define falseval 0x5aa5 + +#include "hardbool-s.c" diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-s.c b/gcc/testsuite/gcc.dg/torture/hardbool-s.c new file mode 100644 index 0..942300be2072a --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/hardbool-s.c @@ -0,0 +1,5 @@ +/* { dg-do run } */ + +#define basetype short + +#include "hardbool.c" diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c b/gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c new file mode 100644 index 0..7beec578ff89c --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c @@ -0,0 +1,6 @@ +/* { dg-do run } */ +/* { dg-options "-w" } */ + +#define falseval 0xa53cc35a + +#include "hardbool-ul.c" diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-ul.c b/gcc/testsuite/gcc.dg/torture/hardbool-ul.c new file mode 100644 index 0..841c1d4bc2ec8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/hardbool-ul.c @@ -0,0 +1,5 @@ +/* { dg-do run } */ + +#define basetype unsigned long + +#include "hardbool.c" diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c b/gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c new file mode 100644 index 0..5bfc922795d3d --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c @@ -0,0 +1,6 @@ +/* { dg-do run } */ +/* { dg-options "-w" } */ + +#define falseval 0xa55a + +#include "hardbool-us.c" diff --git a/gcc/testsuite/gcc.dg/torture/hardbool-us.c b/gcc/testsuite/gcc.dg/torture/hardbool-us.c new file mode 100644 index 0..e9feec681c41e --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/hardbool-us.c @@ -0,0 +1,5 @@ +/* { dg-do run } */ + +#define basetype unsigned short + +#include "hardbool.c" diff --git a/gcc/testsuite/gcc.dg/torture/hardbool.c b/gcc/testsuite/gcc.dg/torture/hardbool.c new file mode 100644 index 0..01684952a2a9f --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/hardbool.c @@ -0,0 +1,118 @@ +/* { dg-do run } */ + +#include + +#ifndef basetype +#define basetype char +#endif + +#ifndef falseval +#define falseval 0 +#endif + +#ifndef trueval +#define trueval ~falseval +#endif + +/* hardbool may be #defined so as to drop parms in other tests. */ +typedef basetype __attribute__ ((hardbool (falseval, trueval))) hbool; + +typedef unsigned char __attribute__ ((__hardbool__ (1, 0))) zbool; + +struct hs { + hbool a[2]; + hbool x:2; + hbool y:5; + zbool z:1; +}; + +hbool var = 0; + +struct hs x = { { 1, 0 }, 2, 0, 2 }; + +int f(hbool v) { + return !v; +} + +int g(int i) { + return f(i); +} + +hbool h(hbool x) { + return x; +} + +hbool h2(hbool x) { + return h(x); +} + +int hsx(struct hs v) { + return v.x; +} + +int ghs(hbool s) { + struct hs v = { {s, !s}, s, !s, s }; + return hsx (v); +} + +int t = (hbool)2; + +void check_pfalse (hbool *p) +{ + assert (!*p); + assert (*(basetype*)p == (basetype)falseval); + assert (!(int)(hbool)*p); +} + +void check_ptrue (hbool *p) +{ + assert (*p); + assert (*(basetype*)p == (basetype)trueval); + assert ((int)(hbool)*p); +} + +void check_vfalse (hbool v) +{ + check_pfalse (&v); +} + +void check_vtrue (hbool v) +{ + check_ptrue (&v); +} + +int main () { + check_pfalse (&var); + var = !(int)(hbool)(_Bool)var; + check_ptrue (&var); + var = (zbool)var; + check_ptrue (&var); + + check_ptrue (&x.a[0]); + check_pfalse (&x.a[1]); + check_vtrue (x.x); + check_vfalse (x.y); + check_vtrue (x.z); + + check_vtrue (t); + + check_vtrue (var && t); + check_vfalse (!var || x.y); + + check_vfalse (f (2)); + check_vfalse (f (1)); + check_vtrue (f (0)); + + check_vfalse (g (2)); + check_vfalse (g (1)); + check_vtrue (g (0)); + + check_vtrue (h (2)); + check_vtrue (h (1)); + check_vfalse (h (0)); + + check_vtrue (h2 (2)); + check_vtrue (h2 (1)); + check_vfalse (h2 (0)); +} + -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
[PATCH] rename make_eh_edges to make_eh_edge (was: return edge in make_eh_edges)
); +extern edge make_eh_edge (gimple *); extern edge redirect_eh_edge (edge, basic_block); extern void redirect_eh_dispatch_edge (geh_dispatch *, edge, basic_block); extern bool operation_could_trap_helper_p (enum tree_code, bool, bool, bool, diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc index d63060c9429c4..867f3b22b589a 100644 --- a/gcc/tree-inline.cc +++ b/gcc/tree-inline.cc @@ -2615,7 +2615,7 @@ copy_edges_for_bb (basic_block bb, profile_count num, profile_count den, } else if (can_throw) { - make_eh_edges (copy_stmt); + make_eh_edge (copy_stmt); update_probs = true; } -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
Re: [PATCH v3] Control flow redundancy hardening
-cfr-always.c: New. * c-c++-common/torture/harden-cfr-never.c: New. * c-c++-common/torture/harden-cfr-no-xthrow.c: New. * c-c++-common/torture/harden-cfr-nothrow.c: New. * c-c++-common/torture/harden-cfr-bret-always.c: New. * c-c++-common/torture/harden-cfr-bret-never.c: New. * c-c++-common/torture/harden-cfr-bret-noopt.c: New. * c-c++-common/torture/harden-cfr-bret-noret.c: New. * c-c++-common/torture/harden-cfr-bret-no-xthrow.c: New. * c-c++-common/torture/harden-cfr-bret-nothrow.c: New. * c-c++-common/torture/harden-cfr-bret-retcl.c: New. * c-c++-common/torture/harden-cfr-bret.c: New. * g++.dg/harden-cfr-throw-always-O0.C: New. * g++.dg/harden-cfr-throw-returning-O0.C: New. * g++.dg/torture/harden-cfr-noret-always-no-nothrow.C: New. * g++.dg/torture/harden-cfr-noret-never-no-nothrow.C: New. * g++.dg/torture/harden-cfr-noret-no-nothrow.C: New. * g++.dg/torture/harden-cfr-throw-always.C: New. * g++.dg/torture/harden-cfr-throw-never.C: New. * g++.dg/torture/harden-cfr-throw-no-xthrow.C: New. * g++.dg/torture/harden-cfr-throw-no-xthrow-expected.C: New. * g++.dg/torture/harden-cfr-throw-nothrow.C: New. * g++.dg/torture/harden-cfr-throw-nocleanup.C: New. * g++.dg/torture/harden-cfr-throw-returning.C: New. * g++.dg/torture/harden-cfr-throw.C: New. * gcc.dg/torture/harden-cfr-noret-no-nothrow.c: New. * gcc.dg/torture/harden-cfr-tail-ub.c: New. * gnat.dg/hardcfr.adb: New. for libgcc/ChangeLog * Makefile.in (LIB2ADD): Add hardcfr.c. * hardcfr.c: New. -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
Re: Darwin: Replace environment runpath with embedded [PR88590]
On Aug 29, 2023, FX Coudert wrote: >> I think a build machinery review is needed. > Thanks. CC’ing the relevant maintainers for review of the build part. > The driver part and the darwin-specific part are already okayed. The build machinery bits look reasonable to me. I have no specific knowledge of darwin, but I expect darwin maintainers understand and approve of the bits that are going into build machinery files. Apologies for the delay. -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Disinformation flourishes because many people care deeply about injustice but very few check the facts. Think Assange & Stallman. The empires strike back
Re: [PATCH RESEND] libatomic: drop redundant all-multi command
On Aug 1, 2023, Jan Beulich via Gcc-patches wrote: > * Makefile.am (all-multi): Drop commands. > * Makefile.in: Update accordingly. LGTM, thanks, -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Disinformation flourishes because many people care deeply about injustice but very few check the facts. Think Assange & Stallman. The empires strike back
Re: Remove stale Autoconf checks for Perl
On May 16, 2023, Thomas Schwinge wrote: > OK to push the attached "Remove stale Autoconf checks for Perl"? LGTM, thanks, -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Disinformation flourishes because many people care deeply about injustice but very few check the facts. Think Assange & Stallman. The empires strike back
[PATCH] [PR111520] set hardcmp eh probs (was: rename make_eh_edges to make_eh_edge)
On Oct 20, 2023, Richard Biener wrote: >> * tree-eh.h (make_eh_edges): Rename to... >> (make_eh_edge): ... this. >> * tree-eh.cc: Likewise. Adjust all callers. Once the above goes in (it depends on the strub monster patch), the following one should apply as well. Regstrapped on x86_64-linux-gnu. Ok to install? Set execution count of EH blocks, and probability of EH edges. for gcc/ChangeLog PR tree-optimization/111520 * gimple-harden-conditionals.cc (pass_harden_compares::execute): Set EH edge probability and EH block execution count. for gcc/testsuite/ChangeLog PR tree-optimization/111520 * g++.dg/torture/harden-comp-pr111520.cc: New. --- gcc/gimple-harden-conditionals.cc | 12 +++- .../g++.dg/torture/harden-comp-pr111520.cc | 17 + 2 files changed, 28 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/torture/harden-comp-pr111520.cc diff --git a/gcc/gimple-harden-conditionals.cc b/gcc/gimple-harden-conditionals.cc index 1999e827a04ca..bded288985063 100644 --- a/gcc/gimple-harden-conditionals.cc +++ b/gcc/gimple-harden-conditionals.cc @@ -580,11 +580,21 @@ pass_harden_compares::execute (function *fun) if (throwing_compare_p) { add_stmt_to_eh_lp (asgnck, lookup_stmt_eh_lp (asgn)); - make_eh_edge (asgnck); + edge eh = make_eh_edge (asgnck); + /* This compare looks like it could raise an exception, +but it's dominated by the original compare, that +would raise an exception first, so the EH edge from +this one is never really taken. */ + eh->probability = profile_probability::never (); + if (eh->dest->count.initialized_p ()) + eh->dest->count += eh->count (); + else + eh->dest->count = eh->count (); edge ckeh; basic_block nbb = split_edge (non_eh_succ_edge (gimple_bb (asgnck), &ckeh)); + gcc_checking_assert (eh == ckeh); gsi_split = gsi_start_bb (nbb); if (dump_file) diff --git a/gcc/testsuite/g++.dg/torture/harden-comp-pr111520.cc b/gcc/testsuite/g++.dg/torture/harden-comp-pr111520.cc new file mode 100644 index 0..b4381b4d84ec4 --- /dev/null +++ b/gcc/testsuite/g++.dg/torture/harden-comp-pr111520.cc @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-fharden-compares -fsignaling-nans -fnon-call-exceptions" } */ + +struct S +{ + S (bool); + ~S (); +}; + +float f; + +void +foo () +{ + S a = 0; + S b = f; +} -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
Re: [PATCH] Ignore case of header line in dg-extract-results.py
On Oct 24, 2023, Paul Iannetta wrote: > * dg-extract-results.py: Make the test_run regex case > insensitive. It looks reasonable to me, but I'm not sure this is a change I'm entitled to approve. Thanks! -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
Re: [PATCH v4] Introduce strub: machine-independent stack scrubbing
Ping? https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633675.html I'm combining the gcc/ipa-strub.cc bits from https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633526.html -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
Re: [PATCH] rename make_eh_edges to make_eh_edge
On Oct 20, 2023, Richard Biener wrote: > OK. Thanks. >> * tree-eh.h (make_eh_edges): Rename to... >> (make_eh_edge): ... this. >> * tree-eh.cc: Likewise. Adjust all callers. Ugh, "Adjust all callers" is no longer enough, all files need to be mentioned explicitly. So I'm pushing it with: * tree-eh.h (make_eh_edges): Rename to... (make_eh_edge): ... this. * tree-eh.cc: Likewise. Adjust all callers... * gimple-harden-conditionals.cc: ... here, ... * gimple-harden-control-flow.cc: ... here, ... * tree-cfg.cc: ... here, ... * tree-inline.cc: ... and here. The ipa-strub.cc is now part of the strub patch. -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
hardcfr: support checking at abnormal edges [PR111943]
Control flow redundancy may choose abnormal edges for early checking, but that breaks because we can't insert checks on such edges. Introduce conditional checking on the dest block of abnormal edges, and leave it for the optimizer to drop the conditional. Also, oops, I noticed the new files went in with an incorrect copyright notice, that this patch fixes. Regstrapped on x86_64-linux-gnu. Ok to install? for gcc/ChangeLog PR tree-optimization/111943 * gimple-harden-control-flow.cc: Adjust copyright year. (rt_bb_visited): Add vfalse and vtrue data members. Zero-initialize them in the ctor. (rt_bb_visited::insert_exit_check_on_edge): Upon encountering abnormal edges, insert initializers for vfalse and vtrue on entry, and insert the check sequence guarded by a conditional in the dest block. for libgcc/ChangeLog * hardcfr.c: Adjust copyright year. for gcc/testsuite/ChangeLog PR tree-optimization/111943 * gcc.dg/harden-cfr-pr111943.c: New. --- gcc/gimple-harden-control-flow.cc | 78 +++- gcc/testsuite/gcc.dg/harden-cfr-pr111943.c | 33 libgcc/hardcfr.c |2 - 3 files changed, 109 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/harden-cfr-pr111943.c diff --git a/gcc/gimple-harden-control-flow.cc b/gcc/gimple-harden-control-flow.cc index 3711b25d09123..77c140178060e 100644 --- a/gcc/gimple-harden-control-flow.cc +++ b/gcc/gimple-harden-control-flow.cc @@ -1,5 +1,5 @@ /* Control flow redundancy hardening. - Copyright (C) 2022 Free Software Foundation, Inc. + Copyright (C) 2022-2023 Free Software Foundation, Inc. Contributed by Alexandre Oliva . This file is part of GCC. @@ -460,6 +460,10 @@ class rt_bb_visited at the end of a block's predecessors or successors list. */ tree ckfail, ckpart, ckinv, ckblk; + /* If we need to deal with abnormal edges, we insert SSA_NAMEs for + boolean true and false. */ + tree vfalse, vtrue; + /* Convert a block index N to a block vindex, the index used to identify it in the VISITED array. Check that it's in range: neither ENTRY nor EXIT, but maybe one-past-the-end, to compute @@ -596,7 +600,8 @@ public: /* Prepare to add control flow redundancy testing to CFUN. */ rt_bb_visited (int checkpoints) : nblocks (n_basic_blocks_for_fn (cfun)), - vword_type (NULL), ckseq (NULL), rtcfg (NULL) + vword_type (NULL), ckseq (NULL), rtcfg (NULL), + vfalse (NULL), vtrue (NULL) { /* If we've already added a declaration for the builtin checker, extract vword_type and vword_bits from its declaration. */ @@ -703,7 +708,74 @@ public: /* Insert SEQ on E. */ void insert_exit_check_on_edge (gimple_seq seq, edge e) { -gsi_insert_seq_on_edge_immediate (e, seq); +if (!(e->flags & EDGE_ABNORMAL)) + { + gsi_insert_seq_on_edge_immediate (e, seq); + return; + } + +/* Initialize SSA boolean constants for use in abnormal PHIs. */ +if (!vfalse) + { + vfalse = make_ssa_name (boolean_type_node); + vtrue = make_ssa_name (boolean_type_node); + + gimple_seq vft_seq = NULL; + gassign *vfalse_init = gimple_build_assign (vfalse, boolean_false_node); + gimple_seq_add_stmt (&vft_seq, vfalse_init); + gassign *vtrue_init = gimple_build_assign (vtrue, boolean_true_node); + gimple_seq_add_stmt (&vft_seq, vtrue_init); + + gsi_insert_seq_on_edge_immediate (single_succ_edge + (ENTRY_BLOCK_PTR_FOR_FN (cfun)), + vft_seq); + } + +/* We can't insert on abnormal edges, but we can arrange for SEQ + to execute conditionally at dest. Add a PHI boolean with TRUE + from E and FALES from other preds, split the whole block, add a + test for the PHI to run a new block with SEQ or skip straight + to the original block. If there are multiple incoming abnormal + edges, we'll do this multiple times. ??? Unless there are + multiple abnormal edges with different postcheck status, we + could split the block and redirect other edges, rearranging the + PHI nodes. Optimizers already know how to do this, so we can + keep things simple here. */ +basic_block bb = e->dest; +basic_block bb_postcheck = split_block_after_labels (bb)->dest; + +basic_block bb_check = create_empty_bb (e->dest); +bb_check->count = e->count (); +if (dom_info_available_p (CDI_DOMINATORS)) + set_immediate_dominator (CDI_DOMINATORS, bb_check, bb); +if (current_loops) + add_bb_to_loop (bb_check, current_loops->tree_root); + +gimple_stmt_iterator chkpt = gsi_after_labels (bb_check); +gsi_insert_seq_before_without_update (&chkpt, seq, GSI_SAME_
Re: hardcfr: support checking at abnormal edges [PR111943]
[adding list] On Oct 27, 2023, rep.dot@gmail.com wrote: > + from E and FALES from other preds, split the whole block, add a > s/FALES/FALSE/ Thanks, I've just installed the patch including the typo fix. -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
follow SSA defs for asan base
Ada makes extensive use of nested functions, which turn all automatic variables of the enclosing function that are used in nested ones into members of an artificial FRAME record type. The address of a local variable is usually passed to asan marking functions without using a temporary. Taking the address of a member of FRAME within a nested function, however, is not regarded as a gimple val: while introducing FRAME variables, current_function_decl is always the outermost function, even while processing a nested function, so decl_address_invariant_p returns false for such ADDR_EXPRs. So, as automatic variables are moved into FRAME, any asan call that marks such a variable has its ADDR_EXPR replaced with a SSA_NAME set to the ADDR_EXPR of the FRAME member. asan_expand_mark_ifn was not prepared to deal with ADDR_EXPRs split out into SSA_NAMEs. This patch deals with such cases. [It does NOT deal with PHI nodes and whatnot. I'm not even sure it should. Maybe we want the ADDR_EXPR to be a gimple val instead, but this more conservative fix felt more appropriate for this stage.] Regstrapped on x86_64-linux-gnu. Ok to install? for gcc/ChangeLog * asan.c (asan_expand_mark_ifn): Follow SSA_NAME defs for an ADDR_EXPR base. for gcc/testsuite/ChangeLog * gcc.dg/asan/nested-1.c: New. --- gcc/asan.c | 21 + gcc/testsuite/gcc.dg/asan/nested-1.c | 24 2 files changed, 45 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/asan/nested-1.c diff --git a/gcc/asan.c b/gcc/asan.c index 89ecd99b18294..2d2fb97098b2f 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -3629,6 +3629,27 @@ asan_expand_mark_ifn (gimple_stmt_iterator *iter) bool is_poison = ((asan_mark_flags)flag) == ASAN_MARK_POISON; tree base = gimple_call_arg (g, 1); + while (TREE_CODE (base) == SSA_NAME) +{ + gimple *def = SSA_NAME_DEF_STMT (base); + if (!def) + break; + + if (!is_gimple_assign (def)) + break; + + if (!SINGLE_SSA_TREE_OPERAND (def, SSA_OP_DEF)) + break; + + if (gimple_num_ops (def) != 2) + break; + + if (gimple_expr_code (def) == ADDR_EXPR + || gimple_expr_code (def) == SSA_NAME) + base = gimple_assign_rhs1 (def); + else + break; +} gcc_checking_assert (TREE_CODE (base) == ADDR_EXPR); tree decl = TREE_OPERAND (base, 0); diff --git a/gcc/testsuite/gcc.dg/asan/nested-1.c b/gcc/testsuite/gcc.dg/asan/nested-1.c new file mode 100644 index 0..87e842098077c --- /dev/null +++ b/gcc/testsuite/gcc.dg/asan/nested-1.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-fsanitize=address" } */ + +int f(int i) { + auto int h() { +int r; +int *p; + +{ + int x[3]; + + auto int g() { + return x[i]; + } + + p = &r; + *p = g(); +} + +return *p; + } + + return h(); +} -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
skip asan-poisoning of discarded vars
GNAT may create temporaries to hold return values of function calls. If such a temporary is created as part of a dynamic initializer of a variable in a unit other than the one being compiled, the initializer is dropped, including the temporary and its binding block. Don't issue asan mark calls for such variables, they are gone. Regstrapped on x86_64-linux-gnu. Ok to install? for gcc/ChangeLog * gimplify.c (gimplify_decl_expr): Skip asan marking calls for temporaries not seen in binding block, and not about to be added as gimple variables. for gcc/testsuite/ChangeLog * gnat.dg/asan1.adb: New test. * gnat.dg/asan1_pkg.ads: New additional source. --- gcc/gimplify.c |8 +++- gcc/testsuite/gnat.dg/asan1.adb | 15 +++ gcc/testsuite/gnat.dg/asan1_pkg.ads |9 + 3 files changed, 31 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gnat.dg/asan1.adb create mode 100644 gcc/testsuite/gnat.dg/asan1_pkg.ads diff --git a/gcc/gimplify.c b/gcc/gimplify.c index d2ac5f913593f..95d55bb8ba4c7 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -1795,7 +1795,13 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p) && !DECL_HAS_VALUE_EXPR_P (decl) && DECL_ALIGN (decl) <= MAX_SUPPORTED_STACK_ALIGNMENT && dbg_cnt (asan_use_after_scope) - && !gimplify_omp_ctxp) + && !gimplify_omp_ctxp + /* GNAT introduces temporaries to hold return values of calls in +initializers of variables defined in other units, so the +declaration of the variable is discarded completely. We do not +want to issue poison calls for such dropped variables. */ + && (DECL_SEEN_IN_BIND_EXPR_P (decl) + || (DECL_ARTIFICIAL (decl) && DECL_NAME (decl) == NULL_TREE))) { asan_poisoned_variables->add (decl); asan_poison_variable (decl, false, seq_p); diff --git a/gcc/testsuite/gnat.dg/asan1.adb b/gcc/testsuite/gnat.dg/asan1.adb new file mode 100644 index 0..a4bc59a9a2143 --- /dev/null +++ b/gcc/testsuite/gnat.dg/asan1.adb @@ -0,0 +1,15 @@ +-- { dg-do compile } +-- { dg-additional-sources asan1_pkg.ads } +-- { dg-options "-fsanitize=address" } +-- { dg-skip-if "" no_fsanitize_address } + +with Asan1_Pkg; + +procedure Asan1 is + use Asan1_Pkg; + + X, Y : E; +begin + X := C (N); + Y := V; +end Asan1; diff --git a/gcc/testsuite/gnat.dg/asan1_pkg.ads b/gcc/testsuite/gnat.dg/asan1_pkg.ads new file mode 100644 index 0..fbbc1c5e7f5bd --- /dev/null +++ b/gcc/testsuite/gnat.dg/asan1_pkg.ads @@ -0,0 +1,9 @@ +package Asan1_Pkg is + subtype E is Integer; + type T is array (1..32) of E; + + function N return T; + function C (P : T) return E; + + V : constant E := C (N); +end Asan1_Pkg; -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: skip asan-poisoning of discarded vars
On Jan 21, 2021, Jakub Jelinek wrote: > Does that affect only Ada and not other languages? I haven't observed it on other languages, but I didn't try really hard. Doing that now, with an assert that the newly-added condition doesn't ever hit. I'd already completed a bootstrap-asan the other day, but not with all languages. The kind of variable that triggers the problem is created within gcc/ada/gcc-interface/trans.c:Call_to_gnu, specifically within create_temporary in the same file. In the provided testcase, it injects the temporary created for the return value from N, passed as a parameter to C. The binding block containing that temporary ends up dropped when the initializer of V is discarded, because it's dynamic and V is imported from a different unit. I figured if the added condition were to ever hit before, we would add a poison call to a function that did not have gimple_add_tmp_var called on it, and that would NOT have it called in the block right after the one I proposed to modify: if (!DECL_SEEN_IN_BIND_EXPR_P (decl) && DECL_ARTIFICIAL (decl) && DECL_NAME (decl) == NULL_TREE) gimple_add_tmp_var (decl); Without gimple_add_tmp_var, we wouldn't allocate automatic storage to the variable in expand, and then the attempt to take its address for the poisoning call would explode in make_decl_rtl like this testcase does, because make_decl_rtl is not to be called for automatic variables. Since this didn't happen, I figured the new condition would not be hit except for the failing case. But I was wrong. The bootstrap with the added assert has just failed, as early as stage2 libiberty. Looking into it... -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: skip asan-poisoning of discarded vars
On Jan 21, 2021, Alexandre Oliva wrote: > But I was wrong. The bootstrap with the added assert has just failed, > as early as stage2 libiberty. Looking into it... Uhh, I take that back. I just goofed in the assert, inverting the condition. Long day... With the correct condition, it's got past the stage2 compilation of all of the gcc deps and Ada sources, and then some. Maybe my reasoning wasn't wrong, after all ;-) -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: skip asan-poisoning of discarded vars
On Jan 21, 2021, Alexandre Oliva wrote: > On Jan 21, 2021, Alexandre Oliva wrote: >> But I was wrong. The bootstrap with the added assert has just failed, >> as early as stage2 libiberty. Looking into it... > Uhh, I take that back. I just goofed in the assert, inverting the > condition. Long day... > With the correct condition, it's got past the stage2 compilation of all > of the gcc deps and Ada sources, and then some. Maybe my reasoning > wasn't wrong, after all ;-) Yeah, confirmed, bootstrap-asan (and -ubsan) completed on x86_64-linux-gnu, all languages, with the following patchlet (cut&pasted, then retabified manually, so it may not apply mechanically): diff --git a/gcc/gimplify.c b/gcc/gimplify.c index d2ac5f9..c0dcb39 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -1797,6 +1797,9 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p) && dbg_cnt (asan_use_after_scope) && !gimplify_omp_ctxp) { + gcc_assert (DECL_SEEN_IN_BIND_EXPR_P (decl) + || (DECL_ARTIFICIAL (decl) + && DECL_NAME (decl) == NULL_TREE)); asan_poisoned_variables->add (decl); asan_poison_variable (decl, false, seq_p); if (!DECL_ARTIFICIAL (decl) && gimplify_ctxp->live_switch_vars) -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: follow SSA defs for asan base
On Jan 22, 2021, Richard Biener via Gcc-patches wrote: > Yeah, I guess such addresses could be decl_address_invariant_p by changing > the > || DECL_CONTEXT (op) == current_function_decl > || decl_function_context (op) == current_function_decl) > to sth like >|| auto_var_p (op) > but I guess it won't help much since the access in the nested function > will be through the static chain pointer and thus &chain->member > which definitely isn't a gimple val. I think we only get poison/unpoison calls in the scope of the automatic variables, so it won't be the case that these calls will get such indirect frame references: they will only get references to the own function's frame, and those have invariant addresses, and thus can be regarded as such, and as gimple vals. So I gave this alternate change a spin, and both regstrap and asan+ubsan bootstrap completed successfully. Given the considerations of risk about the assert you pointed out, I'm now inclined to regard this change as safer and superior. Do you all concur? Ok to install if so? (I see the code in tree.c was untabified, and one of my changes introduced a tab that misaligned stuff. Tabify, untabify, or leave it inconsistent as in the tested patch below? regard the address of auto vars and consts as invariant From: Alexandre Oliva Ada makes extensive use of nested functions, which turn all automatic variables of the enclosing function that are used in nested ones into members of an artificial FRAME record type. The address of a local variable is usually passed to asan marking functions without using a temporary. asan_expand_mark_ifn will reject an ADDR_EXPRs if it's split out from the call into an SSA_NAMEs. Taking the address of a member of FRAME within a nested function was not regarded as a gimple val: while introducing FRAME variables, current_function_decl pointed to the outermost function, even while processing a nested function, so decl_address_invariant_p, checking that the context of the variable is current_function_decl, returned false for such ADDR_EXPRs. This patch changes decl_address_invariant_p to disregard current_function_decl, and regard all automatic variables as having invariant addresses, regardless of nesting. This may initially include references to variables in other nesting levels, but once they become references to enclosing frames, the indirection makes them non-gimple_vals. As long as poisoning and unpoisoning calls doesn't kick in for variables in other frames, this shouldn't be a problem. for gcc/ChangeLog * tree.c (decl_address_invariant_p): Accept auto variables and constants. for gcc/testsuite/ChangeLog * gcc.dg/asan/nested-1.c: New. --- gcc/testsuite/gcc.dg/asan/nested-1.c | 24 gcc/tree.c |5 ++--- 2 files changed, 26 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/asan/nested-1.c diff --git a/gcc/testsuite/gcc.dg/asan/nested-1.c b/gcc/testsuite/gcc.dg/asan/nested-1.c new file mode 100644 index 0..87e842098077c --- /dev/null +++ b/gcc/testsuite/gcc.dg/asan/nested-1.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-fsanitize=address" } */ + +int f(int i) { + auto int h() { +int r; +int *p; + +{ + int x[3]; + + auto int g() { + return x[i]; + } + + p = &r; + *p = g(); +} + +return *p; + } + + return h(); +} diff --git a/gcc/tree.c b/gcc/tree.c index 287e5001dc3b3..3de3085f42c8a 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -3590,14 +3590,13 @@ decl_address_invariant_p (const_tree op) case VAR_DECL: if ((TREE_STATIC (op) || DECL_EXTERNAL (op)) || DECL_THREAD_LOCAL_P (op) - || DECL_CONTEXT (op) == current_function_decl - || decl_function_context (op) == current_function_decl) + || auto_var_p (op)) return true; break; case CONST_DECL: if ((TREE_STATIC (op) || DECL_EXTERNAL (op)) - || decl_function_context (op) == current_function_decl) + || auto_var_p (op)) return true; break; -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: follow SSA defs for asan base
On Jan 26, 2021, Richard Biener wrote: > So while I think it's safe let's look at if we can improve tree-nested.c, > like I see (probably not the correct place): *nod*, it's just not the *only* place. > seeing how we adjust current_function_decl around the > recompute_tree_invariant_for_addr_expr call but not the > gsi_gimplify_val one (we already pass it a nesting_info, > not sure if wi->info is the same as the 'info' used above though), > so eventually we can fix it in one place? There are pieces of nested function lowering for which we set cfun and current_function_decl while walking each function, and there are other pieces that just don't bother, and we only set up current_function_decl temporarily for ADDR_EXPR handling. This patch adjusts both of the ADDR_EXPR handlers that override current_function_decl, so that the temporary overriding remains in effect during the re-gimplification. That is enough to avoid the problem. But I'm not very happy with this temporary overriding, it seems fishy. I'd rather we set things up for the entire duration of the walking of each function. But that's only relevant because we rely on current_function_decl for address handling. It's not clear to me that we should, as the other patch demonstrated. With it, we could probably even do away with these overriders. But, for this stage, this is probably as conservative a change as we could possibly hope for. I've regstrapped it on x86_64-linux-gnu, and also bootstrapped it with asan and ubsan. Ok to install? restore current_function_decl after re-gimplifying nested ADDR_EXPRs From: Alexandre Oliva Ada makes extensive use of nested functions, which turn all automatic variables of the enclosing function that are used in nested ones into members of an artificial FRAME record type. The address of a local variable is usually passed to asan marking functions without using a temporary. asan_expand_mark_ifn will reject an ADDR_EXPRs if it's split out from the call into an SSA_NAMEs. Taking the address of a member of FRAME within a nested function was not regarded as a gimple val: while introducing FRAME variables, current_function_decl pointed to the outermost function, even while processing a nested function, so decl_address_invariant_p, checking that the context of the variable is current_function_decl, returned false for such ADDR_EXPRs. decl_address_invariant_p, called when determining whether an expression is a legitimate gimple value, compares the context of automatic variables with current_function_decl. Some of the tree-nested function processing doesn't set current_function_decl, but ADDR_EXPR-processing bits temporarily override it. However, they restore it before re-gimplifying, which causes even ADDR_EXPRs referencing automatic variables in the FRAME struct of a nested function to not be regarded as address-invariant. This patch moves the restores of current_function_decl in the ADDR_EXPR-handling bits after the re-gimplification, so that the correct current_function_decl is used when testing for address invariance. for gcc/ChangeLog * tree-nested.c (convert_nonlocal_reference_op): Move current_function_decl restore after re-gimplification. (convert_local_reference_op): Likewise. for gcc/testsuite/ChangeLog * gcc.dg/asan/nested-1.c: New. --- gcc/testsuite/gcc.dg/asan/nested-1.c | 24 gcc/tree-nested.c|4 ++-- 2 files changed, 26 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/asan/nested-1.c diff --git a/gcc/testsuite/gcc.dg/asan/nested-1.c b/gcc/testsuite/gcc.dg/asan/nested-1.c new file mode 100644 index 0..87e842098077c --- /dev/null +++ b/gcc/testsuite/gcc.dg/asan/nested-1.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-fsanitize=address" } */ + +int f(int i) { + auto int h() { +int r; +int *p; + +{ + int x[3]; + + auto int g() { + return x[i]; + } + + p = &r; + *p = g(); +} + +return *p; + } + + return h(); +} diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c index 1b52669b622aa..addd6eef9aba6 100644 --- a/gcc/tree-nested.c +++ b/gcc/tree-nested.c @@ -1214,7 +1214,6 @@ convert_nonlocal_reference_op (tree *tp, int *walk_subtrees, void *data) save_context = current_function_decl; current_function_decl = info->context; recompute_tree_invariant_for_addr_expr (t); - current_function_decl = save_context; /* If the callback converted the address argument in a context where we only accept variables (and min_invariant, presumably), @@ -1222,6 +1221,7 @@ convert_nonlocal_reference_op (tree *tp, int *walk_subtrees, void *data) if (save_val_only) *tp = gsi_gimplify_val ((struct nesting_info *) wi-&
[RFC] test builtin ratio for loop distribution
This patch attempts to fix a libgcc codegen regression introduced in gcc-10, as -ftree-loop-distribute-patterns was enabled at -O2. The ldist pass turns even very short loops into memset calls. E.g., the TFmode emulation calls end with a loop of up to 3 iterations, to zero out trailing words, and the loop distribution pass turns them into calls of the memset builtin. Though short constant-length memsets are usually dealt with efficiently, for non-constant-length ones, the options are setmemM, or a function calls. RISC-V doesn't have any setmemM pattern, so the loops above end up "optimized" into memset calls, incurring not only the overhead of an explicit call, but also discarding the information the compiler has about the alignment of the destination, and that the length is a multiple of the word alignment. This patch adds to the loop distribution pass some cost analysis based on preexisting *_RATIO macros, so that we won't transform loops with trip counts as low as the ratios we'd rather expand inline. This patch is not finished; it needs adjustments to the testsuite, to make up for the behavior changes it brings about. Specifically, on a x86_64-linux-gnu regstrap, it regresses: > FAIL: gcc.dg/pr53265.c (test for warnings, line 40) > FAIL: gcc.dg/pr53265.c (test for warnings, line 42) > FAIL: gcc.dg/tree-ssa/ldist-38.c scan-tree-dump ldist "split to 0 loops and 1 > library cal> FAIL: g++.dg/tree-ssa/pr78847.C -std=gnu++14 scan-tree-dump > ldist "split to 0 loops and 1 library calls" > FAIL: g++.dg/tree-ssa/pr78847.C -std=gnu++17 scan-tree-dump ldist "split to > 0 loops and 1 library calls" > FAIL: g++.dg/tree-ssa/pr78847.C -std=gnu++2a scan-tree-dump ldist "split to > 0 loops and 1 library calls" I suppose just lengthening the loops will take care of ldist-38 and pr78847, but the loss of the warnings in pr53265 is more concerning, and will require investigation. Nevertheless, I seek feedback on whether this is an acceptable approach, or whether we should use alternate tuning parameters for ldist, or something entirely different. Thanks in advance, for gcc/ChangeLog * tree-loop-distribution.c (maybe_normalize_partition): New. (loop_distribution::distribute_loop): Call it. [requires testsuite adjustments and investigation of a warning regression] --- gcc/tree-loop-distribution.c | 54 ++ 1 file changed, 54 insertions(+) diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index bb15fd3723fb6..b5198652817ee 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -2848,6 +2848,52 @@ fuse_memset_builtins (vec *partitions) } } +/* Return false if it's profitable to turn the LOOP PARTITION into a builtin + call, and true if it wasn't, changing the PARTITION to PKIND_NORMAL. */ + +static bool +maybe_normalize_partition (class loop *loop, struct partition *partition) +{ + unsigned HOST_WIDE_INT ratio; + + switch (partition->kind) +{ +case PKIND_NORMAL: +case PKIND_PARTIAL_MEMSET: + return false; + +case PKIND_MEMSET: + if (integer_zerop (gimple_assign_rhs1 (DR_STMT +(partition->builtin->dst_dr + ratio = CLEAR_RATIO (optimize_loop_for_speed_p (loop)); + else + ratio = SET_RATIO (optimize_loop_for_speed_p (loop)); + break; + +case PKIND_MEMCPY: +case PKIND_MEMMOVE: + ratio = MOVE_RATIO (optimize_loop_for_speed_p (loop)); + break; + +default: + gcc_unreachable (); +} + + tree niters = number_of_latch_executions (loop); + if (niters == NULL_TREE || niters == chrec_dont_know) +return false; + + wide_int minit, maxit; + value_range_kind vrk = determine_value_range (niters, &minit, &maxit); + if (vrk == VR_RANGE && wi::ltu_p (maxit, ratio)) +{ + partition->kind = PKIND_NORMAL; + return true; +} + + return false; +} + void loop_distribution::finalize_partitions (class loop *loop, vec *partitions, @@ -3087,6 +3133,14 @@ loop_distribution::distribute_loop (class loop *loop, vec stmts, } finalize_partitions (loop, &partitions, &alias_ddrs); + { +bool any_changes_p = false; +for (i = 0; partitions.iterate (i, &partition); ++i) + if (maybe_normalize_partition (loop, partition)) + any_changes_p = true; +if (any_changes_p) + finalize_partitions (loop, &partitions, &alias_ddrs); + } /* If there is a reduction in all partitions make sure the last one is not classified for builtin code generation. */ -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
[RFC] mask out mult expr ctz bits from nonzero bits
While looking into the possibility of introducing setmemM patterns on RISC-V to undo the transformation from loops of word writes into memset, I was disappointed to find out that get_nonzero_bits would take into account the range of the length passed to memset, but not the trivially-available observation that this length was a multiple of the word size. This knowledge, if passed on to setmemM, could enable setmemM to output more efficient code. In the end, I did not introduce a setmemM pattern, nor the machinery to pass the ctz of the length on to it along with other useful information, but I figured this small improvement to nonzero_bits could still improve code generation elsewhere. https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564341.html Regstrapped on x86_64-linux-gnu. No analysis of codegen impact yet. Does this seem worth pursuing, presumably for stage1? for gcc/ChangeLog * tree-ssanames.c (get_nonzero_bits): Zero out low bits of integral types, when a MULT_EXPR INTEGER_CST operand ensures the result will be a multiple of a power of two. --- gcc/tree-ssanames.c | 23 +-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c index 51a26d2fce1c2..c4b5bf2a4999a 100644 --- a/gcc/tree-ssanames.c +++ b/gcc/tree-ssanames.c @@ -546,10 +546,29 @@ get_nonzero_bits (const_tree name) } range_info_def *ri = SSA_NAME_RANGE_INFO (name); + wide_int ret; if (!ri) -return wi::shwi (-1, precision); +ret = wi::shwi (-1, precision); + else +ret = ri->get_nonzero_bits (); + + /* If NAME is defined as a multiple of a constant C, we know the ctz(C) low + bits are zero. ??? Should we handle LSHIFT_EXPR too? Non-constants, + e.g. the minimum shift count, and ctz from both MULT_EXPR operands? That + could make for deep recursion. */ + if (INTEGRAL_TYPE_P (TREE_TYPE (name)) + && SSA_NAME_DEF_STMT (name) + && is_gimple_assign (SSA_NAME_DEF_STMT (name)) + && gimple_assign_rhs_code (SSA_NAME_DEF_STMT (name)) == MULT_EXPR + && TREE_CODE (gimple_assign_rhs2 (SSA_NAME_DEF_STMT (name))) == INTEGER_CST) +{ + unsigned HOST_WIDE_INT bits + = tree_ctz (gimple_assign_rhs2 (SSA_NAME_DEF_STMT (name))); + wide_int mask = wi::shwi (-1, precision) << bits; + ret &= mask; +} - return ri->get_nonzero_bits (); + return ret; } /* Return TRUE is OP, an SSA_NAME has a range of values [0..1], false -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: [RFC] test builtin ratio for loop distribution
On Jan 27, 2021, Richard Biener wrote: > That said, rather than not transforming the loop as you do I'd > say we want to re-inline small copies more forcefully during > loop distribution code-gen so we turn a loop that sets > 3 'short int' to zero into a 'int' store and a 'short' store for example > (we have more code-size leeway here because we formerly had > a loop). Ok, that makes sense to me, if there's a chance of growing the access stride. > Since you don't add a testcase Uhh, sorry, I mentioned TFmode emulation calls, but I wasn't explicit I meant the soft-fp ones from libgcc. ./xgcc -B./ -O2 $srcdir/libgcc/soft-fp/fixtfdi.c \ -I$srcdir/libgcc/config/riscv -I$srcdir/include \ -S -o - | grep memset > I can't see whether the actual case would be fixed by setting SSA > pointer alignment on the memset arguments The dest pointer alignment is known at the builtin expansion time, that's not a problem. What isn't known (*) is that the length is a multiple of the word size: what gets to the expander is that it's between 4 and 12 bytes (targeting 32-bit risc-v), but not that it's either 4, 8 or 12. Coming up with an efficient inline expansion becomes a bit of a challenge without that extra knowledge. (*) at least before my related patch for get_nonzero_bits https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564344.html -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: [RFC] test builtin ratio for loop distribution
On Jan 28, 2021, Richard Biener wrote: > That would allow turning back the memset into the original loop (but > with optimal IVs, etc.). Is this sort of what you had in mind? I haven't tested the inline expansion of memset much yet; and that of memcpy, not at all; this really is mainly to check that I understood what you had in mind before I spend further time polishing it. test builtin ratio for loop distribution From: Alexandre Oliva The ldist pass turns even very short loops into memset calls. E.g., the TFmode emulation calls end with a loop of up to 3 iterations, to zero out trailing words, and the loop distribution pass turns them into calls of the memset builtin. Though short constant-length memsets are usually dealt with efficiently, for non-constant-length ones, the options are setmemM, or a function calls. RISC-V doesn't have any setmemM pattern, so the loops above end up "optimized" into memset calls, incurring not only the overhead of an explicit call, but also discarding the information the compiler has about the alignment of the destination, and that the length is a multiple of the word alignment. This patch adds, to the loop distribution pass, the ability to perform inline expansion of bounded variable-length memset and memcpy in ways that take advantage of known alignments and size's factors, when preexisting *_RATIO macros suggest the inline expansion is advantageous. for gcc/ChangeLog * tree-loop-distribution.c: Include builtins.h. (generate_memset_builtin): Introduce optional inline expansion of bounded variable-sized memset calls. (generate_memcpy_builtin): Likewise for memcpy only. (loop_distribution::execute): Fix loop structure afterwards. --- gcc/tree-loop-distribution.c | 280 ++ 1 file changed, 279 insertions(+), 1 deletion(-) diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index bb15fd3723fb6..3be7a4c1ac281 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -115,6 +115,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-vectorizer.h" #include "tree-eh.h" #include "gimple-fold.h" +#include "builtins.h" #define MAX_DATAREFS_NUM \ @@ -1148,6 +1149,23 @@ generate_memset_builtin (class loop *loop, partition *partition) /* The new statements will be placed before LOOP. */ gsi = gsi_last_bb (loop_preheader_edge (loop)->src); + /* Compute builtin->size range and ctz before it's gimplified; sub-expressions + thereof are rewritten in place, so they end up referencing SSA_NAMEs for + which we don't have VR info. */ + unsigned align = get_pointer_alignment (builtin->dst_base) / BITS_PER_UNIT; + unsigned alctz, szctz, xbits; + wide_int szmin, szmax; + value_range_kind vrk; + if (align) +{ + alctz = wi::ctz (align); + szctz = MIN (tree_ctz (builtin->size), alctz); + xbits = alctz - szctz; + vrk = determine_value_range (builtin->size, &szmin, &szmax); + if (szmin == szmax) + align = 0; +} + nb_bytes = rewrite_to_non_trapping_overflow (builtin->size); nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE, false, GSI_CONTINUE_LINKING); @@ -1172,6 +1190,127 @@ generate_memset_builtin (class loop *loop, partition *partition) val = tem; } + unsigned HOST_WIDE_INT ratio; + if (integer_zerop (val)) +ratio = CLEAR_RATIO (optimize_loop_for_speed_p (loop)); + else +ratio = SET_RATIO (optimize_loop_for_speed_p (loop)); + + /* Compare the ratio with the number of (most-aligned) required stores needed + for szmax bytes, with the ratio: a loop that iterates up to szmax >> alctz, + and then one conditional store for each extra bit of alignment that the + destination has over the size. */ + if (align && vrk == VR_RANGE + && wi::ltu_p (wi::rshift (szmax, alctz, UNSIGNED) + xbits, ratio)) +{ + gimple *stmt; + tree bytes = create_tmp_var_raw (size_type_node, "ldistbytes"); + tree ptr = create_tmp_var_raw (build_pointer_type (char_type_node), +"ldistptr"); + tree stval = force_gimple_operand_gsi (&gsi, +fold_convert +(unsigned_char_type_node, val), +true, NULL_TREE, false, +GSI_CONTINUE_LINKING); + + for (unsigned i = 1; i != alctz; i *= 2) + { + unsigned bits = i * 2 * BITS_PER_UNIT; + tree type = build_nonstandard_integer_type (bits, true); + stval = fold_convert (type, stval); + tree op2 = fold_build2_loc (partition->loc, LSHIFT_EXPR,
Re: [RFC] test builtin ratio for loop distribution
On Feb 3, 2021, Richard Biener wrote: > So I think we should try to match what __builtin_memcpy/memset > expansion would do here, taking advantage of extra alignment > and size knowledge. In particular, > a) if __builtin_memcpy/memset would use setmem/cpymem optabs > see if we can have variants of memcpy/memset transferring alignment > and size knowledge We could add more optional parameters to them to convey the length's known ctz. However, the ctz can't be recovered reliably. We can't even recover it after gimplifying the length within ldist! That said, my other patch already enables ctz calls to recover it, at least in libgcc risc-v tfmode cases, and it's possible it's readily available in other cases. I'd rather leave that for someone dealing with the machine-specific patterns to figure out whether a separate argument would be useful. RISC-V, which is what I'm dealing with, doesn't have much to offer as far as these patterns are concerned. > b) if expansion would use BY_PIECES then expand to an unrolled loop Why would that be better than keeping the constant-length memset call, that would be turned into an unrolled loop during expand? > c) if expansion would emit a memset/memcpy call but we know > alignment and have a low bound on niters emit a loop (like your patch > does) > a) might be difficult but adding the builtin variants may pay off in any case. *nod* > The patch itself could benefit from one or two helpers we already > have, first of all there's create_empty_loop_on_edge (so you don't > need the loop fixup) Uhh, thanks, but... you realize nearly all of the gimple-building code is one and the same for the loop and for trailing count misalignment? There doesn't seem to be a lot of benefit to using this primitive, aside from its dealing with creating the loop data structure which, again, I'd only want to do in the loop case. (I guess I should add more comments as to the inline expansion strategy. it's equivalent to: i = len, ptr = base, blksz = 1 << alctz; while (i >= blksz) { *(ub*)ptr = val; i -= blksz; ptr += blksz; } blksz >>= 1; if (i >= blksz) { *(ub*)ptr = val; i -= blksz; ptr += blksz; } blksz >>= 1; if (i >= blksz) { *(ub*)ptr = val; i -= blksz; ptr += blksz; } ... until blksz gets down to zero or to 1< Note that for memmove if we know the dependence direction, we > can also emit a loop / unrolled code. *nod*, but the logic would have to be quite different, using bit tests, and odds are we won't know the direction and have to output a test and code for both possibilities, which would be quite unlikely to be beneficial. Though the original code would quite likely make the direction visible; perhaps if the size is small enough that we would expand a memcpy inline, we should refrain from transforming the loop into a memmove call. In the case at hand, there's no benefit at all to these transformations: we start from a loop with the known alignment and a small loop count (0 to 2 words copied), and if everything goes all right with the transformation, we may be lucky to get back to that. It's not like the transformation could even increase the known alignment, so why bother, and throw away debug info by rewriting the loop into the same code minus debug? > I think the builtins with alignment and calloc-style element count > will be useful on its own. Oh, I see, you're suggesting actual separate builtin functions. Uhh... I'm not sure I want to go there. I'd much rather recover the ctz of the length, and use it in existing code. I'd also prefer if the generic memset (and memcpy and memmove?) builtin expanders dealt with non-constant lengths in the way I implemented. That feels like the right spot for it. That deprives us of gimple loop optimizations in the inlined loop generated by the current patch, but if we expand an unrolled loop with compares and offsets with small constants, loop optimizations might not even be relevant. FWIW, the patch I posted yesterday is broken, the regstrap test did not even build libstdc++-v3 successfully. I'm not sure whether to pursue it further, or to reimplement it in the expander. Suggestions are welcome. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: [RFC] test builtin ratio for loop distribution
On Feb 4, 2021, Richard Biener wrote: >> > b) if expansion would use BY_PIECES then expand to an unrolled loop >> >> Why would that be better than keeping the constant-length memset call, >> that would be turned into an unrolled loop during expand? > Well, because of the possibly lost ctz and alignment info. Funny you should mention that. I got started with the expand-time expansion yesterday, and found out that we're not using the alignment information that is available. Though the pointer is known to point to an aligned object, we are going for 8-bit alignment for some reason. The strategy I used there was to first check whether by_pieces would expand inline a constant length near the max known length, then loop over the bits in the variable length, expand in each iteration a constant-length store-by-pieces for the fixed length corresponding to that bit, and a test comparing the variable length with the fixed length guarding the expansion of the store-by-pieces. We may get larger code this way (no loops), but only O(log(len)) compares. I've also fixed some bugs in the ldist expander, so now it bootstraps, but with a few regressions in the testsuite, that I'm yet to look into. >> Uhh, thanks, but... you realize nearly all of the gimple-building code >> is one and the same for the loop and for trailing count misalignment? > Sorry, the code lacked comments and so I didn't actually try decipering > the code you generate ;) Oh, come on, it was planly obscure ;-D Sorry for posting an early-draft before polishing it up. > The original motivation was really that esp. for small trip count loops > the target knows best how to implement them. Now, that completely > fails of course in case the target doesn't implement any of this or > the generic code fails because we lost ctz and alignment info. In our case, generic code fails because it won't handle variable-sized clear-by-pieces. But then, I found out, when it's fixed-size, it also makes the code worse, because it seems to expand to byte stores even when the store-to object is known to have wider alignment: union u { long long i; char c[8]; } x[8]; int s(union u *p, int k) { for (int i = k ? 0 : 3; i < 8; i++) { for (int j = 0; j < 8; j++) { p[i].c[j] = 0; } // becomes a memset to an 8-byte-aligned 8-byte object, then 8 byte stores } } >> > I think the builtins with alignment and calloc-style element count >> > will be useful on its own. >> >> Oh, I see, you're suggesting actual separate builtin functions. Uhh... >> I'm not sure I want to go there. I'd much rather recover the ctz of the >> length, and use it in existing code. > Yeah, but when we generate memcpy there might not be a way to > store the ctz info until RTL expansion where the magic should really happen > ... True. It can be recovered without much difficulty in the cases I've looked at, but it could be lost in others. > So I'd say go for improving RTL expansion. 'k, thanks -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: use -mfpu=auto for arm/simd/vmmla_1.c (was: use -mfpu=neon for arm/simd/vmmla_1.c)
On Jan 13, 2021, Alexandre Oliva wrote: > On Jan 5, 2021, Alexandre Oliva wrote: >> So this patch adds -mfpu=auto to the test, overriding any implicit >> flags with the fpu implied by the arch. >> * gcc.target/arm/simd/vmmla_1.c: Pass -mfpu=auto. > Ping? > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562798.html Ping? -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: use -mfpu=neon for arm/simd/vmmla_1.c
On Feb 9, 2021, Kyrylo Tkachov wrote: > Ok. Aren't there more tests that have this problem? Thanks. This was the only test that exhibited this problem. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: [RFC] test builtin ratio for loop distribution
On Feb 4, 2021, Jim Wilson wrote: > FYI we have a bug report for this for a coremark regression which sounds > like the same problem. > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092 Indeed, thanks! -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: [RFC] test builtin ratio for loop distribution
On Feb 4, 2021, Alexandre Oliva wrote: > On Feb 4, 2021, Richard Biener wrote: >>> > b) if expansion would use BY_PIECES then expand to an unrolled loop >>> >>> Why would that be better than keeping the constant-length memset call, >>> that would be turned into an unrolled loop during expand? >> Well, because of the possibly lost ctz and alignment info. > Funny you should mention that. I got started with the expand-time > expansion yesterday, and found out that we're not using the alignment > information that is available. Though the pointer is known to point to > an aligned object, we are going for 8-bit alignment for some reason. > The strategy I used there was to first check whether by_pieces would > expand inline a constant length near the max known length, then loop > over the bits in the variable length, expand in each iteration a > constant-length store-by-pieces for the fixed length corresponding to > that bit, and a test comparing the variable length with the fixed length > guarding the expansion of the store-by-pieces. We may get larger code > this way (no loops), but only O(log(len)) compares. > I've also fixed some bugs in the ldist expander, so now it bootstraps, > but with a few regressions in the testsuite, that I'm yet to look into. A few more fixes later, this seems to work. It encompasses the patch to recover tree_ctz from a mult_expr def, it adds code to set up the alignment data for the ldist-generated memset dest, and then it expands memset as described above if length is not constant, setmem is not available, but the by-pieces machinery would still be used for nearby constants. How does this look? [PR94092] introduce try store by multiple pieces From: Alexandre Oliva The ldist pass turns even very short loops into memset calls. E.g., the TFmode emulation calls end with a loop of up to 3 iterations, to zero out trailing words, and the loop distribution pass turns them into calls of the memset builtin. Though short constant-length memsets are usually dealt with efficiently, for non-constant-length ones, the options are setmemM, or a function calls. RISC-V doesn't have any setmemM pattern, so the loops above end up "optimized" into memset calls, incurring not only the overhead of an explicit call, but also discarding the information the compiler has about the alignment of the destination, and that the length is a multiple of the word alignment. This patch handles variable lengths with multiple conditional power-of-2-constant-sized stores-by-pieces, so as to reduce the overhead of length compares. It also propagates the alignment of the memset dest, introduced by loop-distribution, to the SSA pointer used to hold it, so that it is not discarded right away, and may be recovered by the memset builtin expander. Finally, it computes the ctz of the length, and uses it to omit blocks for stores of small byte counts when we're storing whole words. tree_ctz is improved so as to look at the length DEF stmt, and zero out the least-significant bits if it's a multiply by a power of two. for gcc/ChangeLog PR tree-optimization/94092 * builtins.c (try_store_by_multiple_pieces): New. (expand_builtin_memset_args): Use it. If target_char_cast fails, proceed as for non-constant val. Pass len's ctz to... * expr.c (clear_storage_hints): ... this. Try store by multiple pieces after setmem. (clear_storage): Adjust. * expr.h (clear_storage_hints): Likewise. (try_store_by_multiple_pieces): Declare. * tree-loop-distribution.c: Include builtins.h. (generate_memset_builtin): Propagate dst_base alignmen to mem. * tree-ssanames.c (get_nonzero_bits): Zero out low bits of integral types, when a MULT_EXPR INTEGER_CST operand ensures the result will be a multiple of a power of two. --- gcc/builtins.c | 113 +++--- gcc/expr.c |9 +++ gcc/expr.h | 12 gcc/tree-loop-distribution.c |9 +++ gcc/tree-ssanames.c | 23 - 5 files changed, 154 insertions(+), 12 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index 0aed008687ccc..44f3af92536a5 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -6600,6 +6600,97 @@ expand_builtin_memset (tree exp, rtx target, machine_mode mode) return expand_builtin_memset_args (dest, val, len, target, mode, exp); } +/* Try to store VAL (or, if NULL_RTX, VALC) in LEN bytes starting at TO. + Return TRUE if successful, FALSE otherwise. TO is assumed to be + aligned at an ALIGN boundary. LEN is assumed to be a multiple of + 1<= ctz_len; i--) +{ + unsigned HOST_WIDE_INT blksize = HOST_WIDE_INT_1U << i; + rtx_code_label *label = NULL;
Re: [RFC] test builtin ratio for loop distribution
On Feb 11, 2021, Alexandre Oliva wrote: > How does this look? > for gcc/ChangeLog > PR tree-optimization/94092 > * builtins.c (try_store_by_multiple_pieces): New. > (expand_builtin_memset_args): Use it. If target_char_cast > fails, proceed as for non-constant val. Pass len's ctz to... > * expr.c (clear_storage_hints): ... this. Try store by > multiple pieces after setmem. > (clear_storage): Adjust. > * expr.h (clear_storage_hints): Likewise. > (try_store_by_multiple_pieces): Declare. > * tree-loop-distribution.c: Include builtins.h. > (generate_memset_builtin): Propagate dst_base alignmen to mem. > * tree-ssanames.c (get_nonzero_bits): Zero out low bits of > integral types, when a MULT_EXPR INTEGER_CST operand ensures > the result will be a multiple of a power of two. I forgot to mention this passed regstrap on x86_64-linux-gnu, as well as some cross testing of riscv32-elf. I've also regstrapped it on x86_64-linux-gnu along with a patch for testing purposes, that tried try_store_by_multiple_pieces before setmem in all 3 locations where they are called, which gives me some confidence that the implementation is reasonably robust. Is this ok to install? (if not right now, perhaps in stage1) -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: [RFC] test builtin ratio for loop distribution
On Feb 12, 2021, Richard Biener wrote: >> + if (TREE_CODE (mem) == SSA_NAME) >> +if (ptr_info_def *pi = get_ptr_info (mem)) >> + { >> + unsigned al = get_pointer_alignment (builtin->dst_base); >> + if (al > pi->align || pi->misalign) > We still might prefer pi->align == 64 and pi->misalign == 32 over al == 16 > so maybe factor that in, too. Ugh, apologies, I somehow posted an incorrect and outdated version of the patch. The improved (propagates both alignments) and fixed (divides by BITS_PER_UNIT, fixing a regression in gfortran.dg/sms-2.f90) had this alternate hunk as the only difference: @@ -1155,6 +1156,16 @@ generate_memset_builtin (class loop *loop, partition *partition) mem = force_gimple_operand_gsi (&gsi, mem, true, NULL_TREE, false, GSI_CONTINUE_LINKING); + if (TREE_CODE (mem) == SSA_NAME) +if (ptr_info_def *pi = get_ptr_info (mem)) + { + unsigned al; + unsigned HOST_WIDE_INT misal; + if (get_pointer_alignment_1 (builtin->dst_base, &al, &misal)) + set_ptr_info_alignment (pi, al / BITS_PER_UNIT, + misal / BITS_PER_UNIT); + } + /* This exactly matches the pattern recognition in classify_partition. */ val = gimple_assign_rhs1 (DR_STMT (builtin->dst_dr)); /* Handle constants like 0x15151515 and similarly > So I wonder whether we should instead re-run CCP after loop opts which > computes nonzero bits as well instead of the above "hack". Would > nonzero bits be ready to compute in the above way from loop distribution? > That is, you can do set_nonzero_bits just like you did > set_ptr_info_alignment ... > Since CCP also performs copy propagation an obvious candidate would be > to replace the last pass_copy_prop with pass_ccp (with a comment noting > to compute up-to-date nonzero bits info). I'll look into these possibilities. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: [RFC] test builtin ratio for loop distribution
On Feb 16, 2021, Alexandre Oliva wrote: >> So I wonder whether we should instead re-run CCP after loop opts which >> computes nonzero bits as well instead of the above "hack". That works. It takes care of both the dest alignment and the len ctz. Explicitly masking out the len tz from nonzero bits also works, but I suppose the ccp pass does a better job, and it takes care of memcpy and memmove as well. I noticed that ccp releases CDI_DOMINATORS information. It looks like the last copy_prop pass, now replaced with ccp in my tree, was late enough that it doesn't seem to matter, but at first I thought you meant an earlier copy_prop, that runs before graphite, and dropping dominator info at that point was a problem for e.g. cunroll. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
[PR94092] Re: [RFC] test builtin ratio for loop distribution
Here's an improved version of the patch. Regstrapped on x86_64-linux-gnu, with and without a patchlet that moved multi-pieces ahead of setmem, and also tested with riscv32-elf. Is it ok to install? Or should it wait for stage1? [PR94092] introduce try store by multiple pieces From: Alexandre Oliva The ldist pass turns even very short loops into memset calls. E.g., the TFmode emulation calls end with a loop of up to 3 iterations, to zero out trailing words, and the loop distribution pass turns them into calls of the memset builtin. Though short constant-length clearing memsets are usually dealt with efficiently, for non-constant-length ones, the options are setmemM, or a function calls. RISC-V doesn't have any setmemM pattern, so the loops above end up "optimized" into memset calls, incurring not only the overhead of an explicit call, but also discarding the information the compiler has about the alignment of the destination, and that the length is a multiple of the word alignment. This patch handles variable lengths with multiple conditional power-of-2-constant-sized stores-by-pieces, so as to reduce the overhead of length compares. It also changes the last copy-prop pass into ccp, so that pointer alignment and length's nonzero bits are detected and made available for the expander, even for ldist-introduced SSA_NAMEs. for gcc/ChangeLog PR tree-optimization/94092 * builtins.c (try_store_by_multiple_pieces): New. (expand_builtin_memset_args): Use it. If target_char_cast fails, proceed as for non-constant val. Pass len's ctz to... * expr.c (clear_storage_hints): ... this. Try store by multiple pieces after setmem. (clear_storage): Adjust. * expr.h (clear_storage_hints): Likewise. (try_store_by_multiple_pieces): Declare. * passes.def: Replace the last copy_prop to ccp. --- gcc/builtins.c | 182 ++-- gcc/expr.c |9 ++- gcc/expr.h | 13 gcc/passes.def |5 +- 4 files changed, 197 insertions(+), 12 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index 0aed008687ccc..05b14f2a3f6d3 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -6600,6 +6600,166 @@ expand_builtin_memset (tree exp, rtx target, machine_mode mode) return expand_builtin_memset_args (dest, val, len, target, mode, exp); } +/* Try to store VAL (or, if NULL_RTX, VALC) in LEN bytes starting at TO. + Return TRUE if successful, FALSE otherwise. TO is assumed to be + aligned at an ALIGN-bits boundary. LEN must be a multiple of + 1<= 0); + + if (val) +valc = 1; + + /* Bits more significant than TST_BITS are part of the shared prefix + in the binary representation of both min_len and max_len. Since + they're identical, we don't need to test them in the loop. */ + int tst_bits = (max_bits != min_bits ? max_bits + : floor_log2 (max_len ^ min_len)); + + /* Check whether it's profitable to start by storing a fixed BLKSIZE + bytes, to lower max_bits. In the unlikely case of a constant LEN + (implied by identical MAX_LEN and MIN_LEN), we want to issue a + single store_by_pieces, but otherwise, select the minimum multiple + of the ALIGN (in bytes) and of the MCD of the possible LENs, that + brings MAX_LEN below TST_BITS, if that's lower than min_len. */ + unsigned HOST_WIDE_INT blksize; + if (max_len > min_len) +{ + unsigned HOST_WIDE_INT alrng = MAX (HOST_WIDE_INT_1U << ctz_len, + align / BITS_PER_UNIT); + blksize = max_len- (HOST_WIDE_INT_1U << tst_bits) + alrng; + blksize &= ~(alrng - 1); +} + else if (max_len == min_len) +blksize = max_len; + else +gcc_unreachable (); + if (min_len >= blksize) +{ + min_len -= blksize; + min_bits = floor_log2 (min_len); + max_len -= blksize; + max_bits = floor_log2 (max_len); + + tst_bits = (max_bits != min_bits ? max_bits +: floor_log2 (max_len ^ min_len)); +} + else +blksize = 0; + + /* Check that we can use store by pieces for the maximum store count + we may issue (initial fixed-size block, plus conditional + power-of-two-sized from max_bits to ctz_len. */ + unsigned HOST_WIDE_INT xlenest = blksize; + if (max_bits >= 0) +xlenest += ((HOST_WIDE_INT_1U << max_bits) * 2 + - (HOST_WIDE_INT_1U << ctz_len)); + if (!can_store_by_pieces (xlenest, builtin_memset_read_str, + &valc, align, true)) +return false; + + rtx (*constfun) (void *, HOST_WIDE_INT, scalar_int_mode); + void *constfundata; + if (val) +{ + constfun = builtin_memset_gen_str; + constfundata = val = force_reg (TYPE_MODE (unsigned_char_type_node), + val); +} + else +
add rv64im{,c,fc} multilibs
We've had customer demand for these multilibs. We'd be happy to maintain this change locally, but I thought I'd contribute the patch, just in case there's wider interest in them. WDYT? for gcc/ChangeLog * config/riscv/t-elf-multilib: Add multilibs for rv64im, rv64imc, and rv64imfc/lp64f. --- gcc/config/riscv/t-elf-multilib | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/t-elf-multilib b/gcc/config/riscv/t-elf-multilib index 19f9434616c2d..b268e26c954c8 100644 --- a/gcc/config/riscv/t-elf-multilib +++ b/gcc/config/riscv/t-elf-multilib @@ -1,6 +1,6 @@ # This file was generated by multilib-generator with the command: -# ./multilib-generator rv32i-ilp32--c rv32im-ilp32--c rv32iac-ilp32-- rv32imac-ilp32-- rv32imafc-ilp32f-rv32imafdc- rv64imac-lp64-- rv64imafdc-lp64d-- -MULTILIB_OPTIONS = march=rv32i/march=rv32ic/march=rv32im/march=rv32imc/march=rv32iac/march=rv32imac/march=rv32imafc/march=rv32imafdc/march=rv32gc/march=rv64imac/march=rv64imafdc/march=rv64gc mabi=ilp32/mabi=ilp32f/mabi=lp64/mabi=lp64d +# ./multilib-generator rv32i-ilp32--c rv32im-ilp32--c rv32iac-ilp32-- rv32imac-ilp32-- rv32imafc-ilp32f-rv32imafdc- rv64im-lp64-- rv64imc-lp64-- rv64imfc-lp64f-- rv64imac-lp64-- rv64imafdc-lp64d-- +MULTILIB_OPTIONS = march=rv32i/march=rv32ic/march=rv32im/march=rv32imc/march=rv32iac/march=rv32imac/march=rv32imafc/march=rv32imafdc/march=rv32gc/march=rv64im/march=rv64imc/march=rv64imfc/march=rv64imac/march=rv64imafdc/march=rv64gc mabi=ilp32/mabi=ilp32f/mabi=lp64/mabi=lp64f/mabi=lp64d MULTILIB_DIRNAMES = rv32i \ rv32ic \ rv32im \ @@ -10,17 +10,24 @@ rv32imac \ rv32imafc \ rv32imafdc \ rv32gc \ +rv64im \ +rv64imc \ +rv64imfc \ rv64imac \ rv64imafdc \ rv64gc ilp32 \ ilp32f \ lp64 \ +lp64f \ lp64d MULTILIB_REQUIRED = march=rv32i/mabi=ilp32 \ march=rv32im/mabi=ilp32 \ march=rv32iac/mabi=ilp32 \ march=rv32imac/mabi=ilp32 \ march=rv32imafc/mabi=ilp32f \ +march=rv64im/mabi=lp64 \ +march=rv64imc/mabi=lp64 \ +march=rv64imfc/mabi=lp64f \ march=rv64imac/mabi=lp64 \ march=rv64imafdc/mabi=lp64d MULTILIB_REUSE = march.rv32i/mabi.ilp32=march.rv32ic/mabi.ilp32 \ -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: add rv64im{,c,fc} multilibs
On Feb 23, 2021, Kito Cheng wrote: > We've added a new configure option to allow you to override that > without changing source code. Ah, nice, thanks! I'll add a note on our internal patch to switch to that when we switch to GCC 11. I take your response as confirming my expectation that the defaults are to remain unchanged for now, and I will thus proceed under this assumption. Thanks for the prompt response! -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: add rv64im{,c,fc} multilibs
On Feb 23, 2021, Jim Wilson wrote: > If we add default multilibs for you, then to be fair, we need to add > default multilibs for other people that ask, and before long we are trying > to build hundreds or maybe even thousands of multilibs by default which is > unworkable. *nod*, it's a very familiar issue to me, I know where that's coming from, no worries. I expected the change would be turned down, and for good reason, unless there was say an emerging growth of adoption on those multilibs, which I wasn't aware of. The expectation didn't stop me from offering the patch just in case, that's all. > People that want a different set can define their own, and we have > made it easy for people to define their own sets of multilibs as Kito > pointed out. *nod*, thanks, -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
[FYI] Add missing dg-require-effective-target fpic directives to powerpc tests
t { powerpc-*-linux* && ilp32 } } } */ /* { dg-options "-O2 -fpic -fno-reorder-blocks" } */ +/* { dg-require-effective-target fpic } */ /* Analog of pr79439-1.c for 32-bit Linux. */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr83629.c b/gcc/testsuite/gcc.target/powerpc/pr83629.c index 976b564e927d7..8900010fb1d0a 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr83629.c +++ b/gcc/testsuite/gcc.target/powerpc/pr83629.c @@ -1,6 +1,7 @@ /* { dg-require-effective-target ilp32 } */ /* { dg-require-effective-target fpic } */ /* { dg-options "-O2 -fPIC -frename-registers --param=sched-autopref-queue-depth=0 -mdejagnu-cpu=603" } */ +/* { dg-require-effective-target fpic } */ extern void bar (void *); diff --git a/gcc/testsuite/gcc.target/powerpc/pr84112.c b/gcc/testsuite/gcc.target/powerpc/pr84112.c index 8fbafa1b1ca59..82a7d2a60f055 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr84112.c +++ b/gcc/testsuite/gcc.target/powerpc/pr84112.c @@ -1,6 +1,7 @@ /* { dg-do compile } */ /* { dg-require-effective-target fpic } */ /* { dg-options "-mdejagnu-cpu=power8 -O3 -fstack-protector-strong -fpic" } */ +/* { dg-require-effective-target fpic } */ char *b; int c, d, e, f; -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: [FYI] Add missing dg-require-effective-target fpic directives to powerpc tests
revert just-added duplicate fpic target requirement A moment after pushing the previous patch, I noticed the fpic target requirement markers had already been added to some of the files in the patch from long ago that I've just contributed. This patch reverts the duplicates. for gcc/testsuite/ChangeLog * gcc.target/powerpc/pr67789.c: Revert fpic target requirement duplication. * gcc.target/powerpc/pr83629.c: Likewise. * gcc.target/powerpc/pr84112.c: Likewise. --- gcc/testsuite/gcc.target/powerpc/pr67789.c |1 - gcc/testsuite/gcc.target/powerpc/pr83629.c |1 - gcc/testsuite/gcc.target/powerpc/pr84112.c |1 - 3 files changed, 3 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/pr67789.c b/gcc/testsuite/gcc.target/powerpc/pr67789.c index ea77ec979cd5a..05d01ef20d777 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr67789.c +++ b/gcc/testsuite/gcc.target/powerpc/pr67789.c @@ -1,7 +1,6 @@ /* { dg-do assemble } */ /* { dg-require-effective-target fpic } */ /* { dg-options "-O2 -msecure-plt -fPIC" } */ -/* { dg-require-effective-target fpic } */ /* { dg-skip-if "" { powerpc*-*-darwin* powerpc-ibm-aix* } } */ #define FE_TONEAREST 0 diff --git a/gcc/testsuite/gcc.target/powerpc/pr83629.c b/gcc/testsuite/gcc.target/powerpc/pr83629.c index 8900010fb1d0a..976b564e927d7 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr83629.c +++ b/gcc/testsuite/gcc.target/powerpc/pr83629.c @@ -1,7 +1,6 @@ /* { dg-require-effective-target ilp32 } */ /* { dg-require-effective-target fpic } */ /* { dg-options "-O2 -fPIC -frename-registers --param=sched-autopref-queue-depth=0 -mdejagnu-cpu=603" } */ -/* { dg-require-effective-target fpic } */ extern void bar (void *); diff --git a/gcc/testsuite/gcc.target/powerpc/pr84112.c b/gcc/testsuite/gcc.target/powerpc/pr84112.c index 82a7d2a60f055..8fbafa1b1ca59 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr84112.c +++ b/gcc/testsuite/gcc.target/powerpc/pr84112.c @@ -1,7 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target fpic } */ /* { dg-options "-mdejagnu-cpu=power8 -O3 -fstack-protector-strong -fpic" } */ -/* { dg-require-effective-target fpic } */ char *b; int c, d, e, f; -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
add -mpowerpc-gpopt to options for sqrt insn on PowerPC
This patch avoids an ICE in gimplefe-28.c, in our ppc64-vxworks7r2 tests. Tested on x86_64-linux-gnu, and on the affected platform. Ok to install? From: Eric Botcazou for gcc/testsuite/ChangeLog * lib/target-supports.exp (add_options_for_sqrt_insn): For PowerPC targets, add -mpowerpc-gpopt option. --- gcc/testsuite/lib/target-supports.exp |3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index af46c77921482..29faeeba67945 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -7598,6 +7598,9 @@ proc add_options_for_sqrt_insn { flags } { if { [istarget arm*-*-*] } { return [add_options_for_arm_vfp "$flags"] } +if { [istarget powerpc*-*-*] } { + return "$flags -mpowerpc-gpopt" +} return $flags } -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
add powerpc_vsx_ok requirement to undef-bool tests
These tests use -mvsx in their dg-options list, so they are only applicable if the -mvsx option is supported by the compiler. Tested with target ppc64-vx7r2, configured to force altivec disabled, and thus to reject vsx. Ok to install? From: Joel Brobecker gcc/testsuite/ChangeLog: * gcc.target/powerpc/undef-bool-2.c: Add dg-require-effective-target powerpc_vsx_ok directive. * g++.dg/ext/undef-bool-1.C: Add dg-require-effective-target powerpc_vsx_ok directive. --- gcc/testsuite/g++.dg/ext/undef-bool-1.C |1 + gcc/testsuite/gcc.target/powerpc/undef-bool-2.c |1 + 2 files changed, 2 insertions(+) diff --git a/gcc/testsuite/g++.dg/ext/undef-bool-1.C b/gcc/testsuite/g++.dg/ext/undef-bool-1.C index 716e06c1ce413..9cc1cd8726098 100644 --- a/gcc/testsuite/g++.dg/ext/undef-bool-1.C +++ b/gcc/testsuite/g++.dg/ext/undef-bool-1.C @@ -1,6 +1,7 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-options "-O2 -DNO_WARN_X86_INTRINSICS -mvsx" } */ /* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ /* Test to ensure that "bool" gets undef'd in xmmintrin.h when we require strict ANSI. */ diff --git a/gcc/testsuite/gcc.target/powerpc/undef-bool-2.c b/gcc/testsuite/gcc.target/powerpc/undef-bool-2.c index d4944ab1ca6c7..7bc5d18484039 100644 --- a/gcc/testsuite/gcc.target/powerpc/undef-bool-2.c +++ b/gcc/testsuite/gcc.target/powerpc/undef-bool-2.c @@ -1,6 +1,7 @@ /* { dg-do compile } */ /* { dg-options "-O2 -std=c11 -DNO_WARN_X86_INTRINSICS -mvsx" } */ /* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ /* Test to ensure that "bool" gets undef'd in xmmintrin.h when we require strict ANSI. Subsequent use of bool needs stdbool.h. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: add -mpowerpc-gpopt to options for sqrt insn on PowerPC
On Feb 26, 2021, Segher Boessenkool wrote: > On Fri, Feb 26, 2021 at 12:31:16PM -0500, David Edelsohn wrote: >> On Fri, Feb 26, 2021 at 11:09 AM Alexandre Oliva wrote: >> > >> > This patch avoids an ICE in gimplefe-28.c, in our ppc64-vxworks7r2 >> > tests. Tested on x86_64-linux-gnu, and on the affected platform. Ok to >> > install? >> >> I'm sort of surprised that sqrt instruction would be available for the >> target but not enabled by default. Is this really the method that >> customers would use? It seems that it might be more appropriate to >> xfail or skip the test for PPC64 VxWorks. > You should essentially never use -mpowerpc-gpopt, but instead use a > -mcpu= that has it. You also of course whould not do that in run tests > if the cpu does not support those insns. I think the feedback is missing the point of the obvious bug that Eric's patch fixes. We have a dejagnu proc that should return any target-specific options necessary for a sqrt insn to be available: # Return any additional options to enable square root intructions. powerpc has an optional sqrt insn, but the option that enables it is not returned by that proc. That's a blatang bug in that proc. Do you see that, or do you have any sensible reasoning to share to support the position that the proc is NOT buggy? This proc is presumably only used in compile tests; it wouldn't make sense to assume an optional sqrt insn to be available on whatever hardware variant you happen to be using for testing. But the bug fixed by Eric's patch is pretty blatant, and I don't think it makes sense to reject this fix, and insist on a fix of another bug instead. That would just leave *this* bug in the dejagnu proc unfixed. Now, maybe a different flag is to be returned, but -mpowerpc-gpopt is the flag that enables sqrt with minimal disturbance of any other cpu selections in effect, so I believe that's the option that best serves the purpose of making the hardware sqrt insn available, regardless of whatever would be recommended to end users. > But, Alex, you say this avoids an ICE? You shouldn't avoid the ICE in > that testcase then -- instead, fix it! (Or report it). Orthogonal issue, IMHO. If you restate the response as "the proposed patch is fine as long as a bug report is on file for the error encountered when attempting to expand . when a sqrt intrinsic is not available", I can go along with it. But saying "we don't want to fix this bug, we want to fix another vaguely related bug *instead*", will make our stances mutually incompatible, and would likely result in my pursuing neither bug. While at that, if -mpowerpc-gpopt is not to be used, maybe you'll want to fix, or file a bug report, about the multiple tests gcc.target/powerpc/pr46728-*.c, that use it explicitly, and for the very purpose of enabling the fsqrt insn. Several of these are execution tests, so they fail on systems that don't offer fsqrt. I suppose these should mention *_sqrt_insn target requirements and add options. I haven't looked into how they interact, if at all. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: [PATCH] gcc.misc-tests/outputs.exp: assert unique test-names
On Feb 25, 2021, Hans-Peter Nilsson wrote: > Navigating and debugging causes for failing tests here isn't > helped by the existence of tests with duplicate names. Aah, I guess I see what happened: some test sets were copied to cover additional cases I hadn't covered (cool :-), but the test names were not changed. I wonder if we wouldn't be better off using say $bp instead of $b in test name strings, so that bp could be set to "$b whatever" or "$b" before a group of tests. This would make it easier to copy blocks of tests, but it would make it harder to search for a failing test. I suppose the changes you proposed, along with code that rejects duplicates, is preferrable: > gcc/testsuite: > * gcc.misc-tests/outputs.exp: Append discriminating > suffixes to tests with duplicate names. > (outest): Assert that each running test has a unique > name. LGTM, thanks! -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: [PATCH] outputs.exp: skip @file -save-temps if target has -L or -I
On Feb 25, 2021, Hans-Peter Nilsson wrote: > Additional files are created in presence of @file option. Oh, wow. I hope nobody uses @file in target board files ;-) > gcc/testsuite: > * gcc.misc-tests/outputs.exp: Skip @file -save-temps > tests if target test-framework has -L or -I options. Thanks, this looks reasonable. I suppose it might be possible to record the presence of -I and -L flags separately, and adjust the @file tests so as to conditionally expect extra files where they'd appear, but I don't see that this would increase test coverage, so ISTM that would be a pointless effort. So we're good. Thanks! -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: add -mpowerpc-gpopt to options for sqrt insn on PowerPC
On Mar 2, 2021, David Edelsohn wrote: > The procs are used in more than that one test. Err, are you looking at the trunk? In my tree, there are only two tests that mention sqrt_insn, and it's two rather than one just because I added the flag to another test myself, in a patch yet to be contributed. Here's the patch that introduced the only use of the proc that is known to me: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-12/msg01204.html As for the to-be-contributed patch, it adds the sqrt_insn feature option to cdce3.c. Like gimplefe-28.c, a compile-only front-end test that depends on the existence of a sqrt insn to do its job, cdce3.c is a compile-only test for a middle-end shrink-wrap optimization, that is only performed when there is a sqrt insn. While we could hide the bug/missing feature in add_options_for_sqrt_insn by constraining check_effective_target_sqrt_insn, the result would be just reduced test coverage for powerpc builds that defaulted to -mno-powerpc-gpopt. A downside without any upside. Whereas if we fix the former proc to perform its documented function on powerpc, namely return the options required to enable the fsqrt insn, the end result is that, if the options do the job, we get those two tests performed, whereas if they happen to be incompatible with other settings to the point of raising an error, we (should?) skip the tests. In my book, that's upsides without downsides. Now, if the grounds for rejecting the patch was based on the understanding that the proc was used elsewhere, with a different function, I hope this demonstrates that this understanding was based on incorrect information (that I may have hinted at myself; sorry if so), and now that it's been corrected, I request the rejection of this approach to be reconsidered. I suppose the patch as-is might still be rejected, because it introduces only -mpowerpc-gpopt, without -mno-soft-float. Since in some configurations it may take both of them to enable the fsqrt insn, would an alternate version that returned both be deemed acceptable instead? Maybe we also want to document in the proc that these added options can only be used in compile tests, because there's no expectation or guarantee that the so-enabled feature is available in the target under test. But AFAIK this has always been the case, and now I see and confirm that a feature-option-adding call is always followed by a feature-available-check call. Thanks, -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: add -mpowerpc-gpopt to options for sqrt insn on PowerPC
On Mar 2, 2021, Segher Boessenkool wrote: > This is PR99352 now. Thanks. I've filed PR99371 for the add_options_for_sqrt_insn incompleteness, and PR99372 for the gimplefe-28.c ICE. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: add -mpowerpc-gpopt to options for sqrt insn on PowerPC
On Mar 3, 2021, Segher Boessenkool wrote: > On Wed, Mar 03, 2021 at 04:45:23PM -0300, Alexandre Oliva wrote: >> On Mar 2, 2021, Segher Boessenkool wrote: >> >> > This is PR99352 now. >> >> Thanks. I've filed PR99371 for the add_options_for_sqrt_insn >> incompleteness, > As I said before, I don't agree with that. Maybe you'll change of mind if you try to make sense of why the proc has, since its inception, added vfp options to enable sqrt on arm, regardless of whether vfp is available with the processor being tested, and realize this is not different from the case of powerpc. > If a user disabled it, we should *not* reenable it, that reduces > testing surface. It's skipping the test, as the change you propose, that reduces testing surface, when testing only a configuration that ends up skipping it. Now, if you're testing multiple combinations, skipping or running does *not* change the test surface. So rejecting Eric's patch makes for a no-op in one case, and a reduction in another. Whereas installing it makes for a no-op in one case, and an increase in another. Please explain how you came to the conclusion that this amounts to reducing hte test surface. Something appears to be amiss in that reasoning. >> and PR99372 for the gimplefe-28.c ICE. > This is fixed trivially by the PR99352 patch as far as I can see. If your patch also deals with the ICE that appears with the options named in PR99372, great. > Please verify (I'll post it later today). Please Cc me so I don't miss it, thanks, -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: add powerpc_vsx_ok requirement to undef-bool tests
On Feb 26, 2021, David Edelsohn wrote: > This patch is okay. Thanks, I've finally checked it in. >> From: Joel Brobecker >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/undef-bool-2.c: Add >> dg-require-effective-target powerpc_vsx_ok directive. >> * g++.dg/ext/undef-bool-1.C: Add dg-require-effective-target >> powerpc_vsx_ok directive. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: retain debug stmt order when moving to successors
On Jul 28, 2021, Richard Biener wrote: > OK. Thanks, I've finally put this in. Sorry about the delay. > On Wed, Jul 28, 2021 at 10:12 AM Alexandre Oliva wrote: >> * tree-inline.c (maybe_move_debug_stmts_to_successors): Don't >> reverse debug stmts. -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Disinformation flourishes because many people care deeply about injustice but very few check the facts. Ask me about <https://stallmansupport.org>
Re: don't access cfun in dump_function_to_file
On Jul 28, 2021, Richard Biener wrote: > OK. Thanks, I've finally put this in as well. >> * tree-cfg.c (dump_function_to_file): Use fun, not cfun. -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Disinformation flourishes because many people care deeply about injustice but very few check the facts. Ask me about <https://stallmansupport.org>
Re: move unreachable user labels to entry point
On Jul 13, 2021, Richard Biener wrote: > The right OMP region suggests something wrt correctness Yeah, as Jakub wrote, we have to choose a block that's in the same region the label belongs to. The proposed patch doesn't change that, it just uses the entry block instead of the previous block, if it satisfies the requirement. I found it made the logic simpler, slightly more efficient, and more predictable, but I'm not attached to the change. Since Jakub objected to it, let's leave it alone. -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Disinformation flourishes because many people care deeply about injustice but very few check the facts. Ask me about <https://stallmansupport.org>
Re: ipa-modref: merge flags when adding escape
On Aug 11, 2021, Jan Hubicka wrote: > This is improved patch Thanks for the proper fix! -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Disinformation flourishes because many people care deeply about injustice but very few check the facts. Ask me about <https://stallmansupport.org>
fix latent bootstrap-debug issue (modref, tree-inline, tree jump-threading)
I've hit a bootstrap-debug error involving large subprograms in gcc/ada/sem_ch12.adb. I'm afraid I couldn't narrow it down to a reasonable testcase. thread1 made different decisions about a block containing a builtin_eh_filter call because in one compilation, estimate_num_insns found a cgraph_node for the builtin and could thus get to the is_simple_builtin test, but in the other it didn't. With different insn counts, one stage jump-threaded and the other didn't, and the resulting code diverged quite a bit. The reason the builtin had a cgraph_node in one case but not the other was that modref got a chance to analyze the builtin call when it was the first stmt in the block, and that created the cgraph_node. However, when it was preceded by debug stmts, the loop in analyze_function was cut short after the first debug stmt, because the summary so far was not useful. This patch fixes both issues: skip debug stmts in the analyze_function loop, so as to prevent them from affecting any decisions in the loop, and enable the insn count estimator to get to the is_simple_builtin test when a cgraph_node has not been created for the builtin. Regstrapped on x86_64-linux-gnu. Ok to install? for gcc/ChangeLog * ipa-modref.c (analyze_function): Skip debug stmts. * tree-inline.c (estimate_num_insn): Consider builtins even without a cgraph_node. --- gcc/ipa-modref.c |3 ++- gcc/tree-inline.c |4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c index fafd804d4bae4..f0cddbb077aaa 100644 --- a/gcc/ipa-modref.c +++ b/gcc/ipa-modref.c @@ -2108,7 +2108,8 @@ analyze_function (function *f, bool ipa) FOR_EACH_BB_FN (bb, f) { gimple_stmt_iterator si; - for (si = gsi_after_labels (bb); !gsi_end_p (si); gsi_next (&si)) + for (si = gsi_start_nondebug_after_labels_bb (bb); + !gsi_end_p (si); gsi_next_nondebug (&si)) { if (!analyze_stmt (summary, summary_lto, gsi_stmt (si), ipa, &recursive_calls) diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c index d0e9f52d5f138..636130fe0019e 100644 --- a/gcc/tree-inline.c +++ b/gcc/tree-inline.c @@ -4436,8 +4436,8 @@ estimate_num_insns (gimple *stmt, eni_weights *weights) /* Do not special case builtins where we see the body. This just confuse inliner. */ struct cgraph_node *node; - if (!(node = cgraph_node::get (decl)) - || node->definition) + if ((node = cgraph_node::get (decl)) + && node->definition) ; /* For buitins that are likely expanded to nothing or inlined do not account operand costs. */ -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Disinformation flourishes because many people care deeply about injustice but very few check the facts. Ask me about <https://stallmansupport.org>
Re: fix latent bootstrap-debug issue (modref, tree-inline, tree jump-threading)
On Aug 22, 2021, Jan Hubicka wrote: > OK, thanks for looking into this issue! Thanks, I've finally installed it in the trunk. > It seems that analye_stmt indeed does not skip debug stmts. It is very > odd we got so far without hitting build difference. Indeed. That got me thinking... The comments state: If the statement cannot be analyzed (for any reason), the entire function cannot be analyzed by modref. but the implementation also tests, for every statement: || ((!summary || !summary->useful_p (ecf_flags, false)) && (!summary_lto || !summary_lto->useful_p (ecf_flags, false which means that, if the first stmt of a block doesn't add useful information to the summary, we give up. Was this really the intent? -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Disinformation flourishes because many people care deeply about injustice but very few check the facts. Ask me about <https://stallmansupport.org>
Re: [PATCH 2/4, revised patch applied] PowerPC: Rename functions for min, max, cmove
Hello, Mike, On Sep 11, 2020, Michael Meissner via Gcc-patches wrote: > +case SFmode: > +case DFmode: gcc110 (ppc64) in the build farm didn't like this. The bootstrap compiler barfs on these expressions, because of some constexpr issue I haven't really looked into. I'm testing this patch. I'll check it in when I'm done. use E_*mode instead of just *mode From: Alexandre Oliva g++ 4.8.5 rejected cases with SFmode and DFmode, presumably due to some bug in the constexpr implementation. for gcc/ChangeLog * config/rs6000/rs6000.c (have_compare_and_set_mask): Use E_*mode in cases. --- gcc/config/rs6000/rs6000.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 6d0c550..b32fe91 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -15190,8 +15190,8 @@ have_compare_and_set_mask (machine_mode mode) { switch (mode) { -case SFmode: -case DFmode: +case E_SFmode: +case E_DFmode: return TARGET_P9_MINMAX; default: -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer
[PATCH] assorted improvements for fold_truth_andor_1
This patch introduces various improvements to the logic that merges field compares. Before the patch, we could merge: (a.x1 EQNE b.x1) ANDOR (a.y1 EQNE b.y1) into something like: (((type *)&a)[Na] & MASK) EQNE (((type *)&b)[Nb] & MASK) if both of A's fields live within the same alignment boundaries, and so do B's, at the same relative positions. Constants may be used instead of the object B. The initial goal of this patch was to enable such combinations when a field crossed alignment boundaries, e.g. for packed types. We can't generally access such fields with a single memory access, so when we come across such a compare, we will attempt to combine each access separately. Some merging opportunities were missed because of right-shifts, compares expressed as e.g. ((a.x1 ^ b.x1) & MASK) EQNE 0, and narrowing conversions, especially after earlier merges. This patch introduces handlers for several cases involving these. Other merging opportunities were missed because of association. The existing logic would only succeed in merging a pair of consecutive compares, or e.g. B with C in (A ANDOR B) ANDOR C, not even trying e.g. C and D in (A ANDOR (B ANDOR C)) ANDOR D. I've generalized the handling of the rightmost compare in the left-hand operand, going for the leftmost compare in the right-hand operand, and then onto trying to merge compares pairwise, one from each operand, even if they are not consecutive, taking care to avoid merging operations with intervening side effects, including volatile accesses. When it is the second of a non-consecutive pair of compares that first accesses a word, we may merge the first compare with part of the second compare that refers to the same word, keeping the compare of the remaining bits at the spot where the second compare used to be. Handling compares with non-constant fields was somewhat generalized, now handling non-adjacent fields. When a field of one object crosses an alignment boundary but the other doesn't, we issue the same load in both compares; gimple optimizers will later turn it into a single load, without our having to handle SAVE_EXPRs at this point. The logic for issuing split loads and compares, and ordering them, is now shared between all cases of compares with constants and with another object. The -Wno-error for toplev.o on rs6000 is because of toplev.c's: if ((flag_sanitize & SANITIZE_ADDRESS) && !FRAME_GROWS_DOWNWARD) and rs6000.h's: #define FRAME_GROWS_DOWNWARD (flag_stack_protect != 0 \ || (flag_sanitize & SANITIZE_ADDRESS) != 0) The mutually exclusive conditions involving flag_sanitize are now noticed and reported by fold-const.c's: warning (0, "% of mutually exclusive equal-tests" " is always 0"); This patch enables over 12k compare-merging opportunities that we used to miss in a GCC bootstrap. Regstrapped on x86_64-linux-gnu and ppc64-linux-gnu. Ok to install? for gcc/ChangeLog * fold-const.c (prepare_xor): New. (decode_field_reference): Handle xor, shift, and narrowing conversions. (all_ones_mask_p): Remove. (compute_split_boundary_from_align): New. (build_split_load, reuse_split_load): New. (fold_truth_andor_1): Add recursion to combine pairs of non-neighboring compares. Handle xor compared with zero. Handle fields straddling across alignment boundaries. Generalize handling of non-constant rhs. (fold_truth_andor): Leave sub-expression handling to the recursion above. * config/rs6000/t-rs6000 (toplev.o-warn): Disable errors. for gcc/testsuite/ChangeLog * gcc.dg/field-merge-1.c: New. * gcc.dg/field-merge-2.c: New. * gcc.dg/field-merge-3.c: New. * gcc.dg/field-merge-4.c: New. * gcc.dg/field-merge-5.c: New. --- gcc/config/rs6000/t-rs6000 |4 gcc/fold-const.c | 818 -- gcc/testsuite/gcc.dg/field-merge-1.c | 64 +++ gcc/testsuite/gcc.dg/field-merge-2.c | 31 + gcc/testsuite/gcc.dg/field-merge-3.c | 36 + gcc/testsuite/gcc.dg/field-merge-4.c | 40 ++ gcc/testsuite/gcc.dg/field-merge-5.c | 40 ++ 7 files changed, 882 insertions(+), 151 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/field-merge-1.c create mode 100644 gcc/testsuite/gcc.dg/field-merge-2.c create mode 100644 gcc/testsuite/gcc.dg/field-merge-3.c create mode 100644 gcc/testsuite/gcc.dg/field-merge-4.c create mode 100644 gcc/testsuite/gcc.dg/field-merge-5.c diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000 index 1ddb572..516486d 100644 --- a/gcc/config/rs6000/t-rs6000 +++ b/gcc/config/rs6000/t-rs6000 @@ -52,6 +52,10 @@ $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \ $(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \ $(
Re: [PATCH] assorted improvements for fold_truth_andor_1
On Sep 28, 2020, Richard Biener wrote: > On Fri, Sep 25, 2020 at 3:39 PM Alexandre Oliva wrote: >> This patch introduces various improvements to the logic that merges >> field compares. > Sorry for throwing a wrench in here but doing this kind of transform > during GENERIC folding is considered a bad thing. Ugh. Even tree-ifcombine itself relies on tree folding logic to perform this and various similar sorts of optimizations. Is the issue just that we don't want to perform this kind of transformation while still in GENERIC, or that the logic must not even be there? I ask because it wouldn't be too hard to disable unwanted cases of folding while in GENERIC, and extending it to follow SSA defs so that ifcombine would those optimizations back at an acceptable spot. Please help me understand what it is that makes this change as it is undesirable, so that I can figure out how to do it in an acceptable way, and justify the additional time and effort it will take. I *think* ifcombine could even be extended so as to reuse the separate-test logic I put in, by looking for non-immediate dominating outer conditions for the inner condition. A further modified version of fold_truth_andor_1 could then be used to combine the separate tests. As for reassoc... It doesn't seem a good fit at all for reassociating short-circuiting boolean operations, that normally get translated as multiple basic blocks. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer
Re: [PATCH] assorted improvements for fold_truth_andor_1
On Sep 29, 2020, Richard Biener wrote: > On Tue, Sep 29, 2020 at 9:23 AM Alexandre Oliva wrote: >> On Sep 28, 2020, Richard Biener wrote: > ifcombine should stop using fold*, yeah Wow, that's quite a lot of work for no expected improvement in codegen. I don't expect to be able to justify such an undertaking :-( > I also think it will not end up using the simplifications using loads. Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in the inner block. that won't do when the whole point is to merge loads from memory. That seems excessive. Since we rule out any memory-changing side effects, I suppose we could get away with checking for volatile operands there. Then, adding just a little SSA_DEF chasing, I believe I could bring all of the fold_truth_andor_1 logic I've worked on into ifcombine without much difficulty, and then we could do away with at least that part of fold_truth_andor. > Specifically your patch seems to introduce splitting of loads > at alignment boundaries ... when there's another compare involving a load from either side of the crossed alignment boundary. Even on arches that can do unaligned loads, the result is no worse, and if there are multiple fields crossing consecutive alignment boundaries, the codegen and performance difference can be pretty significant. >> I *think* ifcombine could even be extended so as to reuse the >> separate-test logic I put in, by looking for non-immediate dominating >> outer conditions for the inner condition. A further modified version of >> fold_truth_andor_1 could then be used to combine the separate tests. > I think the structure of ifcombine doesn't exactly match what > fold_truth_andor does How so? AFAICT ifcombine_ifandif deals exactly with the (gimplified version of the) structure I described in the patch that started the thread: (a.x1 EQNE b.x1) ANDOR (a.y1 EQNE b.y1) -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer
Re: [PATCH] assorted improvements for fold_truth_andor_1
On Sep 29, 2020, Alexandre Oliva wrote: > Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in > the inner block. that won't do when the whole point is to merge loads > from memory. > That seems excessive. Since we rule out any memory-changing side > effects, I suppose we could get away with checking for volatile operands > there. Then, adding just a little SSA_DEF chasing, I believe I could > bring all of the fold_truth_andor_1 logic I've worked on into ifcombine > without much difficulty, and then we could do away with at least that > part of fold_truth_andor. Confirmed, a very ugly prototype seems to work! -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer
Re: [PATCH] assorted improvements for fold_truth_andor_1
Here's what I got so far, passing regstrap except for field-merge-1.c, that relies on combining non-adjacent compares, which I haven't implemented yet. I had to retain some parts of fold_truth_andor_1 to avoid regressions in gcc.c-torture/execute/ieee/compare-fp-3.c and gcc.dg/pr81228.c. gcc.target/i386/pr37248-2.c and gcc.target/i386/pr37248-3.c required turning sign tests into masks. I moved the field-compare-merging bits of fold_truth_andor_1, and auxiliary functions for it, into fold_truth_andor_maybe_separate, though the ability to deal with non-adjacent compares is no longer used, at least for the time being. I put fold_truth_andor_maybe_separate in gimple-fold.c, and called it in maybe_fold_and_comparisons, but... the fact that it may return a tree that isn't a gimple condexpr, e.g. when there is masking or shifting, and that requires re-gimplification, suggests moving it to tree-ssa-ifcombine.c where this can be dealt with, possibly even for non-adjacent compares. Right now, there is code in ifcombine to re-gimplify results from the regular folder, which I've extended, for the time being, to maybe_fold_and_comparisons results as well. I've reworked decode_field_reference to seek and "substitute" SSA DEFs in a way that avoids building new trees, but the test for substitutability feels incomplete: there's nothing to ensure that vuses aren't brought from much-earlier blocks, and that there couldn't possibly be an intervening store. I suspect I will have to give these pieces a little more info for it to be able to tell whether the memory accesses, if moved, would still get the same value. Is there anything easily reusable for this sort of test? As for fields crossing alignment boundaries, the two-tests condition currently returned by fold_truth_andor_maybe_separate, that ends up gimplified into a new block, causes the chain of ifcombine optimizations to stop, something that I'm yet to investigate. My plan is to rearrange ifcombine_ifandif to call fold_truth_andor_maybe_separate directly, and to handle such resplit conditions by inserting one in each of the preexisting blocks, to simplify the gimplification and in preparation for combining non-adjacent compares, if we wish to do that. Do we? I was convinced that it was a safe optimization, because of the absence of intervening side effects and the undefinedness of intervening trapping memory accesses, but combining later tests that access a word with earlier tests that access the same word may cause intervening trapping accesses to be skipped. Is this a possibility we should enable or disable on e.g. a per-language basis, avoid altogether, or ... any other suggestions? Richi, is this sort of what you had in mind, or were you thinking of something for match.pd or so? Is match.pd even able to perform cross-block multi-condition matches? Any further advice as to my plans above? Thanks, diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 0cc80ad..47b5419 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -125,7 +125,6 @@ static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); -static tree unextend (tree, int, int, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); static tree extract_muldiv_1 (tree, tree, enum tree_code, tree, bool *); static tree fold_binary_op_with_conditional_arg (location_t, @@ -4353,7 +4352,7 @@ invert_truthvalue_loc (location_t loc, tree arg) is the original memory reference used to preserve the alias set of the access. */ -static tree +tree make_bit_field_ref (location_t loc, tree inner, tree orig_inner, tree type, HOST_WIDE_INT bitsize, poly_int64 bitpos, int unsignedp, int reversep) @@ -4603,132 +4602,6 @@ optimize_bit_field_compare (location_t loc, enum tree_code code, return lhs; } -/* Subroutine for fold_truth_andor_1: decode a field reference. - - If EXP is a comparison reference, we return the innermost reference. - - *PBITSIZE is set to the number of bits in the reference, *PBITPOS is - set to the starting bit number. - - If the innermost field can be completely contained in a mode-sized - unit, *PMODE is set to that mode. Otherwise, it is set to VOIDmode. - - *PVOLATILEP is set to 1 if the any expression encountered is volatile; - otherwise it is not changed. - - *PUNSIGNEDP is set to the signedness of the field. - - *PREVERSEP is set to the storage order of the field. - - *PMASK is set to the mask used. This is either contained in a - BIT_AND_EXPR or derived from the width of the field. - - *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any. - - Return 0 if this is not a component reference or is one that we can't - do anything with. */ - -static tree -decode_field_reference (location_t loc,
make sincos take type from intrinsic formal, not from result assignment
This is a first step towards enabling the sincos optimization in Ada. The issue this patch solves is that sincos takes the type to be looked up with mathfn_built_in from variables or temporaries in which results of sin and cos are stored. In Ada, sin and cos are declared in an internal aux package, with uses thereof in a standard generic package, which ensures that the types are not what mathfn_built_in expects. Taking the type from the intrinsic's formal parameter, as in the patch, ensures we get the type associated with the intrinsics, regardless of the types used to declare and import them, so the lookup of the CEXPI intrinsic for the same type finds it. for gcc/ChangeLog * tree-ssa-math-opts.c (execute_cse_sincos_1): Take the type for the cexpi/sincos intrinsic interface from formals of other intrinsics. --- gcc/tree-ssa-math-opts.c | 26 ++ 1 file changed, 22 insertions(+), 4 deletions(-) diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 8423caa..31fd241 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1139,7 +1139,7 @@ execute_cse_sincos_1 (tree name) { gimple_stmt_iterator gsi; imm_use_iterator use_iter; - tree fndecl, res, type; + tree fndecl = NULL_TREE, res, type = NULL_TREE; gimple *def_stmt, *use_stmt, *stmt; int seen_cos = 0, seen_sin = 0, seen_cexpi = 0; auto_vec stmts; @@ -1147,7 +1147,6 @@ execute_cse_sincos_1 (tree name) int i; bool cfg_changed = false; - type = TREE_TYPE (name); FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, name) { if (gimple_code (use_stmt) != GIMPLE_CALL @@ -1169,15 +1168,34 @@ execute_cse_sincos_1 (tree name) break; default:; + continue; } -} + tree t = TREE_VALUE (TYPE_ARG_TYPES (gimple_call_fntype (use_stmt))); + if (!type) + type = t; + else if (t != type) + { + if (!tree_nop_conversion_p (type, t)) + return false; + /* If there is more than one type to choose from, prefer one +that has a CEXPI builtin. */ + else if (!fndecl + && (fndecl = mathfn_built_in (t, BUILT_IN_CEXPI))) + type = t; + } +} if (seen_cos + seen_sin + seen_cexpi <= 1) return false; + if (type != TREE_TYPE (name) + && !tree_nop_conversion_p (type, TREE_TYPE (name))) +return false; + /* Simply insert cexpi at the beginning of top_bb but not earlier than the name def statement. */ - fndecl = mathfn_built_in (type, BUILT_IN_CEXPI); + if (!fndecl) +fndecl = mathfn_built_in (type, BUILT_IN_CEXPI); if (!fndecl) return false; stmt = gimple_build_call (fndecl, 1, name); -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer
Re: make sincos take type from intrinsic formal, not from result assignment
On Oct 6, 2020, Richard Biener wrote: > On October 6, 2020 3:15:02 AM GMT+02:00, Alexandre Oliva > wrote: >> >> This is a first step towards enabling the sincos optimization in Ada. >> >> The issue this patch solves is that sincos takes the type to be looked >> up with mathfn_built_in from variables or temporaries in which results >> of sin and cos are stored. In Ada, sin and cos are declared in an >> internal aux package, with uses thereof in a standard generic package, >> which ensures that the types are not what mathfn_built_in expects. > But are they not compatible? They are, in that they use the same underlying representation, but they're distinct types, not associated with the same TYPE_MAIN_VARIANT. In Ada it's not unusual to have multiple floating-point types unrelated to each other, even if they share identical underlying representation. Each such type is a distinct type, in a similar sense that in C++ each struct type holding a single double field is a distinct type. Each such distinct FP type gets a different instantiation of Ada.Numerics.Generic_Elementary_Functions, just as a C++ template taking a parameter type would get different instantiations for such different struct types. Overall, it's a very confusing situation. We use these alternate types to declare the Sin and Cos functions imported from libm as intrinsics (separate patch I've written very recently, yet to be contributed), and they get matched to the libm intrinsics despite the distinct types, we issue calls to them, passing variables of the alternate types without explicit conversions, but when the sincos pass looks up the sincos/cexpi intrinsic, it uses the alternate type taken from the variable and fails, rather than the types declared as taken by the builtins. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer
Re: make sincos take type from intrinsic formal, not from result assignment
On Oct 6, 2020, Richard Biener wrote: > OK, I see. mathfn_built_in expects a type inter-operating with > the C ABI types (float_type_node, double_type_node, etc.) where > "inter-operating" means having the same main variant. Yup. > Now, I guess for the sincos pass we want to combine sinl + cosl > to sincosl, independent on the case where the result would be > assigned to a 'double' when 'double == long double'? Sorry, I goofed in the patch description and misled you. When looking at _d = sin (_s); the sincos didn't take the type of _d, but that of _s. I changed it so it takes the not from the actual passed to the intrinsic, but from the formal in the intrinsic declaration. If we had conversions of _s to different precisions, the optimization wouldn't kick in: we'd have different actuals passed to sin and cos. I'm not sure it makes much sense to try to turn e.g. _d1 = sin (_s); _t = (float) _s; _d2 = cosf (_t); into: sincos (_s, &D1, &T); _d1 = D1; _td2 = T; _d2 = (float) _td2; If someone goes through the trouble of computing sin and cos for the same angle at different precisions, you might as well leave it alone. > Now what about sinl + cos when 'double == long double'? Now that might make more sense to optimize, but if we're going to do such transformations, we might as well canonicalize the *l intrinsics to the equivalent double versions (assuming long double and double have the same precision), and then sincos will get it right. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer
Re: make sincos take type from intrinsic formal, not from result assignment
On Oct 6, 2020, Richard Biener wrote: > So how about that mathfn_type helper instead of hard-wring this logic > in sincos()? Like this? Regstrapped on x86_64-linux-gnu. Ok to install? I'm a little unhappy with the duplication of the CASE_MATHFN* sequence, that ought to be kept in sync, , and considered turning that whole sequence into a #define used in both places, but that would bloat the patch further and make it less readable, so I figured I'd propose this one first. Please let me know if you agree this additional change would make it better. take type from intrinsic in sincos pass From: Alexandre Oliva This is a first step towards enabling the sincos optimization in Ada. The issue this patch solves is that sincos takes the type to be looked up with mathfn_built_in from variables or temporaries passed as arguments to SIN and COS intrinsics. In Ada, different float types may be used but, despite their representation equivalence, their distinctness causes the optimization to be skipped, because they are not the types that mathfn_built_in expects. This patch introduces a function that maps intrinsics to the type they're associated with, and uses that type, obtained from the intrinsics used in calls to be optimized, to look up the correspoding CEXPI intrinsic. For the sake of defensive programming, when using the type obtained from the intrinsic, it now checks that, if different types are found for the used argument, or for other calls that use it, that the types are interchangeable. for gcc/ChangeLog * builtins.c (mathfn_built_in_type): New. * builtins.h (mathfn_built_in_type): Declare. * tree-ssa-math-opts.c (execute_cse_sincos_1): Use it to obtain the type expected by the intrinsic. --- gcc/builtins.c | 147 ++ gcc/builtins.h |1 gcc/tree-ssa-math-opts.c | 17 - 3 files changed, 162 insertions(+), 3 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index f91266e4..5649242 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -2160,6 +2160,7 @@ mathfn_built_in_2 (tree type, combined_fn fn) switch (fn) { + /* Copied to mathfn_built_in_type, please keep in sync. */ CASE_MATHFN (ACOS) CASE_MATHFN (ACOSH) CASE_MATHFN (ASIN) @@ -2278,6 +2279,10 @@ mathfn_built_in_2 (tree type, combined_fn fn) return END_BUILTINS; } +#undef CASE_MATHFN +#undef CASE_MATHFN_FLOATN +#undef CASE_MATHFN_REENT + /* Return mathematic function equivalent to FN but operating directly on TYPE, if available. If IMPLICIT_P is true use the implicit builtin declaration, otherwise use the explicit declaration. If we can't do the conversion, @@ -2313,6 +2318,148 @@ mathfn_built_in (tree type, enum built_in_function fn) return mathfn_built_in_1 (type, as_combined_fn (fn), /*implicit=*/ 1); } +/* Return the type associated with a built in function, i.e., the one + to be passed to mathfn_built_in to get the type-specific + function. */ + +tree +mathfn_built_in_type (combined_fn fn) +{ +#define CASE_MATHFN(MATHFN)\ + case BUILT_IN_##MATHFN: \ +return double_type_node; \ + case BUILT_IN_##MATHFN##F: \ +return float_type_node;\ + case BUILT_IN_##MATHFN##L: \ +return long_double_type_node; + +#define CASE_MATHFN_FLOATN(MATHFN) \ + CASE_MATHFN(MATHFN) \ + case BUILT_IN_##MATHFN##F16: \ +return float16_type_node; \ + case BUILT_IN_##MATHFN##F32: \ +return float32_type_node; \ + case BUILT_IN_##MATHFN##F64: \ +return float64_type_node; \ + case BUILT_IN_##MATHFN##F128:\ +return float128_type_node; \ + case BUILT_IN_##MATHFN##F32X:\ +return float32x_type_node; \ + case BUILT_IN_##MATHFN##F64X:\ +return float64x_type_node; \ + case BUILT_IN_##MATHFN##F128X: \ +return float128x_type_node; + +/* Similar to above, but appends _R after any F/L suffix. */ +#define CASE_MATHFN_REENT(MATHFN) \ + case BUILT_IN_##MATHFN##_R: \ +return double_type_node; \ + case BUILT_IN_##MATHFN##F_R: \ +return float_type_node;\ + case BUILT_IN_##MATHFN##L_R: \ +return long_double_type_node; + + switch (fn) +{ + /* Copied from mathfn_built_in_2, please keep in sync. */ +CASE_MATHFN (ACOS) +CASE_MATHFN (ACOSH) +CASE_MATHFN (ASIN) +CASE_MATHFN (ASINH) +CASE_MATHFN (ATAN) +CASE_MATHFN (ATAN2) +CASE_MATHFN (ATANH) +CASE_MATHFN (CBRT) +CASE_MATHFN_FLOATN (CEIL) +CASE_MATHFN (CEXP
Re: make sincos take type from intrinsic formal, not from result assignment
On Oct 8, 2020, Richard Biener wrote: > OK with a minor nit, see below >> I'm a little unhappy with the duplication of the CASE_MATHFN* sequence, >> that ought to be kept in sync, , and considered turning that whole >> sequence into a #define used in both places, but that would bloat the >> patch further and make it less readable, so I figured I'd propose this >> one first. Please let me know if you agree this additional change would >> make it better. > Yeah, I guess so. Thanks, I folded that change into the patch. Bloated version below :-) > use !types_compatible_p (type, t) here if we don't want to go > with the stricter type == t (both yours and types_compatible_p > would treat long double and double the same in case they > are mapped to the same FP mode). types_compatible_p > is the appropriate GIMPLE predicate here. Thanks, fixed, regstrapped. Here's what I'm installing. take type from intrinsic in sincos pass From: Alexandre Oliva This is a first step towards enabling the sincos optimization in Ada. The issue this patch solves is that sincos takes the type to be looked up with mathfn_built_in from variables or temporaries passed as arguments to SIN and COS intrinsics. In Ada, different float types may be used but, despite their representation equivalence, their distinctness causes the optimization to be skipped, because they are not the types that mathfn_built_in expects. This patch introduces a function that maps intrinsics to the type they're associated with, and uses that type, obtained from the intrinsics used in calls to be optimized, to look up the correspoding CEXPI intrinsic. For the sake of defensive programming, when using the type obtained from the intrinsic, it now checks that, if different types are found for the used argument, or for other calls that use it, that the types are interchangeable. for gcc/ChangeLog * builtins.c (mathfn_built_in_type): New. * builtins.h (mathfn_built_in_type): Declare. * tree-ssa-math-opts.c (execute_cse_sincos_1): Use it to obtain the type expected by the intrinsic. --- gcc/builtins.c | 236 +- gcc/builtins.h |1 gcc/tree-ssa-math-opts.c | 17 +++ 3 files changed, 164 insertions(+), 90 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index f91266e4..284926f 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -2160,95 +2160,98 @@ mathfn_built_in_2 (tree type, combined_fn fn) switch (fn) { -CASE_MATHFN (ACOS) -CASE_MATHFN (ACOSH) -CASE_MATHFN (ASIN) -CASE_MATHFN (ASINH) -CASE_MATHFN (ATAN) -CASE_MATHFN (ATAN2) -CASE_MATHFN (ATANH) -CASE_MATHFN (CBRT) -CASE_MATHFN_FLOATN (CEIL) -CASE_MATHFN (CEXPI) -CASE_MATHFN_FLOATN (COPYSIGN) -CASE_MATHFN (COS) -CASE_MATHFN (COSH) -CASE_MATHFN (DREM) -CASE_MATHFN (ERF) -CASE_MATHFN (ERFC) -CASE_MATHFN (EXP) -CASE_MATHFN (EXP10) -CASE_MATHFN (EXP2) -CASE_MATHFN (EXPM1) -CASE_MATHFN (FABS) -CASE_MATHFN (FDIM) -CASE_MATHFN_FLOATN (FLOOR) -CASE_MATHFN_FLOATN (FMA) -CASE_MATHFN_FLOATN (FMAX) -CASE_MATHFN_FLOATN (FMIN) -CASE_MATHFN (FMOD) -CASE_MATHFN (FREXP) -CASE_MATHFN (GAMMA) -CASE_MATHFN_REENT (GAMMA) /* GAMMA_R */ -CASE_MATHFN (HUGE_VAL) -CASE_MATHFN (HYPOT) -CASE_MATHFN (ILOGB) -CASE_MATHFN (ICEIL) -CASE_MATHFN (IFLOOR) -CASE_MATHFN (INF) -CASE_MATHFN (IRINT) -CASE_MATHFN (IROUND) -CASE_MATHFN (ISINF) -CASE_MATHFN (J0) -CASE_MATHFN (J1) -CASE_MATHFN (JN) -CASE_MATHFN (LCEIL) -CASE_MATHFN (LDEXP) -CASE_MATHFN (LFLOOR) -CASE_MATHFN (LGAMMA) -CASE_MATHFN_REENT (LGAMMA) /* LGAMMA_R */ -CASE_MATHFN (LLCEIL) -CASE_MATHFN (LLFLOOR) -CASE_MATHFN (LLRINT) -CASE_MATHFN (LLROUND) -CASE_MATHFN (LOG) -CASE_MATHFN (LOG10) -CASE_MATHFN (LOG1P) -CASE_MATHFN (LOG2) -CASE_MATHFN (LOGB) -CASE_MATHFN (LRINT) -CASE_MATHFN (LROUND) -CASE_MATHFN (MODF) -CASE_MATHFN (NAN) -CASE_MATHFN (NANS) -CASE_MATHFN_FLOATN (NEARBYINT) -CASE_MATHFN (NEXTAFTER) -CASE_MATHFN (NEXTTOWARD) -CASE_MATHFN (POW) -CASE_MATHFN (POWI) -CASE_MATHFN (POW10) -CASE_MATHFN (REMAINDER) -CASE_MATHFN (REMQUO) -CASE_MATHFN_FLOATN (RINT) -CASE_MATHFN_FLOATN (ROUND) -CASE_MATHFN_FLOATN (ROUNDEVEN) -CASE_MATHFN (SCALB) -CASE_MATHFN (SCALBLN) -CASE_MATHFN (SCALBN) -CASE_MATHFN (SIGNBIT) -CASE_MATHFN (SIGNIFICAND) -CASE_MATHFN (SIN) -CASE_MATHFN (SINCOS) -CASE_MATHFN (SINH) -CASE_MATHFN_FLOATN (SQRT) -CASE_MATHFN (TAN) -CASE_MATHFN (TANH) -CASE_MATHFN (TGAMMA) -CASE_MATHFN_FLOATN (TRUNC) -CASE_MATHFN (Y0) -CASE_MATHFN (Y1) +#define SEQ_OF_CASE_MATHFN
import elementary functions as intrinsics
Intrinsic, Sqrt, "sqrt"); pragma Pure_Function (Sqrt); function Log (X : Double) return Double; - pragma Import (C, Log, "log"); + pragma Import (Intrinsic, Log, "log"); pragma Pure_Function (Log); function Acos (X : Double) return Double; - pragma Import (C, Acos, "acos"); + pragma Import (Intrinsic, Acos, "acos"); pragma Pure_Function (Acos); function Asin (X : Double) return Double; - pragma Import (C, Asin, "asin"); + pragma Import (Intrinsic, Asin, "asin"); pragma Pure_Function (Asin); function Atan (X : Double) return Double; - pragma Import (C, Atan, "atan"); + pragma Import (Intrinsic, Atan, "atan"); pragma Pure_Function (Atan); function Sinh (X : Double) return Double; - pragma Import (C, Sinh, "sinh"); + pragma Import (Intrinsic, Sinh, "sinh"); pragma Pure_Function (Sinh); function Cosh (X : Double) return Double; - pragma Import (C, Cosh, "cosh"); + pragma Import (Intrinsic, Cosh, "cosh"); pragma Pure_Function (Cosh); function Tanh (X : Double) return Double; - pragma Import (C, Tanh, "tanh"); + pragma Import (Intrinsic, Tanh, "tanh"); pragma Pure_Function (Tanh); function Pow (X, Y : Double) return Double; - pragma Import (C, Pow, "pow"); + pragma Import (Intrinsic, Pow, "pow"); pragma Pure_Function (Pow); end Ada.Numerics.Aux; -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer
[FYI] fix cgraph comment
This comment cut&pasto fix was split out of another patch I'm about to contribute, as the current version of the patch no longer touches cgraph data structures. I'm checking it in as obvious. From: Eric Botcazou for gcc/ChangeLog * cgraph.c (cgraph_node::rtl_info): Fix cut&pasto in comment. * cgraph.h (cgraph_node::rtl_info): Likewise. --- gcc/cgraph.c |2 +- gcc/cgraph.h |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 8b752d8380981..e83bf2001e257 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -1846,7 +1846,7 @@ cgraph_node::local_info (tree decl) return &node->ultimate_alias_target ()->local; } -/* Return local info for the compiled function. */ +/* Return RTL info for the compiled function. */ cgraph_rtl_info * cgraph_node::rtl_info (const_tree decl) diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 733d616fb8c3a..a7f357f0b5c07 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -1381,7 +1381,7 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node /* Return local info for the compiled function. */ static cgraph_local_info *local_info (tree decl); - /* Return local info for the compiled function. */ + /* Return RTL info for the compiled function. */ static struct cgraph_rtl_info *rtl_info (const_tree); /* Return the cgraph node that has ASMNAME for its DECL_ASSEMBLER_NAME. -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Be the change, be Free!FSF VP & FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
introduce -fcallgraph-info option
This was first submitted many years ago https://gcc.gnu.org/ml/gcc-patches/2010-10/msg02468.html The command line option -fcallgraph-info is added and makes the compiler generate another output file (xxx.ci) for each compilation unit, which is a valid VCG file (you can launch your favorite VCG viewer on it unmodified) and contains the "final" callgraph of the unit. "final" is a bit of a misnomer as this is actually the callgraph at RTL expansion time, but since most high-level optimizations are done at the Tree level and RTL doesn't usually fiddle with calls, it's final in almost all cases. Moreover, the nodes can be decorated with additional info: -fcallgraph-info=su adds stack usage info and -fcallgraph-info=da dynamic allocation info. Compared with the earlier version, this patch does not modify cgraph, and instead adds the required information next to the stage usage function data structure, so we only hold one of those at at time. I've switched to vecs from linked lists, for more compact edges and dynamic allocation annotations, and arranged for them to be released as soon as we've printed out the information. I have NOT changed the file format, because existing tools such as gnatstack consume the current format. Regstrapped on x86_64-linux-gnu. Ok to install? for gcc/ChangeLog >From Eric Botcazou , Alexandre Oliva > * common.opt (-fcallgraph-info[=]): New option. * doc/invoke.texi (Debugging options): Document it. * opts.c (common_handle_option): Handle it. * builtins.c (expand_builtin_alloca): Record allocation if -fcallgraph-info=da. * calls.c (expand_call): If -fcallgraph-info, record the call. (emit_library_call_value_1): Likewise. * flag-types.h (enum callgraph_info_type): New type. * explow.c: Include stringpool.h. (set_stack_check_libfunc): Set SET_SYMBOL_REF_DECL on the symbol. * function.c (allocate_stack_usage_info): New. (allocate_struct_function): Call it for -fcallgraph-info. (prepare_function_start): Call it otherwise. (rest_of_handle_thread_prologue_and_epilogue): Release callees and dallocs after output_stack_usage. (record_final_call, record_dynamic_alloc): New. * function.h (struct callee, struct dalloc): New. (struct stack_usage): Add callees and dallocs. (record_final_call, record_dynamic_alloc): Declare. * gimplify.c (gimplify_decl_expr): Record dynamically-allocated object if -fcallgraph-info=da. * optabs-libfuncs.c (build_libfunc_function): Keep SYMBOL_REF_DECL. * print-tree.h (print_decl_identifier): Declare. (PRINT_DECL_ORIGIN, PRINT_DECL_NAME, PRINT_DECL_UNIQUE_NAME): New. * print-tree.c: Include print-tree.h. (print_decl_identifier): New function. * toplev.c: Include print-tree.h. (callgraph_info_file): New global variable. (callgraph_info_indirect_emitted): Likewise. (output_stack_usage): Rename to... (output_stack_usage_1): ... this. Make it static, add cf parameter. If -fcallgraph-info=su, print stack usage to cf. If -fstack-usage, use print_decl_identifier for pretty-printing. (INDIRECT_CALL_NAME): New. (dump_final_indirect_call_node_vcg): New. (dump_final_callee_vcg, dump_final_node_vcg): New. (output_stack_usage): New. (lang_dependent_init): Open and start file if -fcallgraph-info. (finalize): If callgraph_info_file is not null, finish it, close it, and reset callgraph info state. for gcc/ada/ChangeLog * gcc-interface/misc.c (callgraph_info_file): Delete. --- gcc/ada/gcc-interface/misc.c |3 - gcc/builtins.c |4 + gcc/calls.c |6 + gcc/common.opt |8 ++ gcc/doc/invoke.texi | 17 gcc/explow.c |5 + gcc/flag-types.h | 16 gcc/function.c | 63 ++-- gcc/function.h | 25 ++ gcc/gimplify.c |4 + gcc/optabs-libfuncs.c|4 - gcc/opts.c | 26 ++ gcc/output.h |2 gcc/print-tree.c | 76 +++ gcc/print-tree.h |4 + gcc/toplev.c | 169 ++ 16 files changed, 381 insertions(+), 51 deletions(-) diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c index 4abd4d5708a54..d68b37384ff7f 100644 --- a/gcc/ada/gcc-interface/misc.c +++ b/gcc/ada/gcc-interface/misc.c @@ -54,9 +54,6 @@ #include "ada-tree.h" #include "gigi.h" -/* This symbol needs to be defined for the front-end. */ -void *callgraph_info_file = NULL; - /* Command-line argc and argv. These variables are global since they are imported in back
Re: introduce -fcallgraph-info option
Hi, Richi, On Oct 26, 2019, Richard Biener wrote: > How does it relate to the LTO-dump utility we have now which can in > theory provide a more complete view? Erhm... Not at all, AFAICT. The only vague resemblance I see is in the presence of the word "callgraph" in the description of what both can do, but even that's not quite the same callgraph, besides the different format. E.g., the reason we gather expanded calls rather than just use cgraph_edges is that the latter would dump several "calls" that are builtins expanded internally by the compiler, and would NOT dump other ops that are expanded as (lib)calls. In order to get an accurate assessment of stack size requirements, the presence of the builtins could be confusing but not so much trouble, but the absence of libcalls would underestimate the potential max total stack use by a symbol. > Maybe some infrastructure can be > shared here (the actual dumping of the cgraph?) You mean the one-liner loop in cgraph_node::dump_graphviz, called by lto-dump to dump the cgraph? That doesn't really seem worth sharing; more so considering we dump edges in a quite different format, and not just the edges. In this different format expected by gnatstack, we also dump the nodes, and include information in the labels of the nodes, such as their original spelling and location, as well as stack use: node: { title: "_ada_p" label: "P\np.adb:1:1\n48 bytes (static)" } and dynamic stack allocations (alloca and vlas): node: { title: "p.adb:p__u" label: "u\np.adb:21:3\n2 dynamic objects\n rt p.adb:23:5\n ri p.adb:24:5" } and though edges to libcalls may carry just as little info as a graphviz "from" -> "to" edge: edge: { sourcename: "add" targetname: "__addvsi3" } those between user symbols carry location info as well: edge: { sourcename: "p__s" targetname: "p.adb:p__v" label: "p.adb:46:5" } So I'm afraid I don't see anything that could be usefully factored out. Do you? Thanks, -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Be the change, be Free!FSF VP & FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
Re: introduce -fcallgraph-info option
On Oct 26, 2019, Alexandre Oliva wrote: > E.g., the reason we gather expanded calls rather than just use > cgraph_edges is that the latter would dump several "calls" that are > builtins expanded internally by the compiler, and would NOT dump other > ops that are expanded as (lib)calls. It occurred to me that we could reuse most cgraph edges and avoid duplicating them in the callees vec, namely those that aren't builtins or libcalls. Those that are builtins might or might not become actual calls, so we disregard the cgraph_edges and record them in the vec instead. Those that are libcalls (builtins in a different sense) introduced during expand are not even present in the cgraph edges, so we record them in the vec as well. Here's the patch that implements this. Regstrapped on x86_64-linux-gnu. Ok to install? for gcc/ChangeLog >From Eric Botcazou , Alexandre Oliva > * common.opt (-fcallgraph-info[=]): New option. * doc/invoke.texi (Debugging options): Document it. * opts.c (common_handle_option): Handle it. * builtins.c (expand_builtin_alloca): Record allocation if -fcallgraph-info=da. * calls.c (expand_call): If -fcallgraph-info, record the call. (emit_library_call_value_1): Likewise. * flag-types.h (enum callgraph_info_type): New type. * explow.c: Include stringpool.h. (set_stack_check_libfunc): Set SET_SYMBOL_REF_DECL on the symbol. * function.c (allocate_stack_usage_info): New. (allocate_struct_function): Call it for -fcallgraph-info. (prepare_function_start): Call it otherwise. (record_final_call, record_dynamic_alloc): New. * function.h (struct callinfo_callee): New. (CALLEE_FROM_CGRAPH_P): New. (struct callinfo_dalloc): New. (struct stack_usage): Add callees and dallocs. (record_final_call, record_dynamic_alloc): Declare. * gimplify.c (gimplify_decl_expr): Record dynamically-allocated object if -fcallgraph-info=da. * optabs-libfuncs.c (build_libfunc_function): Keep SYMBOL_REF_DECL. * print-tree.h (print_decl_identifier): Declare. (PRINT_DECL_ORIGIN, PRINT_DECL_NAME, PRINT_DECL_UNIQUE_NAME): New. * print-tree.c: Include print-tree.h. (print_decl_identifier): New function. * toplev.c: Include print-tree.h. (callgraph_info_file): New global variable. (callgraph_info_indirect_emitted): Likewise. (output_stack_usage): Rename to... (output_stack_usage_1): ... this. Make it static, add cf parameter. If -fcallgraph-info=su, print stack usage to cf. If -fstack-usage, use print_decl_identifier for pretty-printing. (INDIRECT_CALL_NAME): New. (dump_final_indirect_call_node_vcg): New. (dump_final_callee_vcg, dump_final_node_vcg): New. (output_stack_usage): New. (lang_dependent_init): Open and start file if -fcallgraph-info. (finalize): If callgraph_info_file is not null, finish it, close it, and reset callgraph info state. for gcc/ada/ChangeLog * gcc-interface/misc.c (callgraph_info_file): Delete. --- gcc/ada/gcc-interface/misc.c |3 - gcc/builtins.c |4 + gcc/calls.c |6 + gcc/common.opt |8 ++ gcc/doc/invoke.texi | 17 gcc/explow.c |5 + gcc/flag-types.h | 16 gcc/function.c | 59 -- gcc/function.h | 30 +++ gcc/gimplify.c |4 + gcc/optabs-libfuncs.c|4 - gcc/opts.c | 26 ++ gcc/output.h |2 gcc/print-tree.c | 76 ++ gcc/print-tree.h |4 + gcc/toplev.c | 179 ++ 16 files changed, 393 insertions(+), 50 deletions(-) diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c index 4abd4d5708a54..d68b37384ff7f 100644 --- a/gcc/ada/gcc-interface/misc.c +++ b/gcc/ada/gcc-interface/misc.c @@ -54,9 +54,6 @@ #include "ada-tree.h" #include "gigi.h" -/* This symbol needs to be defined for the front-end. */ -void *callgraph_info_file = NULL; - /* Command-line argc and argv. These variables are global since they are imported in back_end.adb. */ unsigned int save_argc; diff --git a/gcc/builtins.c b/gcc/builtins.c index 5d811f113c907..bd302383377ba 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -5406,6 +5406,10 @@ expand_builtin_alloca (tree exp) = allocate_dynamic_stack_space (op0, 0, align, max_size, alloca_for_var); result = convert_memory_address (ptr_mode, result); + /* Dynamic allocations for variables are recorded during gimplification. */ + if (!alloca_for_var && (flag_callgraph_info & CALLGRAPH_INFO_DYNAMIC_
Re: introduce -fcallgraph-info option
On Oct 28, 2019, Joseph Myers wrote: > We have a test in the testsuite that all option help text consistently > ends with '.' (see gcc.misc-tests/help.exp). I'd have expected these > options, lacking that '.', to cause that test to fail. Thanks. I've fixed the common.opt changes, and I'll submit momentarily a patch for help.exp to extend it to cover --help=common output as well. -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Be the change, be Free!FSF VP & FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
Re: introduce -fcallgraph-info option
On Oct 28, 2019, Joseph Myers wrote: > We have a test in the testsuite that all option help text consistently > ends with '.' (see gcc.misc-tests/help.exp). I'd have expected these > options, lacking that '.', to cause that test to fail. Here's the patch. Tested on x86_64-linux-gnu. Ok to install? Test --help=common for full sentences The portion of help.exp that checks that help output contains full sentences failed to cover common options. for gcc/testsuite/ChangeLog * gcc.misc-tests/help.exp: Test --help=common for full sentences. --- gcc/testsuite/gcc.misc-tests/help.exp |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.misc-tests/help.exp b/gcc/testsuite/gcc.misc-tests/help.exp index b8a09fc..4bb359f 100644 --- a/gcc/testsuite/gcc.misc-tests/help.exp +++ b/gcc/testsuite/gcc.misc-tests/help.exp @@ -146,8 +146,7 @@ check_for_options c "--help=joined,undocumented" "" "" "" # find the source a failure. foreach cls { "ada" "c" "c++" "d" "fortran" "go" \ - "optimizers" "param" "target" "warnings" } { - + "common" "optimizers" "param" "target" "warnings" } { check_for_options c "--help=$cls" "" "^ +-.*\[^:.\]$" "" } -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Be the change, be Free!FSF VP & FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
Re: introduce -fcallgraph-info option
On Oct 28, 2019, Richard Biener wrote: > I guess you need to elaborate on 'per-file'. With LTO as far as I > understand you'll get the graph per LTRANS unit (did you check > where the output is generated?). Yeah, I guess this was not designed with LTO in mind; it probably even predates LTO. We get per-LTRANS unit indeed, and the output is generated in the temporary dir, which is not desirable behavior for sure. The outputs seem to be usable if you can figure out what they are, but I'm not sure how to go about combining the multiple .ci files, or how to name the combined output, since it's not generally expected that these files will be created at link time, rather than at compile time. I'll bring this up internally and come back with some improvement. > Is this mainly a debugging tool or does it serve a different purpose? It feeds gnatstack, that's a tool to compute max stack depth and perform other call graph analyzes. I don't think of it as a debugging tool. https://www.adacore.com/gnatpro/toolsuite/gnatstack http://docs.adacore.com/live/wave/gnatstack/html/gnatstack_ug/index.html -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Be the change, be Free!FSF VP & FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
Re: introduce -fcallgraph-info option
On Oct 30, 2019, Richard Biener wrote: > One way of operation would be to > generate the graph during the compilation step Stack usage is only computed during prologue/epilogue generation in RTL. > Additional arguments to -fcallgraph-info might be used to direct the > output to a specific directory as well. I've adjusted open_auxiliary_file for LTO recompilation auxiliary files to be placed in the same location as the specified executable output name, and noted that in the documentation. I've also added a bitmap to output nodes for externals, accidentally dropped in the transition to incremental generation of the .ci file. Regstrapped on x86_64-linux-gnu. Ok to install? introduce -fcallgraph-info option This was first submitted many years ago https://gcc.gnu.org/ml/gcc-patches/2010-10/msg02468.html The command line option -fcallgraph-info is added and makes the compiler generate another output file (xxx.ci) for each compilation unit (or LTO partitoin), which is a valid VCG file (you can launch your favorite VCG viewer on it unmodified) and contains the "final" callgraph of the unit. "final" is a bit of a misnomer as this is actually the callgraph at RTL expansion time, but since most high-level optimizations are done at the Tree level and RTL doesn't usually fiddle with calls, it's final in almost all cases. Moreover, the nodes can be decorated with additional info: -fcallgraph-info=su adds stack usage info and -fcallgraph-info=da dynamic allocation info. for gcc/ChangeLog >From Eric Botcazou , Alexandre Oliva > * common.opt (-fcallgraph-info[=]): New option. * doc/invoke.texi (Developer options): Document it. * opts.c (common_handle_option): Handle it. * builtins.c (expand_builtin_alloca): Record allocation if -fcallgraph-info=da. * calls.c (expand_call): If -fcallgraph-info, record the call. (emit_library_call_value_1): Likewise. * flag-types.h (enum callgraph_info_type): New type. * explow.c: Include stringpool.h. (set_stack_check_libfunc): Set SET_SYMBOL_REF_DECL on the symbol. * function.c (allocate_stack_usage_info): New. (allocate_struct_function): Call it for -fcallgraph-info. (prepare_function_start): Call it otherwise. (record_final_call, record_dynamic_alloc): New. * function.h (struct callinfo_callee): New. (CALLEE_FROM_CGRAPH_P): New. (struct callinfo_dalloc): New. (struct stack_usage): Add callees and dallocs. (record_final_call, record_dynamic_alloc): Declare. * gimplify.c (gimplify_decl_expr): Record dynamically-allocated object if -fcallgraph-info=da. * optabs-libfuncs.c (build_libfunc_function): Keep SYMBOL_REF_DECL. * print-tree.h (print_decl_identifier): Declare. (PRINT_DECL_ORIGIN, PRINT_DECL_NAME, PRINT_DECL_UNIQUE_NAME): New. * print-tree.c: Include print-tree.h. (print_decl_identifier): New function. * toplev.c: Include print-tree.h. (callgraph_info_file): New global variable. (callgraph_info_external_printed): Likewise. (open_auxiliary_file): Use dump_base_name for LTO partitions. (output_stack_usage): Rename to... (output_stack_usage_1): ... this. Make it static, add cf parameter. If -fcallgraph-info=su, print stack usage to cf. If -fstack-usage, use print_decl_identifier for pretty-printing. (INDIRECT_CALL_NAME): New. (dump_final_node_vcg_start): New. (dump_final_callee_vcg, dump_final_node_vcg): New. (output_stack_usage): New. (lang_dependent_init): Open and start file if -fcallgraph-info. Allocated callgraph_info_external_printed. (finalize): If callgraph_info_file is not null, finish it, close it, and release callgraph_info_external_printed. for gcc/ada/ChangeLog * gcc-interface/misc.c (callgraph_info_file): Delete. --- gcc/ada/gcc-interface/misc.c |3 - gcc/builtins.c |4 + gcc/calls.c |6 + gcc/common.opt |8 ++ gcc/doc/invoke.texi | 22 + gcc/explow.c |5 + gcc/flag-types.h | 16 gcc/function.c | 59 - gcc/function.h | 30 +++ gcc/gimplify.c |4 + gcc/optabs-libfuncs.c|4 - gcc/opts.c | 26 ++ gcc/output.h |2 gcc/print-tree.c | 76 + gcc/print-tree.h |4 + gcc/toplev.c | 186 ++ 16 files changed, 402 insertions(+), 53 deletions(-) diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c index 4abd4d5..d68b373 100644 --- a/gcc/ada/gcc-interface/misc.c +++ b/gcc/ada/gcc-interface/misc.c
[FYI] pass --enable-obsolete to build configure
e} tm-preds.h" tm_d_file="${tm_d_file} defaults.h" @@ -18868,7 +18887,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 18871 "configure" +#line 18890 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -18974,7 +18993,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 18977 "configure" +#line 18996 "configure" #include "confdefs.h" #if HAVE_DLFCN_H diff --git a/gcc/configure.ac b/gcc/configure.ac index 8df849f4..71ae18e0 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -1830,6 +1830,7 @@ AC_SUBST(extra_opt_files) if test x$host = x$build then build_auto=auto-host.h + HAVE_AUTO_BUILD='# ' else # We create a subdir, then run autoconf in the subdir. # To prevent recursion we set host and build for the new @@ -1852,7 +1853,10 @@ else GMPINC="" CPPFLAGS="${CPPFLAGS} -DGENERATOR_FILE" \ ${realsrcdir}/configure \ --enable-languages=${enable_languages-all} \ - --target=$target_alias --host=$build_alias --build=$build_alias + ${enable_obsolete+--enable-obsolete="$enable_obsolete"} \ + ${enable_option_checking+--enable-option-checking="$enable_option_checking"} \ + --target=$target_alias --host=$build_alias \ + --build=$build_alias || exit # retaining $tempdir # We just finished tests for the build machine, so rename # the file auto-build.h in the gcc directory. @@ -1860,8 +1864,10 @@ else cd .. rm -rf $tempdir build_auto=auto-build.h + HAVE_AUTO_BUILD= fi AC_SUBST(build_subdir) +AC_SUBST(HAVE_AUTO_BUILD) tm_file="${tm_file} defaults.h" tm_p_file="${tm_p_file} tm-preds.h" -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Be the change, be Free!FSF VP & FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
Re: [FYI] pass --enable-obsolete to build configure
\ + --build=$build_alias || exit # retaining $tempdir # We just finished tests for the build machine, so rename # the file auto-build.h in the gcc directory. @@ -12238,9 +12243,11 @@ else cd .. rm -rf $tempdir build_auto=auto-build.h + HAVE_AUTO_BUILD= fi + tm_file="${tm_file} defaults.h" tm_p_file="${tm_p_file} tm-preds.h" tm_d_file="${tm_d_file} defaults.h" @@ -18926,7 +18933,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 18929 "configure" +#line 18936 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -19032,7 +19039,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 19035 "configure" +#line 19042 "configure" #include "confdefs.h" #if HAVE_DLFCN_H diff --git a/gcc/configure.ac b/gcc/configure.ac index 0fe4e54..1a0d682 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -1838,6 +1838,7 @@ AC_SUBST(extra_opt_files) if test x$host = x$build then build_auto=auto-host.h + HAVE_AUTO_BUILD='# ' else # We create a subdir, then run autoconf in the subdir. # To prevent recursion we set host and build for the new @@ -1860,7 +1861,10 @@ else GMPINC="" CPPFLAGS="${CPPFLAGS} -DGENERATOR_FILE" \ ${realsrcdir}/configure \ --enable-languages=${enable_languages-all} \ - --target=$target_alias --host=$build_alias --build=$build_alias + ${enable_obsolete+--enable-obsolete="$enable_obsolete"} \ + ${enable_option_checking+--enable-option-checking="$enable_option_checking"} \ + --target=$target_alias --host=$build_alias \ + --build=$build_alias || exit # retaining $tempdir # We just finished tests for the build machine, so rename # the file auto-build.h in the gcc directory. @@ -1868,8 +1872,10 @@ else cd .. rm -rf $tempdir build_auto=auto-build.h + HAVE_AUTO_BUILD= fi AC_SUBST(build_subdir) +AC_SUBST(HAVE_AUTO_BUILD) tm_file="${tm_file} defaults.h" tm_p_file="${tm_p_file} tm-preds.h" -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Free Software Evangelist Stallman was right, but he's left :( GNU Toolchain EngineerFSMatrix: It was he who freed the first of us FSF & FSFLA board memberThe Savior shall return (true);
Re: introduce -fcallgraph-info option
On Nov 4, 2019, Richard Biener wrote: > Please leave that part out for now, I'd rather discuss this separately > from the bulk of the patch. That is, I wonder why we shouldn't > simply adjust aux_base_name to something else for -flto [in the driver]. *nod*, that makes sense to me. After seeing your suggestion, I started looking into how to do that, but didn't get very far yet. For now, I've split that bit out of the main patch. So I'm installing the first, big one, and not installing the latter, posted mainly so that the documentation bit can be picked up. Thanks! introduce -fcallgraph-info option This was first submitted many years ago https://gcc.gnu.org/ml/gcc-patches/2010-10/msg02468.html The command line option -fcallgraph-info is added and makes the compiler generate another output file (xxx.ci) for each compilation unit (or LTO partitoin), which is a valid VCG file (you can launch your favorite VCG viewer on it unmodified) and contains the "final" callgraph of the unit. "final" is a bit of a misnomer as this is actually the callgraph at RTL expansion time, but since most high-level optimizations are done at the Tree level and RTL doesn't usually fiddle with calls, it's final in almost all cases. Moreover, the nodes can be decorated with additional info: -fcallgraph-info=su adds stack usage info and -fcallgraph-info=da dynamic allocation info. for gcc/ChangeLog >From Eric Botcazou , Alexandre Oliva > * common.opt (-fcallgraph-info[=]): New option. * doc/invoke.texi (Developer options): Document it. * opts.c (common_handle_option): Handle it. * builtins.c (expand_builtin_alloca): Record allocation if -fcallgraph-info=da. * calls.c (expand_call): If -fcallgraph-info, record the call. (emit_library_call_value_1): Likewise. * flag-types.h (enum callgraph_info_type): New type. * explow.c: Include stringpool.h. (set_stack_check_libfunc): Set SET_SYMBOL_REF_DECL on the symbol. * function.c (allocate_stack_usage_info): New. (allocate_struct_function): Call it for -fcallgraph-info. (prepare_function_start): Call it otherwise. (record_final_call, record_dynamic_alloc): New. * function.h (struct callinfo_callee): New. (CALLEE_FROM_CGRAPH_P): New. (struct callinfo_dalloc): New. (struct stack_usage): Add callees and dallocs. (record_final_call, record_dynamic_alloc): Declare. * gimplify.c (gimplify_decl_expr): Record dynamically-allocated object if -fcallgraph-info=da. * optabs-libfuncs.c (build_libfunc_function): Keep SYMBOL_REF_DECL. * print-tree.h (print_decl_identifier): Declare. (PRINT_DECL_ORIGIN, PRINT_DECL_NAME, PRINT_DECL_UNIQUE_NAME): New. * print-tree.c: Include print-tree.h. (print_decl_identifier): New function. * toplev.c: Include print-tree.h. (callgraph_info_file): New global variable. (callgraph_info_external_printed): Likewise. (output_stack_usage): Rename to... (output_stack_usage_1): ... this. Make it static, add cf parameter. If -fcallgraph-info=su, print stack usage to cf. If -fstack-usage, use print_decl_identifier for pretty-printing. (INDIRECT_CALL_NAME): New. (dump_final_node_vcg_start): New. (dump_final_callee_vcg, dump_final_node_vcg): New. (output_stack_usage): New. (lang_dependent_init): Open and start file if -fcallgraph-info. Allocated callgraph_info_external_printed. (finalize): If callgraph_info_file is not null, finish it, close it, and release callgraph_info_external_printed. for gcc/ada/ChangeLog * gcc-interface/misc.c (callgraph_info_file): Delete. --- gcc/ada/gcc-interface/misc.c |3 - gcc/builtins.c |4 + gcc/calls.c |6 + gcc/common.opt |8 ++ gcc/doc/invoke.texi | 23 + gcc/explow.c |5 + gcc/flag-types.h | 16 gcc/function.c | 59 +- gcc/function.h | 30 +++ gcc/gimplify.c |4 + gcc/optabs-libfuncs.c|4 - gcc/opts.c | 26 ++ gcc/output.h |2 gcc/print-tree.c | 76 ++ gcc/print-tree.h |4 + gcc/toplev.c | 178 ++ 16 files changed, 397 insertions(+), 51 deletions(-) diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c index 4abd4d5..d68b373 100644 --- a/gcc/ada/gcc-interface/misc.c +++ b/gcc/ada/gcc-interface/misc.c @@ -54,9 +54,6 @@ #include "ada-tree.h" #include "gigi.h" -/* This symbol needs to be defined for the front-end. */ -void *callgraph_info
Re: introduce -fcallgraph-info option
On Nov 4, 2019, Richard Biener wrote: > I wonder why we shouldn't simply adjust aux_base_name to something > else for -flto [in the driver]. About that, having tried to make sense of the current uses of aux_base_name and of lto-wrapper, three main possibilities occur to me: a) adjust the driver code to accept -auxbase, and have lto-wrapper explicitly pass -aux-base ${output_dir-.}/$(lbasename ${output_name}) or somesuch for each -fltrans command; b) introduce -auxdir and get lto-wrapper to pass -auxdir ${output_dir-.} for each -fltrans (and offload) command; or c) get -fltrans to implicitly adjust aux_base_name with the directory passed to -dumpdir, if any, or . otherwise Any preferences? -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Free Software Evangelist Stallman was right, but he's left :( GNU Toolchain EngineerFSMatrix: It was he who freed the first of us FSF & FSFLA board memberThe Savior shall return (true);
Re: introduce -fcallgraph-info option
On Nov 7, 2019, Richard Biener wrote: > A simple test shows we currently only pass -auxbase-strip /tmp/cc...o > to the LTRANS lto1 invocation plus -dumpbase cc...o This -auxbase-strip argument is introduced by the gcc driver that runs lto1, based on the -o (tmp ltrans .o) argument, but this driver has no clue as to the executable name or location, and there's nothing whatsoever in the driver interface that enables aux_base_name to be located separately from the output name. Thus the possibilities I brought up of introducing means for it to be told so, explicitly or by convention. -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Free Software Evangelist Stallman was right, but he's left :( GNU Toolchain EngineerFSMatrix: It was he who freed the first of us FSF & FSFLA board memberThe Savior shall return (true);
Re: introduce -fcallgraph-info option
On Nov 6, 2019, Thomas Schwinge wrote: >> which is a valid VCG file (you can launch your favorite VCG >> viewer on it unmodified) > What should be my "favorite VCG viewer"? -ENOCLUE, I'm afraid. I honestly don't even know which one Eric used back when he first attempted to contribute this feature, almost 10 years ago. What I do know is that visualization is not the primary goal. There are indeed newer and more elaborate and modern graph file formats for that. The primary intended consumer of this output is gnatstack, that's long used only this simple format. It's hard to justify rewriting it and creating an incompatibility when the simple format does the job well. Plus, it's simple enough and regular enough that it should be quite easy to parse it with a few lines of awk and post-process the .ci file into any other graph format of interest, when visualization of the graph is the aim. If you show me examples of graph formats that you'd like, that can represent all the data encoded in .ci files, it wouldn't take much effort to persuade me to write the few lines of awk, or perhaps even sed, to convert .ci files output by GCC to the other format ;-) > I tried that, but 'xvcg' didn't render anything useful for a > '-fcallgraph-info=su,da' dump, hmm. Did xvcg fail to display the node information added by su and da? As in, do you see the difference the options make to the graph text file, but not in the visualization? Or is it something else? > Also, I found that many years ago, in 2012, Steven Bosscher did "Rework > RTL CFG graph dumping to dump DOT format" (that's Graphviz), and then did > "remove vcg CFG dumper". gnatstack and -fcallgraph-info have been available since long before that move indeed. > Note that I'm not actively objecting VCG Good, thanks for pointing that out :-) > unmaintained mid-90s software, containing obfuscated layout/rendering > source code Since gnatstack is the primary consumer, I think that objection doesn't apply. As a Free Software activist, however, I am a little concerned about the claim about obfuscated source code. I haven't been able to find any substantiation of that in your message. I think that would be OT for this list, so would you please send me what you got about it at ol...@fsf.org? TIA, -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Free Software Evangelist Stallman was right, but he's left :( GNU Toolchain EngineerFSMatrix: It was he who freed the first of us FSF & FSFLA board memberThe Savior shall return (true);
Re: introduce -fcallgraph-info option
On Nov 7, 2019, Richard Biener wrote: > So how's -dumpbase handled? It's part of the gcc driver interface, and lto-wrapper passes it to gcc for -fltrans compilations. -auxbase isn't, so one of the alternatives I suggested involved our taking auxbase from dumpdir & dumpbase for -fltrans compilations. The other alternatives amounted to exposing auxdir or auxbase in the driver interface, so that lto-wrapper could specify them explicitly. -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Free Software Evangelist Stallman was right, but he's left :( GNU Toolchain EngineerFSMatrix: It was he who freed the first of us FSF & FSFLA board memberThe Savior shall return (true);
Re: introduce -fcallgraph-info option
linking without -o: (a.out.ltrans#, a.out.ltrans#) ex $CC ltobjdir/foo.o ltobjdir/bar.o -fdump-rtl-expand -> /tmp/temp().ltrans0.ltrans.o a.out.ltrans0.su + a.out.ltrans0.#r.expand + a.out I'm a little hesitant, this amounts to quite significant behavior changes. Do these seem acceptable and desirable nevertheless? -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Free Software Evangelist Stallman was right, but he's left :( GNU Toolchain EngineerFSMatrix: It was he who freed the first of us FSF & FSFLA board memberThe Savior shall return (true);
introduce overridable clear_cache emitter
fault_ira_change_pseudo_allocno_class (int, reg_class_t, reg_class_t); diff --git a/gcc/tree.h b/gcc/tree.h index 684be10b440a7..d69f8bad1d015 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -5579,6 +5579,13 @@ is_lang_specific (const_tree t) #define BUILTIN_VALID_P(FNCODE) \ (IN_RANGE ((int)FNCODE, ((int)BUILT_IN_NONE) + 1, ((int) END_BUILTINS) - 1)) +/* Obtain a pointer to the identifier string holding the asm name for + BUILTIN, a BUILT_IN code. This is handy if the target + mangles/overrides the function name that implements the + builtin. */ +#define BUILTIN_ASM_NAME_PTR(BUILTIN) \ + (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (builtin_decl_explicit (BUILTIN + /* Return the tree node for an explicit standard builtin function or NULL. */ static inline tree builtin_decl_explicit (enum built_in_function fncode) diff --git a/libgcc/config/t-vxworks b/libgcc/config/t-vxworks index 02e2efa0eb7d7..b4bb85bff08be 100644 --- a/libgcc/config/t-vxworks +++ b/libgcc/config/t-vxworks @@ -4,7 +4,6 @@ LIBGCC2_DEBUG_CFLAGS = # We provide our own implementation for __clear_cache, using a # VxWorks specific entry point. LIB2FUNCS_EXCLUDE += _clear_cache -LIB2ADD += $(srcdir)/config/vxcache.c # This ensures that the correct target headers are used; some VxWorks # system headers have names that collide with GCC's internal (host) diff --git a/libgcc/config/t-vxworks7 b/libgcc/config/t-vxworks7 index 20c72f490dd3c..6ddd3e84f3309 100644 --- a/libgcc/config/t-vxworks7 +++ b/libgcc/config/t-vxworks7 @@ -4,7 +4,6 @@ LIBGCC2_DEBUG_CFLAGS = # We provide our own implementation for __clear_cache, using a # VxWorks specific entry point. LIB2FUNCS_EXCLUDE += _clear_cache -LIB2ADD += $(srcdir)/config/vxcache.c # This ensures that the correct target headers are used; some VxWorks # system headers have names that collide with GCC's internal (host) diff --git a/libgcc/config/vxcache.c b/libgcc/config/vxcache.c deleted file mode 100644 index e25e0cce0a4c2..0 --- a/libgcc/config/vxcache.c +++ /dev/null @@ -1,35 +0,0 @@ -/* Copyright (C) 2018-2020 Free Software Foundation, Inc. - Contributed by Alexandre Oliva - -This file is part of GCC. - -GCC is free software; you can redistribute it and/or modify it under -the terms of the GNU General Public License as published by the Free -Software Foundation; either version 3, or (at your option) any later -version. - -GCC is distributed in the hope that it will be useful, but WITHOUT ANY -WARRANTY; without even the implied warranty of MERCHANTABILITY or -FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License -for more details. - -Under Section 7 of GPL version 3, you are granted additional -permissions described in the GCC Runtime Library Exception, version -3.1, as published by the Free Software Foundation. - -You should have received a copy of the GNU General Public License and -a copy of the GCC Runtime Library Exception along with this program; -see the files COPYING3 and COPYING.RUNTIME respectively. If not, see -<http://www.gnu.org/licenses/>. */ - -/* Instruction cache invalidation routine using VxWorks' cacheLib. */ - -#include -#include - -void -__clear_cache (char *beg __attribute__((__unused__)), - char *end __attribute__((__unused__))) -{ - cacheTextUpdate (beg, end - beg); -} -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer
Re: drop -aux{dir,base}, revamp -dump{dir,base}
On Jan 9, 2020, Richard Biener wrote: > Did I miss the actual (non-documentation) patch? No, I didn't post it. It's kind of big, and only yesterday did I get it to work as expected and now extensively documented, passing all of the extensive testsuite I wrote for it. Alas, some of the latest tweaks to driver and lto-wrapper to update lto dumps to the new semantics ended up regressing a handful of tests in the testsuite. I'm still looking for some simple work-around, and I'm not very happy with the naming of wpa dumps. commit c34c1f54a8e4c92a0fca101a0f732c5e29c44643 currently in aoliva/testme in the git repo is the latest I tested; I pushed other minor cleanups over that one, but other changes I might make and push later today might be more disruptive in terms of test results. -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Free Software Evangelist Stallman was right, but he's left :( GNU Toolchain EngineerFSMatrix: It was he who freed the first of us FSF & FSFLA board memberThe Savior shall return (true);
Re: drop -aux{dir,base}, revamp -dump{dir,base}
On Jan 16, 2020, Alexandre Oliva wrote: > On Jan 9, 2020, Alexandre Oliva wrote: >> On Jan 9, 2020, Richard Biener wrote: >>> Did I miss the actual (non-documentation) patch? >> No, I didn't post it. It's kind of big, and only yesterday did I get it >> to work as expected and now extensively documented, passing all of the >> extensive testsuite I wrote for it. > Here it is, at last, regstrapped on x86_64-linux-gnu. Ok to install? And here's a followup that fixes a limitation (bug?) in libiberty that was hit when I attempted a last-minute simplification in lto-wrapper. Regstrapped separately on x86_64-linux-gnu. Ok to install? [libiberty] output empty args as a pair of quotes From: Alexandre Oliva writeargv writes out empty arguments in a way that expandargv skips them instead of preserving them. Fixed by writing out a pair of quotes for them. This enables lto-wrapper to pass down an empty string, as desired. for libiberty/ChangeLog * argv.c (writeargv): Output empty args as "". for gcc/ChangeLog * lto-wrapper.c (run_gcc): Use an empty string for -dumpdir. --- gcc/lto-wrapper.c |5 + libiberty/argv.c |8 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index ed076e3..aa71f1e 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -1482,10 +1482,7 @@ run_gcc (unsigned argc, char *argv[]) if (!incoming_dumppfx) { obstack_ptr_grow (&argv_obstack, "-dumpdir"); - /* An empty string would do, if only writeargv would write it -out in a way that would not be skipped by expandargv and -buildargv. */ - obstack_ptr_grow (&argv_obstack, current_dir); + obstack_ptr_grow (&argv_obstack, ""); } obstack_ptr_grow (&argv_obstack, "-dumpbase"); diff --git a/libiberty/argv.c b/libiberty/argv.c index 8c9794db..6a72208 100644 --- a/libiberty/argv.c +++ b/libiberty/argv.c @@ -327,6 +327,14 @@ writeargv (char * const *argv, FILE *f) arg++; } + /* Write out a pair of quotes for an empty argument. */ + if (arg == *argv) + if (EOF == fputs ("\"\"", f)) + { + status = 1; + goto done; + } + if (EOF == fputc ('\n', f)) { status = 1; -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Free Software Evangelist Stallman was right, but he's left :( GNU Toolchain EngineerFSMatrix: It was he who freed the first of us FSF & FSFLA board memberThe Savior shall return (true);
tolerate padding in mbstate_t
Padding in mbstate_t objects may get the memcmp to fail. Attempt to avoid the failure with zero initialization. Regstrapped on x86_64-linux-gnu, and also tested on a platform that used to fail because of padding in std::mbstate_t. Ok to install? for libstdc++-v3/ChangeLog * testsuite/27_io/fpos/mbstate_t/1.cc: Zero-init mbstate_t. --- testsuite/27_io/fpos/mbstate_t/1.cc | 28 ++-- 1 file changed, 22 insertions(+), 6 deletions(-) diff --git libstdc++-v3/testsuite/27_io/fpos/mbstate_t/1.cc libstdc++-v3/testsuite/27_io/fpos/mbstate_t/1.cc index f92d68f..559bd8d 100644 --- libstdc++-v3/testsuite/27_io/fpos/mbstate_t/1.cc +++ libstdc++-v3/testsuite/27_io/fpos/mbstate_t/1.cc @@ -28,8 +28,24 @@ void test01() { typedef std::mbstate_t state_type; - state_type state01 = state_type(); - state_type state02 = state_type(); + // Use zero-initialization of the underlying memory so that padding + // bytes, if any, stand a better chance of comparing the same. + // Zero-initialized memory is guaranteed to be a valid initial + // state. This doesn't quite guarantee that any padding bits won't + // be overwritten when copying from other instances that haven't + // been fully initialized: this data type is compatible with C, so + // it is likely plain old data, but it could have a default ctor + // that initializes only the relevant fields, whereas copy-ctor and + // operator= could be implemented as a full-object memcpy, including + // padding bits, rather than fieldwise copying. However, since + // we're comparing two values copied from the same state_type + // instance (or was this meant to take one of them from pos02 rather + // than both from pos01?), if padding bits are copied, we'll get the + // same for both of them, and if they aren't, we'll keep the values + // we initialized them with, so this should be good. + state_type state[2]; + std::memset(state, 0, sizeof (state)); + std::streampos pos01(0); std::streampos pos02(0); @@ -39,13 +55,13 @@ void test01() // state_type state(); // XXX Need to have better sanity checking for the mbstate_t type, - // or whatever the insantiating type for class fpos happens to be + // or whatever the instantiating type for class fpos happens to be // for streampos, as things like equality operators and assignment // operators, increment and deincrement operators need to be in // place. - pos01.state(state02); - state01 = pos01.state(); - VERIFY( std::memcmp(&state01, &state02, sizeof(state_type)) == 0 ); + pos01.state(state[1]); + state[0] = pos01.state(); + VERIFY( std::memcmp(&state[0], &state[1], sizeof(state_type)) == 0 ); } int main() -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Free Software Evangelist Stallman was right, but he's left :( GNU Toolchain EngineerFSMatrix: It was he who freed the first of us FSF & FSFLA board memberThe Savior shall return (true);