Re: Help with getting started with adding IPA-related attribute
Thanks for the help, Martin! On 05/03/2017 03:42 AM, Martin Jambor wrote: Hi, On Sat, Apr 29, 2017 at 06:28:31AM -0500, Daniel Santos wrote: Brievity is not my forte, so let me start with the questions. Can somebody please point me to the pass and/or function where gcc 1.) decides rather or not to inline a function, There are two inlining passes. Early-inliner that works in topological order and inlines (almost) only when the result is expected to be smaller than uninlined code. Then there is a full IPA inlining pass, which can take advantage of LTO and heuristics that allows code to grow. To understand both, you really need to read most of ipa-inline*.[ch] files. The exact places where the decisions are made are: - early_inline_small_functions() and want_early_inline_function_p() in ipa-inline.c for early inlining, and - the IPA inliner has a few phases and sub-phases itself, see the helper functions in ipa_inline() and especially in inline_small_functions(). Alternatively, you want to check out all functions which call inline_call(). This is a treasure trove of information and I appreciate it. Maybe I can make a goal of writing a guide to this as a means of learning it better and having something to give back. There is a lot I need to learn about IPA and how the CFG is generated (as well as how it estimates code size, speed and such). I've found some nice work from IIT Bombay, although it's a bit out of date (https://www.cse.iitb.ac.in/grc/intdocs/gcc-conceptual-structure.html). I also recently discovered -fdump-ipa-all, but I tried it yet. I often have to explicitly "always_inline" and/or "flatten" cases where a large static function has only one caller. Inlining would make the caller much larger, but the result would be smaller and faster than the sum of the two -- this is one thing I want to investigate to see if it can be improved. 2.) decides rather or not to make a .constprop version of a function, Check out the places that call create_specialized_node() in ipa-cp.c, namely templated decide_about_value() and the end of decide_whether_version_node(), which however just checks the flag do_clone_for_all_contexts set earlier in estimate_local_effects(). I figured this one out after some work, but my understanding of the CFG is still very weak so I can't yet fully understand all of how it works. 3.) a good pass (when all constant propagation is done) to search for fn parameters and variables (marked with an attribute) that were not constproped away, and finally The attributes must be accepted by the front-end. For example in C and C++ this is done in gcc/c-family/c-attribs.c. Afterwards any pass or any code in the compiler can use lookup_attribute on the PARM_DECL tree representing the parameter. Got it. 4.) what mechanism should I use for that search? (iterate through the tree to find them and then see if they have rtl? I haven't worked in this area yet. No, I do not think you want to operate on the RTL level (though I have only very briefly skimmed through the rest of your email). Use the gimple/tree representation as I wrote above. Look at for example ipa_populate_param_decls() in ipa-prop.c to see how to iterate over all PARM_DECLs of a function. Hope this helps, Martin This is a great example. I figured out part of this by examining tree structures in memory while debugging -- a slow process. I ended up figuring out where the tree is expanded into RTL in the "expand" pass and I setup a check in expand_expr_real_1() for when it's expanding an SSA_NAME expression. My thought was to wait until the last possible moment for some pass to optimize it away. Now I just need to figure out how to only emit the warning/error once. I'm guessing I want to append something to the tree node for that SSA_NAME? (and then I'm not warning in duplicate for other versioned nodes) But I'll have to do this loop in order to figure out when to behave differently in decide_whether_version_node(). I have a crude patch with the base functionality -- to warn or error when a parameter or variable declared with __attribute__((constprop)) still exists by the time we expand to RTL, but I have a lot more to learn before I can make this really useful for C metaprogramming. diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c index f2a88e147ba..5ec7b615e24 100644 --- a/gcc/c-family/c-attribs.c +++ b/gcc/c-family/c-attribs.c @@ -139,6 +139,7 @@ static tree handle_bnd_variable_size_attribute (tree *, tree, tree, int, bool *) static tree handle_bnd_legacy (tree *, tree, tree, int, bool *); static tree handle_bnd_instrument (tree *, tree, tree, int, bool *); static tree handle_fallthrough_attribute (tree *, tree, tree, int, bool *); +static tree handle_constprop_attribute (tree *, tree , tree , int , bool *); /* Table of machine-independent attributes common to all C-like languages. @@ -345,6 +346,8 @@ const struc
Bug in GCC 7.1?
The following program gives a warning under GCC 7.1 (built on cygwin, 64 bit) #include int main() { std::vector c {1,2,3,0}; while(c.size() > 0 && c.back() == 0) { auto sz = c.size() -1; c.resize(sz); } return 0; } $ c++7.1 -O3 tt.cxx In Funktion »int main()«: cc1plus: Warnung: »void* __builtin_memset(void*, int, long unsigned int)«: angegebene Größe 18446744073709551612 überschreitet maximale Objektgröße 9223372036854775807 [-Wstringop-overflow=] Compiling with GCC 6.1 (c++6.1 -O3 tt.cxx) works fine. Is this a problem of my program or a problem of GCC 7.1? Helmut
Re: Bug in GCC 7.1?
(I think you are looking for gcc-h...@gcc.gnu.org, or gcc's bugzilla, rather than this mailing list) On Fri, 5 May 2017, Helmut Zeisel wrote: The following program gives a warning under GCC 7.1 (built on cygwin, 64 bit) #include int main() { std::vector c {1,2,3,0}; while(c.size() > 0 && c.back() == 0) { auto sz = c.size() -1; c.resize(sz); } return 0; } $ c++7.1 -O3 tt.cxx Please use $ LC_ALL=C c++7.1 -O3 tt.cxx when you want to post the result, unless you are sending to a German forum. In Funktion »int main()«: cc1plus: Warnung: »void* __builtin_memset(void*, int, long unsigned int)«: angegebene Größe 18446744073709551612 überschreitet maximale Objektgröße 9223372036854775807 [-Wstringop-overflow=] Compiling with GCC 6.1 (c++6.1 -O3 tt.cxx) works fine. Is this a problem of my program or a problem of GCC 7.1? Sounds like a problem with gcc, maybe optimization creates a path that corresponds to size==0 and fails to notice that it cannot be taken. -- Marc Glisse
Re: Bug in GCC 7.1?
I added it to bugzilla, Bug 80641 Helmut
Re: .bad_compare in bootstrap of head
> On 05/04/2017 08:31 AM, Jeff Law wrote: > >On 05/04/2017 07:26 AM, Дмитрий Дьяченко wrote: > >>Fedora 26 x86_64 > >>r247595 > >> > >>~/src/gcc_current/configure --prefix=/usr/local/gcc_current > >>--enable-static --enable-checking=no --enable-languages=c,c++,lto > >>--enable-plugin --disable-multilib > >And that may be the key, I've been able to get a similar failure. Could > >easy be the checking flags. I'll do some bisecting here. > >jeff > > It's Jan's patch from April 30. > > commit e062e35c7ff7743aa31868370bf5cb9f8da835dd > Author: hubicka > Date: Sun Apr 30 15:02:11 2017 + I wonder how that patch can cause mismatches. Does it reproduce on one of compile farm machines (my x86-64 bootstrap works fine so does ia64 on terbium after fixing the gcc 4.1 issue yeterday). It would be great to have -fdump-ipa-inline-details dumps of the mismatching run. Perhaps it will be easy to spot the problem. Honza > > PR ipa/79224 > * ipa-inline-analysis.c (dump_predicate): Add optional parameter NL. > (account_size_time): Use two predicates - exec_pred and > nonconst_pred_ptr. > (evaluate_conditions_for_known_args): Compute both clause and > nonspec_clause. > (evaluate_properties_for_edge): Evaulate both clause and > nonspec_clause. > (inline_summary_t::duplicate): Update. > (estimate_function_body_sizes): Caluculate exec and nonconst > predicates > separately. > (compute_inline_parameters): Likewise. > (estimate_edge_size_and_time): Update caluclation of time. > (estimate_node_size_and_time): Compute both time and nonspecialized > time. > (estimate_ipcp_clone_size_and_time): Update. > (inline_merge_summary): Update. > (do_estimate_edge_time): Update. > (do_estimate_edge_size): Update. > (do_estimate_edge_hints): Update. > (inline_read_section, inline_write_summary): Stream both new > predicates. > * ipa-inline.c (compute_uninlined_call_time): Take > uninlined_call_time > as argument. > (compute_inlined_call_time): Cleanup. > (big_speedup_p): Update. > (edge_badness): Update. > * ipa-inline.h (INLINE_TIME_SCALE): Remove. > (size_time_entry): Replace predicate by exec_predicate and > nonconst_predicate. > (edge_growth_cache_entry): Cache both time nad nonspecialized time. > (estimate_edge_time): Return also nonspec_time. > (reset_edge_growth_cache): Update. >
Re: .bad_compare in bootstrap of head
> > On 05/04/2017 08:31 AM, Jeff Law wrote: > > >On 05/04/2017 07:26 AM, Дмитрий Дьяченко wrote: > > >>Fedora 26 x86_64 > > >>r247595 > > >> > > >>~/src/gcc_current/configure --prefix=/usr/local/gcc_current > > >>--enable-static --enable-checking=no --enable-languages=c,c++,lto > > >>--enable-plugin --disable-multilib > > >And that may be the key, I've been able to get a similar failure. Could > > >easy be the checking flags. I'll do some bisecting here. > > >jeff > > > > It's Jan's patch from April 30. > > > > commit e062e35c7ff7743aa31868370bf5cb9f8da835dd > > Author: hubicka > > Date: Sun Apr 30 15:02:11 2017 + > > I wonder how that patch can cause mismatches. Does it reproduce on one of > compile farm machines (my x86-64 bootstrap works fine so does ia64 on terbium > after fixing the gcc 4.1 issue yeterday). > It would be great to have -fdump-ipa-inline-details dumps of the mismatching > run. Perhaps it will be easy to spot the problem. Aha, --enable-checking=no. I will check what effect it have. Honza
Re: .bad_compare in bootstrap of head
> > I wonder how that patch can cause mismatches. Does it reproduce on one of > > compile farm machines (my x86-64 bootstrap works fine so does ia64 on > > terbium > > after fixing the gcc 4.1 issue yeterday). > > It would be great to have -fdump-ipa-inline-details dumps of the mismatching > > run. Perhaps it will be easy to spot the problem. > > Aha, --enable-checking=no. I will check what effect it have. It does reproduce for me. Time estimates are very slightly different between checking and non-checking build. I am investigating why. Honza > > Honza
Re: .bad_compare in bootstrap of head
On May 5, 2017 3:03:43 PM GMT+02:00, Jan Hubicka wrote: >> > I wonder how that patch can cause mismatches. Does it reproduce on >one of >> > compile farm machines (my x86-64 bootstrap works fine so does ia64 >on terbium >> > after fixing the gcc 4.1 issue yeterday). >> > It would be great to have -fdump-ipa-inline-details dumps of the >mismatching >> > run. Perhaps it will be easy to spot the problem. >> >> Aha, --enable-checking=no. I will check what effect it have. > >It does reproduce for me. Time estimates are very slightly different >between >checking and non-checking build. I am investigating why. You mean debugging and non-debugging build? Richard. >Honza >> >> Honza
r247646 (likely) causing bootstrap error
I see the following error during bootstrap on x86_64 configured with: --enable-languages=ada,all,c,c++,fortran,go,lto,objc,obj-c++ In file included from ./plugin-version.h:1:0, from /src/gcc-trunk/gcc/plugin.c:34: ./configargs.h:7:33: error: expected primary-expression before ‘;’ token } configure_default_options[] = ; ^ The error disappears after reverting to a commit prior to r247646. Thanks Martin
Re: r247646 (likely) causing bootstrap error
Sorry my apologies. I see that Nathan already fixed it. Must have slipped in during a conflict resolution during one of the many rebase between testing and commit approval. Best regards, Thomas On 05/05/17 18:27, Martin Sebor wrote: I see the following error during bootstrap on x86_64 configured with: --enable-languages=ada,all,c,c++,fortran,go,lto,objc,obj-c++ In file included from ./plugin-version.h:1:0, from /src/gcc-trunk/gcc/plugin.c:34: ./configargs.h:7:33: error: expected primary-expression before ‘;’ token } configure_default_options[] = ; ^ The error disappears after reverting to a commit prior to r247646. Thanks Martin
GCC target_clone support
I'm in the middle of adding support for the target_clone attribute to the PowerPC. At the moment, the x86_64/i386 port is the only port that supports target_clone. I added the arm/s390 port maintainers to this query, because those ports might also consider supporting target_clones in the future. In doing the work, I noticed that some of the functions called by the target hooks that were fairly generic, and we might want to move these functions into common code? Would people object if I put out a patch to move these functions to common code? I also have a question on the functionality of target_clone that I will address in a separate message. So far, the list of generic code that could be moved to common code to allow other ports to add target_clone support include: make_resolver_func: Make the resolver function decl to dispatch the versions of a multi-versioned function, DEFAULT_DECL. Create an empty basic block in the resolver and store the pointer in EMPTY_BB. Return the decl of the resolver function. make_dispatcher_decl: Make a dispatcher declaration for the multi-versioned function DECL. Calls to DECL function will be replaced with calls to the dispatcher by the front-end. Return the decl created. is_function_default_version: Returns true if decl is multi-versioned and DECL is the default function, that is it is not tagged with target specific optimization. attr_strcmp: Comparator function to be used in qsort routine to sort attribute specification strings to "target". sorted_attr_string: ARGLIST is the argument to target attribute. This function tokenizes the comma separated arguments, sorts them and returns a string which is a unique identifier for the comma separated arguments. It also replaces non-identifier characters "=,-" with "_". _function_versions: This function returns true if FN1 and FN2 are versions of the same function, that is, the target strings of the function decls are different. This assumes that FN1 and FN2 have the same signature. This is the TARGET_OPTION_FUNCTION_VERSIONS target hook, and it is the same between the x86 and ppc. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: GCC target_clone support (functionality question)
This message is separated from the question about moving code, as it is a questions about the functionality of target_clone support. Right now it looks like target_clone only generates the ifunc handler if there is a call to the function in the object file. It does not generate the ifunc handler if there is no call. For the default function, it generates the normal name. This means that any function that calls the function from a different object module will only get the standard function. From a library and user perspective, I think this is wrong. Instead the default function should be generated with a different name, and the ifunc function should list the standard name. Then you don't have to change all of the other calls in the object file, the normal ifunc handling will handle it. It also means you can more easily put this function in a library and automatically call the appropriate version. Do people agree with this assessment, and should I change the code to do this (for both x86 nd ppc targets)? If not, what is the reason for the objection? Consider mvc5.c from the testsuite: /* { dg-do compile } */ /* { dg-require-ifunc "" } */ /* { dg-options "-fno-inline" } */ /* { dg-final { scan-assembler-times "foo.ifunc" 6 } } */ __attribute__((target_clones("default","avx","avx2"))) int foo () { return 10; } __attribute__((target_clones("default","avx","avx2"))) int bar () { return -foo (); } It generates: .file "mvc5.c" .text .p2align 4,,15 .globl foo .type foo, @function foo: movl$10, %eax ret .type foo.avx.2, @function foo.avx.2: movl$10, %eax ret .type foo.avx2.3, @function foo.avx2.3: movl$10, %eax ret .weak foo.resolver .type foo.resolver, @function foo.resolver: subq$8, %rsp call__cpu_indicator_init movl__cpu_model+12(%rip), %eax testb $4, %ah je .L8 movl$foo.avx2.3, %eax addq$8, %rsp ret .L8: testb $2, %ah movl$foo.avx.2, %edx movl$foo, %eax cmovne %rdx, %rax addq$8, %rsp ret .type foo.ifunc, @gnu_indirect_function .setfoo.ifunc,foo.resolver // Note these functions are not referenced .type bar.avx2.1, @function bar.avx2.1: subq$8, %rsp xorl%eax, %eax callfoo.ifunc addq$8, %rsp negl%eax ret .type bar.avx.0, @function bar.avx.0: subq$8, %rsp xorl%eax, %eax callfoo.ifunc addq$8, %rsp negl%eax ret .type bar, @function // Note how it calls foo.ifunc instead of foo. bar: subq$8, %rsp xorl%eax, %eax callfoo.ifunc addq$8, %rsp negl%eax ret Now, if I remove the bar call, and just leave foo it generates: .type foo, @function foo: movl$10, %eax ret .type foo.avx.0, @function foo.avx.0: movl$10, %eax ret .type foo.avx2.1, @function foo.avx2.1: movl$10, %eax ret Note, it does not generate the resolver at all. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: GCC target_clone support
A minor feature that also should be considered is if you have two clone functions, one that calls the other, we should optimize the call to avoid using the indirect call setup by ifunc. I.e. extern __attribute__((target_clones("default","avx","avx2"))) int caller (); extern __attribute__((target_clones("default","avx","avx2"))) int callee (); __attribute__((target_clones("default","avx","avx2"))) int caller (void) { return -callee (); } __attribute__((target_clones("default","avx","avx2"))) int callee (void) { return 10; } I.e. caller.avx should call callee.avx, not callee (or callee.ifunc), and caller.avx2 should call callee.avx2. Do people think this is useful? -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: GCC target_clone support (functionality question)
On Fri, May 05, 2017 at 12:32:03PM -0700, Evgeny Stupachenko wrote: > Hi Michael, > > On Fri, May 5, 2017 at 11:45 AM, Michael Meissner > wrote: > > This message is separated from the question about moving code, as it is a > > questions about the functionality of target_clone support. > > > > Right now it looks like target_clone only generates the ifunc handler if > > there > > is a call to the function in the object file. It does not generate the > > ifunc > > handler if there is no call. > > > > For the default function, it generates the normal name. This means that any > > function that calls the function from a different object module will only > > get > > the standard function. From a library and user perspective, I think this is > > wrong. Instead the default function should be generated with a different > > name, > > and the ifunc function should list the standard name. Then you don't have > > to > > change all of the other calls in the object file, the normal ifunc handling > > will handle it. It also means you can more easily put this function in a > > library and automatically call the appropriate version. > > I think library issue could be resolved using special headers. You can > look into GLIBC, there should be similar technique. > When you call resolver from initial (default) function, you make an > assumption that ifunc is supported. Yes, ifunc should be required before you even consider using target_clone. What I'm trying to do is make it easier for people to add target clones to their code, but not having to go through using special headers or other gunk. I want them to be able to add just the target clone line, and everything will work. That means in the current case, the default case should be renamed to .default, and what is .ifunc should become . -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: GCC target_clone support
On 05/05/2017 12:18 PM, Michael Meissner wrote: I'm in the middle of adding support for the target_clone attribute to the PowerPC. At the moment, the x86_64/i386 port is the only port that supports target_clone. I added the arm/s390 port maintainers to this query, because those ports might also consider supporting target_clones in the future. In doing the work, I noticed that some of the functions called by the target hooks that were fairly generic, and we might want to move these functions into common code? Would people object if I put out a patch to move these functions to common code? Sounds wise to me -- as you know its not unusual to find commonality when a feature is enabled on a new target. I also have a question on the functionality of target_clone that I will address in a separate message. So far, the list of generic code that could be moved to common code to allow other ports to add target_clone support include:Please consider this class of changes pre-approved. There's no reason to wait for review as you move this stuff around and generalize it slightly. jeff
Backporting Patches to GCC 7
I just submitted two patches against trunk. I'd like to also have them on the 7 branch, so when 7.2 comes out we'll have them. These patches only touch the RISC-V backend, which I'm a maintainer of. Is there a branch maintainer I'm supposed to have sign off on the patches or am I meant to just decide on my own what I should commit? For reference, here's the patches 284b54c RISC-V: Add -mstrict-align option 70218e8 RISC-V: Unify indention in riscv.md
Re: GCC target_clone support (functionality question)
On Fri, May 05, 2017 at 01:38:03PM -0700, Evgeny Stupachenko wrote: > On Fri, May 5, 2017 at 12:48 PM, Michael Meissner > wrote: > > On Fri, May 05, 2017 at 12:32:03PM -0700, Evgeny Stupachenko wrote: > >> Hi Michael, > >> > >> On Fri, May 5, 2017 at 11:45 AM, Michael Meissner > >> wrote: > >> > This message is separated from the question about moving code, as it is a > >> > questions about the functionality of target_clone support. > >> > > >> > Right now it looks like target_clone only generates the ifunc handler if > >> > there > >> > is a call to the function in the object file. It does not generate the > >> > ifunc > >> > handler if there is no call. > >> > > >> > For the default function, it generates the normal name. This means that > >> > any > >> > function that calls the function from a different object module will > >> > only get > >> > the standard function. From a library and user perspective, I think > >> > this is > >> > wrong. Instead the default function should be generated with a > >> > different name, > >> > and the ifunc function should list the standard name. Then you don't > >> > have to > >> > change all of the other calls in the object file, the normal ifunc > >> > handling > >> > will handle it. It also means you can more easily put this function in a > >> > library and automatically call the appropriate version. > >> > >> I think library issue could be resolved using special headers. You can > >> look into GLIBC, there should be similar technique. > >> When you call resolver from initial (default) function, you make an > >> assumption that ifunc is supported. > > > > Yes, ifunc should be required before you even consider using target_clone. > > If function foo() called using regular name it usually means that > there is no information about ifunc support. The whole point of ifunc is that the caller should not need to know whether a function is a normal function or an ifunc function. The linker handles it all making it go through the PLT like it does for shared library functions. I don't understand what you mean by this: > Another object file which > is built with target_clones will have foo() definition with > foo.resolver inside. And for users outside of GLIBC, it is more cumbersome to have to have the whole infrastructure to build separate versions of the file, and then write separate ifunc declaration. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Question about dump_printf/dump_printf_loc
I have a simple question about dump_printf and dump_printf_loc. I notice that most (all?) of the uses of these function are of the form: if (dump_enabled_p ()) dump_printf_loc (MSG_*, ..); Since dump_enabled_p() is just checking to see if dump_file or alt_dump_file is set and since dump_printf_loc has checks for these as well, is there any reason why we shouldn't or couldn't just use: dump_printf_loc (MSG_*, ..); with out the call to dump_enabled_p and have the dump function do nothing when there is no dump file set? I suppose the first version would have some performance advantage since dump_enabled_p is an inlined function, but is that enough of a reason to do it? The second version seems like it would look cleaner in the code where we are making these calls. Steve Ellcey sell...@cavium.com
Re: .bad_compare in bootstrap of head
On 05/05/2017 08:53 AM, Richard Biener wrote: On May 5, 2017 3:03:43 PM GMT+02:00, Jan Hubicka wrote: Aha, --enable-checking=no. I will check what effect it have. It does reproduce for me. Time estimates are very slightly different between checking and non-checking build. I am investigating why. You mean debugging and non-debugging build? Richard. I got the problem when I built with --enable-checking=yes,rtl. I haven't tested any other combinations. My first guess would be a gcc_checking_assert () with side-effects? Daniel