Re: about source code location
On Fri, Sep 4, 2020 at 2:23 AM 易会战 via Gcc wrote:
>
> I am working on an instrumentation tool and need to get the location info
> for a gimple statement. I use the location structure to get the info, and
> it works when I use -O1. When I use -O2, the info sometimes seems to be lost
> and I get a line number of zero. Can anyone tell me how to get the info?

Not all statements have a location. If you encounter one that doesn't, you
need to look at the "surrounding context" to find one.
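
One way to read "look at the surrounding context" is to search outwards from the statement for the nearest neighbour that still has a location. The sketch below is a standalone model, not GCC code: `Stmt` and `nearest_known_line` are hypothetical names, and line 0 stands for "no location". In a real pass the check would be made on the statement's location (I believe `gimple_location` and `UNKNOWN_LOCATION` are the relevant names). The result is only an approximation and may name the wrong line.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a GIMPLE statement: only the source line matters here.
// Line 0 models a statement without location information.
struct Stmt {
    int line;
};

// Return the line of the nearest statement (searching outwards from `idx`)
// that has a known line, or 0 if no statement in the block has one.
// At equal distance the preceding statement wins.
int nearest_known_line(const std::vector<Stmt>& block, std::size_t idx) {
    assert(idx < block.size());
    for (std::size_t dist = 0; dist < block.size(); ++dist) {
        if (idx >= dist && block[idx - dist].line != 0)
            return block[idx - dist].line;
        if (idx + dist < block.size() && block[idx + dist].line != 0)
            return block[idx + dist].line;
    }
    return 0;
}
```

Preferring the preceding statement is an arbitrary choice in this sketch; an instrumentation tool that needs exact lines for memory accesses would have to treat such a guessed line as a hint only.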
Re: A silly question regarding function types
On Fri, Sep 4, 2020 at 4:39 AM Gary Oblock via Gcc wrote:
>
> Note, isn't a problem, rather, it's something that puzzles me.
>
> On walking a function types argument types this way
>
> for ( arg = TYPE_ARG_TYPES ( func_type);
>       arg != NULL;
>       arg = TREE_CHAIN ( arg))
>   {
>     .
>     .
>   }
>
> I noticed an extra void argument that didn't exist
> tagged on the end.
>
> I then noticed other code doing this (which I copied:)
>
> for ( arg = TYPE_ARG_TYPES ( func_type);
>       arg != NULL && arg != void_list_node;
>       arg = TREE_CHAIN ( arg))
>   {
>     .
>     .
>   }
>
> What is going on here???

Without a void_list_node on the end it's a variadic function

> Thanks,
>
> Gary
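
To make the role of the terminator concrete, here is a small standalone model of such an argument list. It does not use GCC headers: `ArgNode`, `void_list_sentinel` and `inspect_args` are hypothetical stand-ins for a TREE_LIST node, `void_list_node` and the loop above. Treating an empty list as "no prototype" rather than variadic is my understanding of GCC's convention and should be read as an assumption.

```cpp
#include <cassert>

// Minimal stand-in for a TREE_LIST node of argument types.
struct ArgNode {
    const char* type_name;
    const ArgNode* chain;  // plays the role of TREE_CHAIN
};

// Unique sentinel object, playing the role of void_list_node.
static const ArgNode void_list_sentinel = {"void", nullptr};

struct ArgInfo {
    int named_args;  // arguments seen before the end of the list
    bool variadic;   // true if the list ended without the sentinel
};

ArgInfo inspect_args(const ArgNode* args) {
    ArgInfo info = {0, false};
    const ArgNode* arg = args;
    for (; arg != nullptr && arg != &void_list_sentinel; arg = arg->chain) {
        assert(arg->type_name != nullptr);  // every real node carries a type
        ++info.named_args;
    }
    // Falling off the end without meeting the sentinel models "...".
    // An empty list is treated as "no prototype" (assumption, see above).
    info.variadic = (args != nullptr && arg == nullptr);
    return info;
}
```

In this model a prototype like `(int, double)` is the chain `int -> double -> sentinel`, while a printf-style `(const char *, ...)` is just `const char * -> nullptr`. That is why a loop that only tests for NULL sees one extra "void" entry on non-variadic functions.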
Re: Is there a way to look for a tree by its UID?
On 03/09/2020 12:19, Richard Biener wrote:
> On Thu, Sep 3, 2020 at 10:58 AM Jakub Jelinek via Gcc wrote:
>>
>> On Thu, Sep 03, 2020 at 10:22:52AM +0200, Erick Ochoa wrote:
>>> So, I am just wondering is there an interface where I could do something
>>> like:
>>>
>>> ```
>>> // vars is the field in pt_solution of type bitmap
>>> EXECUTE_IF_SET_IN_BITMAP (vars, 0, uid, bi)
>>> {
>>>    // uid is set
>>>    tree pointed_to = get_tree_with_uid(uid);
>>> }
>>> ```
>>
>> There is not.
>
> And there cannot be since the solution includes UIDs of
> decls that are "fake" and thus never really existed.

Hi Richard and Jakub,

thanks, I was looking for why get_tree_with_uid might be a somewhat bad
idea.

I am thinking about representing an alias set similarly to the
pt_solution. Instead of having bits set in position of points-to
variables UIDs, I was thinking about having bits set in position of
may-alias variables' UIDs. I think an interface similar to the one I
described can provide a good mechanism of jumping to different aliases,
but I do agree that HEAP variables and shadow_variables (and perhaps
other fake variables) might not be a good idea to keep in the interface
to avoid jumping to trees which do not represent something in gimple.

Richard, you mentioned in another e-mail that I might want to provide
the alias-sets from IPA-PTA to another pass in a similar way to
ipa_escape_pt. I think using a structure similar to pt_solution for
may-alias solution is a good idea. Again, the bitmap to aliasing
variables in UIDs. However, I think for this solution to be general I
need several of these structs not just one. Ideally one per candidate
alias-set, at most one per variable.

> I think you need to first get a set of candidates you want to
> transform (to limit the work done below), then use the
> internal points-to solutions and compute alias sets for this
> set plus the points-to solution this alias-set aliases.
>
> You can then keep the candidate -> alias-set ID -> points-to
> relation (thus candidates should not be "all variables" for
> efficiency reasons).

I think I can use a relatively simple heuristic: start by looking at
malloc statements and obtain the alias-sets for variables which hold
malloc's return values. This should address most efficiency concerns.

So, I'm thinking about the following:

* obtain variables which hold the result of malloc. These are the
  initial candidates.
* for initial candidates compute alias-sets as bitmaps where only "real"
  decl UIDs are set. Compute this just before the end of IPA-PTA.
* for each alias_set:
    for each alias:
      map[alias->decl] = alias_set
* Use this map and the alias-sets bitmaps in pass just after IPA-PTA.
* Potentially use something similar to get_tree_with_uid but that is
  only valid for trees which are keys in the map.

Does this sound reasonable?

> Richard.
>
>> Jakub
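
The `map[alias->decl] = alias_set` step in the list above can be sketched with ordinary containers. This is a standalone model under stated assumptions: `DeclUid`, `AliasSet` and `build_alias_map` are hypothetical names, a `std::set` of UIDs stands in for a GCC bitmap, and an index into a vector stands in for an alias-set ID. It is not the IPA-PTA data structure itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

using DeclUid = unsigned;            // stand-in for the UID of a "real" decl
using AliasSet = std::set<DeclUid>;  // stand-in for a bitmap of decl UIDs

// Build the reverse map from each member UID to the index of its alias set.
std::map<DeclUid, std::size_t>
build_alias_map(const std::vector<AliasSet>& alias_sets) {
    std::map<DeclUid, std::size_t> uid_to_set;
    for (std::size_t id = 0; id < alias_sets.size(); ++id) {
        for (DeclUid uid : alias_sets[id]) {
            // "At most one per variable": a UID may appear in one set only.
            bool inserted = uid_to_set.emplace(uid, id).second;
            assert(inserted && "variable appears in two alias sets");
            (void)inserted;
        }
    }
    return uid_to_set;
}
```

A lookup that is "only valid for trees which are keys in the map" then corresponds to `uid_to_set.find(uid)`; UIDs of fake variables never enter the map, so they simply fail the lookup.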
Re: Is there a way to look for a tree by its UID?
On Fri, Sep 04, 2020 at 10:12:57AM +0200, Erick Ochoa wrote:
> I am thinking about representing an alias set similarly to the pt_solution.
> Instead of having bits set in position of points-to variables UIDs, I was
> thinking about having bits set in position of may-alias variables' UIDs. I
> think an interface similar to the one I described can provide a good
> mechanism of jumping to different aliases, but I do agree that HEAP
> variables and shadow_variables (and perhaps other fake variables) might not
> be a good idea to keep in the interface to avoid jumping to trees which do
> not represent something in gimple.

Not just those; you also shouldn't care about variables with scalar
types. I think your pass can't do anything useful with those either.

	Jakub
Re: LTO slows down calculix by more than 10% on aarch64
On Mon, 31 Aug 2020 at 16:53, Prathamesh Kulkarni wrote:
>
> On Fri, 28 Aug 2020 at 17:33, Alexander Monakov wrote:
> >
> > On Fri, 28 Aug 2020, Prathamesh Kulkarni via Gcc wrote:
> >
> > > I wonder if that's (one of) the main factor(s) behind slowdown or it's
> > > not too relevant ?
> >
> > Probably not. Some advice to make your search more directed:
> >
> > Pass '-n' to 'perf report'. Relative sample ratios are hard to reason about
> > when they are computed against different bases, it's much easier to see that
> > a loop is slowing down if it went from 4000 to 4500 in absolute sample count
> > as opposed to 90% to 91% in relative sample ratio.
> >
> > Before diving down 'perf report', be sure to fully account for differences
> > in 'perf stat' output. Do the programs execute the same number of
> > instructions,
> > so the difference only in scheduling? Do the programs suffer from the same
> > amount of branch mispredictions? Please show output of 'perf stat' on the
> > mailing list too, so everyone is on the same page about that.
> >
> > I also suspect that the dramatic slowdown has to do with the extra branch.
> > Your CPU might have some specialized counters for branch prediction, see
> > 'perf list'.
> Hi Alexander,
> Thanks for the suggestions! I am in the process of doing the
> benchmarking experiments,
> and will post the results soon.
Hi,
I obtained perf stat results for following benchmark runs:

-O2:

    7856832.692380      task-clock (msec)         #    1.000 CPUs utilized
              3758      context-switches          #    0.000 K/sec
                40      cpu-migrations            #    0.000 K/sec
             40847      page-faults               #    0.005 K/sec
     7856782413676      cycles                    #    1.000 GHz
     6034510093417      instructions              #    0.77  insn per cycle
      363937274287      branches                  #   46.321 M/sec
       48557110132      branch-misses             #   13.34% of all branches

-O2 with orthonl inlined:

    8319643.114380      task-clock (msec)         #    1.000 CPUs utilized
              4285      context-switches          #    0.001 K/sec
                28      cpu-migrations            #    0.000 K/sec
             40843      page-faults               #    0.005 K/sec
     8319591038295      cycles                    #    1.000 GHz
     6276338800377      instructions              #    0.75  insn per cycle
      467400726106      branches                  #   56.180 M/sec
       45986364011      branch-misses             #    9.84% of all branches

-O2 with orthonl inlined and PRE disabled (this removes the extra branches):

    8207331.088040      task-clock (msec)         #    1.000 CPUs utilized
              2266      context-switches          #    0.000 K/sec
                32      cpu-migrations            #    0.000 K/sec
             40846      page-faults               #    0.005 K/sec
     8207292032467      cycles                    #    1.000 GHz
     6035724436440      instructions              #    0.74  insn per cycle
      364415440156      branches                  #   44.401 M/sec
       53138327276      branch-misses             #   14.58% of all branches

-O2 with orthonl inlined and hoisting disabled:

    7797265.206850      task-clock (msec)         #    1.000 CPUs utilized
              3139      context-switches          #    0.000 K/sec
                20      cpu-migrations            #    0.000 K/sec
             40846      page-faults               #    0.005 K/sec
     7797221351467      cycles                    #    1.000 GHz
     6187348757324      instructions              #    0.79  insn per cycle
      461840800061      branches                  #   59.231 M/sec
       26920311761      branch-misses             #    5.83% of all branches

Perf profiles for
-O2 -fno-code-hoisting and inlined orthonl:
https://people.linaro.org/~prathamesh.kulkarni/perf_O2_inline.data

      3196866 │ 1f04:   ldur   d1, [x1, #-248]
 216348301800 │         add    w0, w0, #0x1
       985098 │         add    x2, x2, #0x18
 216215999206 │         add    x1, x1, #0x48
 215630376504 │         fmul   d1, d5, d1
 863829148015 │         fmul   d1, d1, d6
 864228353526 │         fmul   d0, d1, d0
 864568163014 │         fmadd  d2, d0, d16, d2
              │         cmp    w0, #0x4
 216125427594 │       ↓ b.eq   1f34
     15010377 │         ldur   d0, [x2, #-8]
 143753737468 │       ↑ b      1f04

-O2 with inlined orthonl:
https://people.linaro.org/~prathamesh.kulkarni/perf_O2_inline.data

 359871503840 │ 1ef8:   ldur   d15, [x1, #-248]
 144055883055 │         add    w0, w0, #0x1
  72262104254 │         add    x2, x2, #0x18
 143991169721 │         add    x1, x1, #0x48
 288648917780 │         fmul   d15, d17, d1
Re: LTO slows down calculix by more than 10% on aarch64
> I obtained perf stat results for following benchmark runs:
>
> -O2:
>
>     7856832.692380      task-clock (msec)         #    1.000 CPUs utilized
>               3758      context-switches          #    0.000 K/sec
>                 40      cpu-migrations            #    0.000 K/sec
>              40847      page-faults               #    0.005 K/sec
>      7856782413676      cycles                    #    1.000 GHz
>      6034510093417      instructions              #    0.77  insn per cycle
>       363937274287      branches                  #   46.321 M/sec
>        48557110132      branch-misses             #   13.34% of all branches

(ouch, 2+ hours per run is a lot, collecting a profile over a minute should be
enough for this kind of code)

> -O2 with orthonl inlined:
>
>     8319643.114380      task-clock (msec)         #    1.000 CPUs utilized
>               4285      context-switches          #    0.001 K/sec
>                 28      cpu-migrations            #    0.000 K/sec
>              40843      page-faults               #    0.005 K/sec
>      8319591038295      cycles                    #    1.000 GHz
>      6276338800377      instructions              #    0.75  insn per cycle
>       467400726106      branches                  #   56.180 M/sec
>        45986364011      branch-misses             #    9.84% of all branches

So +100e9 branches, but +240e9 instructions and +480e9 cycles, probably implying
that extra instructions are appearing in this loop nest, but not in the innermost
loop. As a reminder for others, the innermost loop has only 3 iterations.

> -O2 with orthonl inlined and PRE disabled (this removes the extra branches):
>
>     8207331.088040      task-clock (msec)         #    1.000 CPUs utilized
>               2266      context-switches          #    0.000 K/sec
>                 32      cpu-migrations            #    0.000 K/sec
>              40846      page-faults               #    0.005 K/sec
>      8207292032467      cycles                    #    1.000 GHz
>      6035724436440      instructions              #    0.74  insn per cycle
>       364415440156      branches                  #   44.401 M/sec
>        53138327276      branch-misses             #   14.58% of all branches

This seems to match baseline in terms of instruction count, but without PRE
the loop nest may be carrying some dependencies over memory. I would simply
check the assembly for the entire 6-level loop nest in question, I hope it's
not very complicated (though Fortran array addressing...).
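
The comparison between the baseline and the inlined run is plain subtraction of counters, which is easy to get wrong by hand with thirteen-digit numbers. A minimal sketch follows; `PerfCounters`, `delta` and `ipc` are hypothetical helper names, and the figures are copied from the quoted 'perf stat' output. The exact differences come out at about +103e9 branches, +242e9 instructions and +463e9 cycles, in the same ballpark as the rounded figures in the reply.

```cpp
#include <cassert>
#include <cstdint>

// The four hardware counters compared in this thread.
struct PerfCounters {
    std::int64_t cycles;
    std::int64_t instructions;
    std::int64_t branches;
    std::int64_t branch_misses;
};

// Counter-by-counter difference `after - before`.
PerfCounters delta(const PerfCounters& before, const PerfCounters& after) {
    return {after.cycles - before.cycles,
            after.instructions - before.instructions,
            after.branches - before.branches,
            after.branch_misses - before.branch_misses};
}

// Instructions per cycle, the "insn per cycle" column of perf stat.
double ipc(const PerfCounters& c) {
    assert(c.cycles > 0);
    return static_cast<double>(c.instructions) / static_cast<double>(c.cycles);
}

// Figures copied from the quoted perf stat output.
const PerfCounters o2 = {7856782413676, 6034510093417,
                         363937274287, 48557110132};
const PerfCounters o2_inlined = {8319591038295, 6276338800377,
                                 467400726106, 45986364011};
```

Calling `delta(o2, o2_inlined)` gives the absolute differences; comparing absolute counts rather than percentages follows the same reasoning as passing '-n' to 'perf report'.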
> -O2 with orthonl inlined and hoisting disabled:
>
>     7797265.206850      task-clock (msec)         #    1.000 CPUs utilized
>               3139      context-switches          #    0.000 K/sec
>                 20      cpu-migrations            #    0.000 K/sec
>              40846      page-faults               #    0.005 K/sec
>      7797221351467      cycles                    #    1.000 GHz
>      6187348757324      instructions              #    0.79  insn per cycle
>       461840800061      branches                  #   59.231 M/sec
>        26920311761      branch-misses             #    5.83% of all branches

There's a 20e9 reduction in branch misses and a 500e9 reduction in cycle count.
I don't think the former fully covers the latter (there's also a 90e9 reduction
in insn count).

Given that the inner loop iterates only 3 times, my main suggestion is to
consider how the profile for the entire loop nest looks like (it's 6 loops deep,
each iterating only 3 times).

> Perf profiles for
> -O2 -fno-code-hoisting and inlined orthonl:
> https://people.linaro.org/~prathamesh.kulkarni/perf_O2_inline.data
>
>       3196866 │ 1f04:   ldur   d1, [x1, #-248]
>  216348301800 │         add    w0, w0, #0x1
>        985098 │         add    x2, x2, #0x18
>  216215999206 │         add    x1, x1, #0x48
>  215630376504 │         fmul   d1, d5, d1
>  863829148015 │         fmul   d1, d1, d6
>  864228353526 │         fmul   d0, d1, d0
>  864568163014 │         fmadd  d2, d0, d16, d2
>               │         cmp    w0, #0x4
>  216125427594 │       ↓ b.eq   1f34
>      15010377 │         ldur   d0, [x2, #-8]
>  143753737468 │       ↑ b      1f04
>
> -O2 with inlined orthonl:
> https://people.linaro.org/~prathamesh.kulkarni/perf_O2_inline.data
>
>  359871503840 │ 1ef8:   ldur   d15, [x1, #-248]
>  144055883055 │         add    w0, w0, #0x1
>   72262104254 │         add    x2, x2, #0x18
>  143991169721 │         add    x1, x1, #0x48
>  288648917780 │         fmul   d15, d17, d15
>  864665644756 │         fmul   d15, d15, d18
>  863868426387 │         fmul   d14, d15, d14
>  865228159813 │         fmadd  d16, d14, d31, d16
>        245967 │         cmp    w0, #0x4
>  215396760545 │       ↓
Re: #line directives in generated C files
On Thu, Sep 3, 2020 at 8:19 PM Hans-Peter Nilsson wrote:
> On Thu, 27 Aug 2020, Pip Cet via Gcc wrote:
> > I may be missing an obvious workaround, but it seems we currently emit
> > a #line directive when including lines from machine description files
> > in C files, but never emit a second directive when switching back to
> > the generated C file. This makes stepping through the backend in gdb
>
> Thanks for taking on this!

Thanks for the encouragement!

> IMHO stepping into the .md really isn't helpful. Even a pattern
> name in a comment in the generated code would be better.

I think it is helpful, FWIW, to be able to set a breakpoint on an md
condition or in the preparation code (in conjunction with setting
CXXFLAGS to "-O0 -g3" or similar), but since that's not a "normal"
compilation it'd be acceptable to specify an extra switch for this
feature. That would mean my genline.c program wouldn't have to run
except in those constellations...
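
The "second directive" being discussed can be illustrated with a small generator helper. This is only a sketch of the idea, not the genline.c program mentioned above and not code from GCC's generators: `emit_md_snippet` and the file names used with it are hypothetical. It relies on the standard meaning of `#line N "file"`, which makes the line that follows the directive count as line N of that file.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Append `snippet` (copied from line `md_line` of `md_file`) to the generated
// output, then emit a second #line directive that points back at the
// generated file itself. `out_line` is the 1-based number of the next line
// to be written to `out_file`; it is kept up to date.
void emit_md_snippet(std::ostringstream& out, int& out_line,
                     const std::string& out_file, const std::string& md_file,
                     int md_line, const std::string& snippet) {
    assert(!snippet.empty() && snippet.back() == '\n');

    // First directive: the snippet's lines are attributed to the .md file.
    out << "#line " << md_line << " \"" << md_file << "\"\n";
    ++out_line;

    out << snippet;
    for (char c : snippet)
        if (c == '\n')
            ++out_line;

    // Second directive: it occupies line `out_line` of the generated file,
    // so the line after it must be reported as `out_line + 1`.
    out << "#line " << (out_line + 1) << " \"" << out_file << "\"\n";
    ++out_line;
}
```

With the second directive in place, a debugger stepping past the copied snippet lands on lines of the generated file again instead of continuing to count lines in the machine description.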
Re: about source code location
How can I check the location corresponding to a gimple statement? My
instrumented statements include some memory accesses, and I want to get the
right source code line. Going by context, it is possible to get the wrong
line.

---Original---
From: "Richard Biener"
Re: Is there a way to look for a tree by its UID?
On Fri, Sep 4, 2020 at 10:13 AM Erick Ochoa wrote:
>
> On 03/09/2020 12:19, Richard Biener wrote:
> > On Thu, Sep 3, 2020 at 10:58 AM Jakub Jelinek via Gcc
> > wrote:
> >>
> >> On Thu, Sep 03, 2020 at 10:22:52AM +0200, Erick Ochoa wrote:
> >>> So, I am just wondering is there an interface where I could do something
> >>> like:
> >>>
> >>> ```
> >>> // vars is the field in pt_solution of type bitmap
> >>> EXECUTE_IF_SET_IN_BITMAP (vars, 0, uid, bi)
> >>> {
> >>>    // uid is set
> >>>    tree pointed_to = get_tree_with_uid(uid);
> >>> }
> >>> ```
> >>
> >> There is not.
> >
> > And there cannot be since the solution includes UIDs of
> > decls that are "fake" and thus never really existed.
>
> Hi Richard and Jakub,
>
> thanks, I was looking for why get_tree_with_uid might be a somewhat bad
> idea.
>
> I am thinking about representing an alias set similarly to the
> pt_solution. Instead of having bits set in position of points-to
> variables UIDs, I was thinking about having bits set in position of
> may-alias variables' UIDs. I think an interface similar to the one I
> described can provide a good mechanism of jumping to different aliases,
> but I do agree that HEAP variables and shadow_variables (and perhaps
> other fake variables) might not be a good idea to keep in the interface
> to avoid jumping to trees which do not represent something in gimple.
>
> Richard, you mentioned in another e-mail that I might want to provide
> the alias-sets from IPA-PTA to another pass in a similar way to
> ipa_escape_pt. I think using a structure similar to pt_solution for
> may-alias solution is a good idea. Again, the bitmap to aliasing
> variables in UIDs. However, I think for this solution to be general I
> need several of these structs not just one. Ideally one per candidate
> alias-set, at most one per variable.

Sure, you need one per alias-set. Indeed you might want to work
with bitmaps of varinfo IDs first when computing alias-sets since ...

> > I think you need to first get a set of candidates you want to
> > transform (to limit the work done below), then use the
> > internal points-to solutions and compute alias sets for this
> > set plus the points-to solution this alias-set aliases.
> >
> > You can then keep the candidate -> alias-set ID -> points-to
> > relation (thus candidates should not be "all variables" for
> > efficiency reasons).
>
> I think I can use a relatively simple heuristic: start by looking at
> malloc statements and obtain the alias-sets for variables which hold
> malloc's return values. This should address most efficiency concerns.
>
> So, I'm thinking about the following:
>
> * obtain variables which hold the result of malloc. These are the
> initial candidates.

... those would be the is_heapvar ones. Since you can probably only
handle the case where all pointers either only point to a single
allocation sites result and nothing else or not to it that case
looks special and thus easy anyway.

> * for initial candidates compute alias-sets as bitmaps where only "real"
> decl UIDs are set. Compute this just before the end of IPA-PTA.
> * for each alias_set:
>     for each alias:
>       map[alias->decl] = alias_set
> * Use this map and the alias-sets bitmaps in pass just after IPA-PTA.
> * Potentially use something similar to get_tree_with_uid but that is
> only valid for trees which are keys in the map.

Hmm, isn't this more than you need? Given a set of candidates C
you try to form alias-sets so that all members of an alias set
A are members of C, they are layout-compatible and not member of
another alias-set. Plus no member escapes and all pointers you
can track may only point to a subset of a single alias-set.

From the above C is what constrains the size of your sets
and the mapping.

> Does this sound reasonable?
>
> >
> > Richard.
> >
> >> Jakub
> >>
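
Some of the conditions listed in the reply above can be written down as set checks. The sketch below is a standalone model with hypothetical names (`Uid`, `UidSet`, `alias_sets_well_formed`, `points_into_single_set`); sorted `std::set`s stand in for GCC bitmaps. It covers only membership in C, disjointness and the "subset of a single alias-set" condition. Layout compatibility and escape analysis are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

using Uid = unsigned;
using UidSet = std::set<Uid>;

// True if every alias set only contains candidates from C and no candidate
// is a member of more than one alias set.
bool alias_sets_well_formed(const UidSet& candidates,
                            const std::vector<UidSet>& alias_sets) {
    UidSet seen;
    for (const UidSet& a : alias_sets) {
        for (Uid uid : a) {
            if (candidates.count(uid) == 0)
                return false;  // member is not in C
            if (!seen.insert(uid).second)
                return false;  // member of two alias sets
        }
    }
    return true;
}

// True if a pointer's points-to set is a subset of one single alias set.
bool points_into_single_set(const UidSet& points_to,
                            const std::vector<UidSet>& alias_sets) {
    assert(!points_to.empty());  // an empty set would match trivially
    for (const UidSet& a : alias_sets)
        if (std::includes(a.begin(), a.end(),
                          points_to.begin(), points_to.end()))
            return true;
    return false;
}
```

Since every member must come from C, the total size of all sets and of any mapping built over them is bounded by the size of C, which is the point made at the end of the reply.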
Re: Is there a way to look for a tree by its UID?
On 04/09/2020 15:19, Richard Biener wrote:
> On Fri, Sep 4, 2020 at 10:13 AM Erick Ochoa wrote:
>> On 03/09/2020 12:19, Richard Biener wrote:
>>> On Thu, Sep 3, 2020 at 10:58 AM Jakub Jelinek via Gcc wrote:
>>>> On Thu, Sep 03, 2020 at 10:22:52AM +0200, Erick Ochoa wrote:
>>>>> So, I am just wondering is there an interface where I could do
>>>>> something like:
>>>>>
>>>>> ```
>>>>> // vars is the field in pt_solution of type bitmap
>>>>> EXECUTE_IF_SET_IN_BITMAP (vars, 0, uid, bi)
>>>>> {
>>>>>    // uid is set
>>>>>    tree pointed_to = get_tree_with_uid(uid);
>>>>> }
>>>>> ```
>>>>
>>>> There is not.
>>>
>>> And there cannot be since the solution includes UIDs of
>>> decls that are "fake" and thus never really existed.
>>
>> Hi Richard and Jakub,
>>
>> thanks, I was looking for why get_tree_with_uid might be a somewhat bad
>> idea.
>>
>> I am thinking about representing an alias set similarly to the
>> pt_solution. Instead of having bits set in position of points-to
>> variables UIDs, I was thinking about having bits set in position of
>> may-alias variables' UIDs. I think an interface similar to the one I
>> described can provide a good mechanism of jumping to different aliases,
>> but I do agree that HEAP variables and shadow_variables (and perhaps
>> other fake variables) might not be a good idea to keep in the interface
>> to avoid jumping to trees which do not represent something in gimple.
>>
>> Richard, you mentioned in another e-mail that I might want to provide
>> the alias-sets from IPA-PTA to another pass in a similar way to
>> ipa_escape_pt. I think using a structure similar to pt_solution for
>> may-alias solution is a good idea. Again, the bitmap to aliasing
>> variables in UIDs. However, I think for this solution to be general I
>> need several of these structs not just one. Ideally one per candidate
>> alias-set, at most one per variable.
>
> Sure, you need one per alias-set. Indeed you might want to work
> with bitmaps of varinfo IDs first when computing alias-sets since ...

Yes, that's what I've been doing :)

>>> I think you need to first get a set of candidates you want to
>>> transform (to limit the work done below), then use the
>>> internal points-to solutions and compute alias sets for this
>>> set plus the points-to solution this alias-set aliases.
>>>
>>> You can then keep the candidate -> alias-set ID -> points-to
>>> relation (thus candidates should not be "all variables" for
>>> efficiency reasons).
>>
>> I think I can use a relatively simple heuristic: start by looking at
>> malloc statements and obtain the alias-sets for variables which hold
>> malloc's return values. This should address most efficiency concerns.
>>
>> So, I'm thinking about the following:
>>
>> * obtain variables which hold the result of malloc. These are the
>> initial candidates.
>
> ... those would be the is_heapvar ones. Since you can probably only
> handle the case where all pointers either only point to a single
> allocation sites result and nothing else or not to it that case
> looks special and thus easy anyway.

I did a git grep and is_heapvar is gone. But, I believe that I still
collect these variables as quickly as possible. I iterate over the call
graph and if I find malloc, then I just look at the callers and collect
the lhs. This lhs corresponds to the "decl" in the varinfo_t struct. I
then just iterate over the variables in varmap to find matching lhs with
the decl and computing alias sets by looking at the intersection of
pt_solution.

This seems to work well. I still need to find out whether they escape,
but it should be simple to do so from here.

>> * for initial candidates compute alias-sets as bitmaps where only "real"
>> decl UIDs are set. Compute this just before the end of IPA-PTA.
>> * for each alias_set:
>>     for each alias:
>>       map[alias->decl] = alias_set
>> * Use this map and the alias-sets bitmaps in pass just after IPA-PTA.
>> * Potentially use something similar to get_tree_with_uid but that is
>> only valid for trees which are keys in the map.
>
> Hmm, isn't this more than you need? Given a set of candidates C
> you try to form alias-sets so that all members of an alias set
> A are members of C, they are layout-compatible and not member of
> another alias-set. Plus no member escapes and all pointers you
> can track may only point to a subset of a single alias-set.

Yes? I'm not really sure what you are trying to say here. Can you please
elaborate more on "this" in the sentence "isn't this more than you need?".

I think I need a way that given some tree (which I know is in
candidates) I can refer to its alias set. Since the given set of
candidates might have more than 1 opportunity for transformation (i.e.
computing the alias-sets for all c in C yields two distinct alias sets
A_0 = {c_0, c_1, ... c_x}, A_1 = {c_x+1, ... c_y} ) I need a way of
referring to these distinct alias sets. Here, I was hoping to refer to
alias sets by the members of the sets themselves.

c_0 -> A_0
c_1 -> A_0
...
c_x+1 -> A_1
c_x+2 -> A_1

The get_tree_with_uid interface is more of a quick way to find out if
they are layout compatible. Because if we have a bitm
gcc-9-20200904 is now available
Snapshot gcc-9-20200904 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/9-20200904/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 9 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-9 revision 5371ab207594ae2ef4c5223c2adae88b7a27b76b

You'll find:

 gcc-9-20200904.tar.xz                Complete GCC

  SHA256=27e7479857a3ed45f8db6e470f1a6398db3a7a0bb058d2b7f28797e276126d08
  SHA1=dac2912c3130cb98b81d06a88f55e9b3bb745c41

Diffs from 9-20200828 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.