Re: about source code location

2020-09-04 Thread Richard Biener via Gcc
On Fri, Sep 4, 2020 at 2:23 AM 易会战 via Gcc  wrote:
>
> I am working on an instrumentation tool and need to get the location info
> for a gimple statement. I use the location structure to get the info, and it
> works when I use -O1. When I use -O2, the info sometimes seems to be lost and
> the line number I get is zero. Can anyone tell me how to get the info?

Not all statements have a location. If you encounter one that doesn't, you
need to look at the "surrounding context" to find one.
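
A minimal sketch of the "surrounding context" idea, written as a self-contained C++ model rather than real plugin code. `Stmt` and `nearest_line` are hypothetical names; in an actual pass the per-statement lookup would presumably go through `gimple_location` and `LOCATION_LINE`, with `UNKNOWN_LOCATION` playing the role of line 0 here. The line borrowed from a neighbour is only an approximation of where the statement came from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GIMPLE statement: line 0 means "no location",
// which is what the original poster observed at -O2.
struct Stmt {
  int line;
};

// Return the line of stmts[i] or, failing that, the line of the nearest
// neighbour in the same block that has one (earlier statements win ties).
// Returns 0 if no statement in the block carries a location.
int nearest_line(const std::vector<Stmt>& stmts, std::size_t i) {
  assert(i < stmts.size());
  for (std::size_t d = 0; d < stmts.size(); ++d) {
    if (i >= d && stmts[i - d].line != 0) return stmts[i - d].line;
    if (i + d < stmts.size() && stmts[i + d].line != 0) return stmts[i + d].line;
  }
  return 0;
}
```

Searching outward in both directions is one possible policy; preferring only preceding statements would be an equally defensible choice.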


Re: A silly question regarding function types

2020-09-04 Thread Richard Biener via Gcc
On Fri, Sep 4, 2020 at 4:39 AM Gary Oblock via Gcc  wrote:
>
> Note, this isn't a problem; rather, it's something that puzzles me.
>
> On walking a function type's argument types this way
>
> for (arg = TYPE_ARG_TYPES (func_type);
>      arg != NULL;
>      arg = TREE_CHAIN (arg))
>   {
>     .
>     .
>   }
>
> I noticed an extra void argument that didn't exist
> tacked on the end.
>
> I then noticed other code doing this (which I copied):
>
> for (arg = TYPE_ARG_TYPES (func_type);
>      arg != NULL && arg != void_list_node;
>      arg = TREE_CHAIN (arg))
>   {
>     .
>     .
>   }
>
> What is going on here?

Without a void_list_node at the end, it's a variadic function.
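
A small self-contained model of that convention, as a sketch only. `ArgNode`, `void_list_sentinel` and `summarize_args` are hypothetical stand-ins for the tree list, `void_list_node` and the walking loop; no GCC headers are used. As far as I know, a completely empty argument list has its own meaning in GCC (no prototype information), which this sketch does not model separately.

```cpp
#include <cassert>

// Hypothetical miniature of a TREE_LIST node; only the chain pointer matters.
struct ArgNode {
  const char* type_name;
  const ArgNode* chain;
};

// Stand-in for GCC's shared void_list_node sentinel.
const ArgNode void_list_sentinel = {"void", nullptr};

struct ArgSummary {
  int count;      // number of real parameter types
  bool variadic;  // true if the list ended without reaching the sentinel
};

// Walk the list the way the second loop in the question does: stop at the
// sentinel, and treat running off the end (nullptr) as "variadic".
ArgSummary summarize_args(const ArgNode* args) {
  ArgSummary s = {0, false};
  const ArgNode* arg = args;
  for (; arg != nullptr && arg != &void_list_sentinel; arg = arg->chain)
    ++s.count;
  s.variadic = (arg == nullptr);
  return s;
}
```

In this model `int f(int, double)` is the chain int, double, sentinel, while a printf-like function is a chain that simply ends after its named parameters.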

> Thanks,
>
> Gary


Re: Is there a way to look for a tree by its UID?

2020-09-04 Thread Erick Ochoa




On 03/09/2020 12:19, Richard Biener wrote:
> On Thu, Sep 3, 2020 at 10:58 AM Jakub Jelinek via Gcc wrote:
>>
>> On Thu, Sep 03, 2020 at 10:22:52AM +0200, Erick Ochoa wrote:
>>> So, I am just wondering is there an interface where I could do something
>>> like:
>>>
>>> ```
>>>  // vars is the field in pt_solution of type bitmap
>>>  EXECUTE_IF_SET_IN_BITMAP (vars, 0, uid, bi)
>>>  {
>>>    // uid is set
>>>    tree pointed_to = get_tree_with_uid(uid);
>>>  }
>>> ```
>>
>> There is not.
>
> And there cannot be since the solution includes UIDs of
> decls that are "fake" and thus never really existed.

Hi Richard and Jakub,

thanks, I was looking for why get_tree_with_uid might be a somewhat bad 
idea.


I am thinking about representing an alias set similarly to the 
pt_solution. Instead of having bits set in position of points-to 
variables UIDs, I was thinking about having bits set in position of 
may-alias variables' UIDs. I think an interface similar to the one I 
described can provide a good mechanism of jumping to different aliases, 
but I do agree that HEAP variables and shadow_variables (and perhaps 
other fake variables) might not be a good idea to keep in the interface 
to avoid jumping to trees which do not represent something in gimple.


Richard, you mentioned in another e-mail that I might want to provide 
the alias-sets from IPA-PTA to another pass in a similar way to 
ipa_escape_pt. I think using a structure similar to pt_solution for 
may-alias solution is a good idea. Again, the bitmap to aliasing 
variables in UIDs. However, I think for this solution to be general I 
need several of these structs not just one. Ideally one per candidate 
alias-set, at most one per variable.




> I think you need to first get a set of candidates you want to
> transform (to limit the work done below), then use the
> internal points-to solutions and compute alias sets for this
> set plus the points-to solution this alias-set aliases.
>
> You can then keep the candidate -> alias-set ID -> points-to
> relation (thus candidates should not be "all variables" for
> efficiency reasons).


I think I can use a relatively simple heuristic: start by looking at 
malloc statements and obtain the alias-sets for variables which hold 
malloc's return values. This should address most efficiency concerns.


So, I'm thinking about the following:

* obtain variables which hold the result of malloc. These are the 
initial candidates.
* for initial candidates compute alias-sets as bitmaps where only "real" 
decl UIDs are set. Compute this just before the end of IPA-PTA.

* for each alias_set:
    for each alias:
      map[alias->decl] = alias_set
* Use this map and the alias-sets bitmaps in pass just after IPA-PTA.
* Potentially use something similar to get_tree_with_uid but that is 
only valid for trees which are keys in the map.
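
The mapping step in the list above could be sketched as follows. This is a standalone illustration, not IPA-PTA code: `Uid`, `AliasSet` and `build_alias_map` are hypothetical names, and `std::set` stands in for GCC's bitmap type.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical types: a UID is an unsigned int, and an alias set is the set
// of "real" decl UIDs it contains.
using Uid = unsigned;
using AliasSet = std::set<Uid>;

// Record, for every member UID, the index of the alias set it belongs to.
// If a UID shows up in two sets, the first one wins and the UID is reported
// in `conflicts` (when that pointer is non-null).
std::map<Uid, std::size_t> build_alias_map(const std::vector<AliasSet>& sets,
                                           std::set<Uid>* conflicts) {
  std::map<Uid, std::size_t> uid_to_set;
  for (std::size_t id = 0; id < sets.size(); ++id) {
    for (Uid uid : sets[id]) {
      bool inserted = uid_to_set.emplace(uid, id).second;
      if (!inserted && conflicts != nullptr) conflicts->insert(uid);
    }
  }
  return uid_to_set;
}
```

Keying the map by UID rather than by tree pointer keeps the sketch free of GCC types; a later pass would only ever look up UIDs that are keys in this map.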


Does this sound reasonable?



> Richard.
>
>>  Jakub



Re: Is there a way to look for a tree by its UID?

2020-09-04 Thread Jakub Jelinek via Gcc
On Fri, Sep 04, 2020 at 10:12:57AM +0200, Erick Ochoa wrote:
> I am thinking about representing an alias set similarly to the pt_solution.
> Instead of having bits set in position of points-to variables UIDs, I was
> thinking about having bits set in position of may-alias variables' UIDs. I
> think an interface similar to the one I described can provide a good
> mechanism of jumping to different aliases, but I do agree that HEAP
> variables and shadow_variables (and perhaps other fake variables) might not
> be a good idea to keep in the interface to avoid jumping to trees which do
> not represent something in gimple.

Not just those: you also shouldn't care about variables with scalar
types; I think your pass can't do anything useful with those either.

Jakub



Re: LTO slows down calculix by more than 10% on aarch64

2020-09-04 Thread Prathamesh Kulkarni via Gcc
On Mon, 31 Aug 2020 at 16:53, Prathamesh Kulkarni wrote:
>
> On Fri, 28 Aug 2020 at 17:33, Alexander Monakov  wrote:
> >
> > On Fri, 28 Aug 2020, Prathamesh Kulkarni via Gcc wrote:
> >
> > > I wonder if that's (one of) the main factor(s) behind slowdown or it's
> > > not too relevant ?
> >
> > Probably not. Some advice to make your search more directed:
> >
> > Pass '-n' to 'perf report'. Relative sample ratios are hard to reason about
> > when they are computed against different bases, it's much easier to see that
> > a loop is slowing down if it went from 4000 to 4500 in absolute sample count
> > as opposed to 90% to 91% in relative sample ratio.
> >
> > Before diving down 'perf report', be sure to fully account for differences
> > in 'perf stat' output. Do the programs execute the same number of 
> > instructions,
> > so the difference only in scheduling? Do the programs suffer from the same
> > amount of branch mispredictions? Please show output of 'perf stat' on the
> > mailing list too, so everyone is on the same page about that.
> >
> > I also suspect that the dramatic slowdown has to do with the extra branch.
> > Your CPU might have some specialized counters for branch prediction, see
> > 'perf list'.
> Hi Alexander,
> Thanks for the suggestions! I am in the process of doing the
> benchmarking experiments,
> and will post the results soon.
Hi,
I obtained perf stat results for following benchmark runs:

-O2:

  7856832.692380  task-clock (msec) #    1.000 CPUs utilized
            3758  context-switches  #    0.000 K/sec
              40  cpu-migrations    #    0.000 K/sec
           40847  page-faults       #    0.005 K/sec
   7856782413676  cycles            #    1.000 GHz
   6034510093417  instructions      #    0.77  insn per cycle
    363937274287  branches          #   46.321 M/sec
     48557110132  branch-misses     #   13.34% of all branches

-O2 with orthonl inlined:

  8319643.114380  task-clock (msec) #    1.000 CPUs utilized
            4285  context-switches  #    0.001 K/sec
              28  cpu-migrations    #    0.000 K/sec
           40843  page-faults       #    0.005 K/sec
   8319591038295  cycles            #    1.000 GHz
   6276338800377  instructions      #    0.75  insn per cycle
    467400726106  branches          #   56.180 M/sec
     45986364011  branch-misses     #    9.84% of all branches

-O2 with orthonl inlined and PRE disabled (this removes the extra branches):

  8207331.088040  task-clock (msec) #    1.000 CPUs utilized
            2266  context-switches  #    0.000 K/sec
              32  cpu-migrations    #    0.000 K/sec
           40846  page-faults       #    0.005 K/sec
   8207292032467  cycles            #    1.000 GHz
   6035724436440  instructions      #    0.74  insn per cycle
    364415440156  branches          #   44.401 M/sec
     53138327276  branch-misses     #   14.58% of all branches

-O2 with orthonl inlined and hoisting disabled:

  7797265.206850  task-clock (msec) #    1.000 CPUs utilized
            3139  context-switches  #    0.000 K/sec
              20  cpu-migrations    #    0.000 K/sec
           40846  page-faults       #    0.005 K/sec
   7797221351467  cycles            #    1.000 GHz
   6187348757324  instructions      #    0.79  insn per cycle
    461840800061  branches          #   59.231 M/sec
     26920311761  branch-misses     #    5.83% of all branches

Perf profiles for
-O2 -fno-code-hoisting and inlined orthonl:
https://people.linaro.org/~prathamesh.kulkarni/perf_O2_inline.data

      3196866 │1f04:   ldur   d1, [x1, #-248]
 216348301800 │        add    w0, w0, #0x1
       985098 │        add    x2, x2, #0x18
 216215999206 │        add    x1, x1, #0x48
 215630376504 │        fmul   d1, d5, d1
 863829148015 │        fmul   d1, d1, d6
 864228353526 │        fmul   d0, d1, d0
 864568163014 │        fmadd  d2, d0, d16, d2
              │        cmp    w0, #0x4
 216125427594 │      ↓ b.eq   1f34
     15010377 │        ldur   d0, [x2, #-8]
 143753737468 │      ↑ b      1f04

-O2 with inlined orthonl:
https://people.linaro.org/~prathamesh.kulkarni/perf_O2_inline.data

 359871503840 │1ef8:   ldur   d15, [x1, #-248]
 144055883055 │        add    w0, w0, #0x1
  72262104254 │        add    x2, x2, #0x18
 143991169721 │        add    x1, x1, #0x48
 288648917780 │        fmul   d15, d17, d1

Re: LTO slows down calculix by more than 10% on aarch64

2020-09-04 Thread Alexander Monakov via Gcc
> I obtained perf stat results for following benchmark runs:
> 
> -O2:
> 
>   7856832.692380  task-clock (msec) #    1.000 CPUs utilized
>             3758  context-switches  #    0.000 K/sec
>               40  cpu-migrations    #    0.000 K/sec
>            40847  page-faults       #    0.005 K/sec
>    7856782413676  cycles            #    1.000 GHz
>    6034510093417  instructions      #    0.77  insn per cycle
>     363937274287  branches          #   46.321 M/sec
>      48557110132  branch-misses     #   13.34% of all branches

(ouch, 2+ hours per run is a lot, collecting a profile over a minute should be
enough for this kind of code)

> -O2 with orthonl inlined:
> 
>   8319643.114380  task-clock (msec) #    1.000 CPUs utilized
>             4285  context-switches  #    0.001 K/sec
>               28  cpu-migrations    #    0.000 K/sec
>            40843  page-faults       #    0.005 K/sec
>    8319591038295  cycles            #    1.000 GHz
>    6276338800377  instructions      #    0.75  insn per cycle
>     467400726106  branches          #   56.180 M/sec
>      45986364011  branch-misses     #    9.84% of all branches

So +100e9 branches, but +240e9 instructions and +460e9 cycles, probably implying
that extra instructions are appearing in this loop nest, but not in the innermost
loop. As a reminder for others, the innermost loop has only 3 iterations.
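
The deltas can be recomputed directly from the two 'perf stat' outputs. A small sketch, using only the counter values quoted in this thread; `PerfCounters`, `delta` and `ipc` are illustrative names, not part of any tool.

```cpp
#include <cassert>
#include <cstdint>

// Counter values copied from the 'perf stat' runs quoted in this thread.
struct PerfCounters {
  std::int64_t cycles;
  std::int64_t instructions;
  std::int64_t branches;
  std::int64_t branch_misses;
};

// Plain -O2 baseline.
const PerfCounters kBaselineO2 = {7856782413676, 6034510093417,
                                  363937274287, 48557110132};
// -O2 with orthonl inlined.
const PerfCounters kInlinedO2 = {8319591038295, 6276338800377,
                                 467400726106, 45986364011};

// Absolute difference "after - before" for every counter.
PerfCounters delta(const PerfCounters& before, const PerfCounters& after) {
  return {after.cycles - before.cycles,
          after.instructions - before.instructions,
          after.branches - before.branches,
          after.branch_misses - before.branch_misses};
}

// Instructions per cycle, the "insn per cycle" column of perf stat.
double ipc(const PerfCounters& c) {
  return static_cast<double>(c.instructions) / static_cast<double>(c.cycles);
}
```

With these inputs, `delta(kBaselineO2, kInlinedO2)` comes out at roughly +103e9 branches, +242e9 instructions and +463e9 cycles, while branch misses drop by about 2.6e9.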

> -O2 with orthonl inlined and PRE disabled (this removes the extra branches):
> 
>   8207331.088040  task-clock (msec) #    1.000 CPUs utilized
>             2266  context-switches  #    0.000 K/sec
>               32  cpu-migrations    #    0.000 K/sec
>            40846  page-faults       #    0.005 K/sec
>    8207292032467  cycles            #    1.000 GHz
>    6035724436440  instructions      #    0.74  insn per cycle
>     364415440156  branches          #   44.401 M/sec
>      53138327276  branch-misses     #   14.58% of all branches

This seems to match baseline in terms of instruction count, but without PRE
the loop nest may be carrying some dependencies over memory. I would simply
check the assembly for the entire 6-level loop nest in question, I hope it's
not very complicated (though Fortran array addressing...).

> -O2 with orthonl inlined and hoisting disabled:
> 
>   7797265.206850  task-clock (msec) #    1.000 CPUs utilized
>             3139  context-switches  #    0.000 K/sec
>               20  cpu-migrations    #    0.000 K/sec
>            40846  page-faults       #    0.005 K/sec
>    7797221351467  cycles            #    1.000 GHz
>    6187348757324  instructions      #    0.79  insn per cycle
>     461840800061  branches          #   59.231 M/sec
>      26920311761  branch-misses     #    5.83% of all branches

There's a 20e9 reduction in branch misses and a 500e9 reduction in cycle count.
I don't think the former fully covers the latter (there's also a 90e9 reduction
in insn count).

Given that the inner loop iterates only 3 times, my main suggestion is to
consider what the profile for the entire loop nest looks like (it's 6 loops deep,
each iterating only 3 times).

> Perf profiles for
> -O2 -fno-code-hoisting and inlined orthonl:
> https://people.linaro.org/~prathamesh.kulkarni/perf_O2_inline.data
> 
>      3196866 │1f04:   ldur   d1, [x1, #-248]
> 216348301800 │        add    w0, w0, #0x1
>       985098 │        add    x2, x2, #0x18
> 216215999206 │        add    x1, x1, #0x48
> 215630376504 │        fmul   d1, d5, d1
> 863829148015 │        fmul   d1, d1, d6
> 864228353526 │        fmul   d0, d1, d0
> 864568163014 │        fmadd  d2, d0, d16, d2
>              │        cmp    w0, #0x4
> 216125427594 │      ↓ b.eq   1f34
>     15010377 │        ldur   d0, [x2, #-8]
> 143753737468 │      ↑ b      1f04
> 
> -O2 with inlined orthonl:
> https://people.linaro.org/~prathamesh.kulkarni/perf_O2_inline.data
> 
> 359871503840 │1ef8:   ldur   d15, [x1, #-248]
> 144055883055 │        add    w0, w0, #0x1
>  72262104254 │        add    x2, x2, #0x18
> 143991169721 │        add    x1, x1, #0x48
> 288648917780 │        fmul   d15, d17, d15
> 864665644756 │        fmul   d15, d15, d18
> 863868426387 │        fmul   d14, d15, d14
> 865228159813 │        fmadd  d16, d14, d31, d16
>       245967 │        cmp    w0, #0x4
> 215396760545 │      ↓

Re: #line directives in generated C files

2020-09-04 Thread Pip Cet via Gcc
On Thu, Sep 3, 2020 at 8:19 PM Hans-Peter Nilsson  wrote:
> On Thu, 27 Aug 2020, Pip Cet via Gcc wrote:
> > I may be missing an obvious workaround, but it seems we currently emit
> > a #line directive when including lines from machine description files
> > in C files, but never emit a second directive when switching back to
> > the generated C file. This makes stepping through the backend in gdb
>
> Thanks for taking on this!

Thanks for the encouragement!

> IMHO stepping into the .md really isn't helpful.  Even a pattern
> name in a comment in the generated code would be better.

I think it is helpful, FWIW, to be able to set a breakpoint on an md
condition or in the preparation code (in conjunction with setting
CXXFLAGS to "-O0 -g3" or similar), but since that's not a "normal"
compilation it'd be acceptable to specify an extra switch for this
feature. That would mean my genline.c program wouldn't have to run
except in those constellations...


Re: about source code location

2020-09-04 Thread 易会战 via Gcc
How do I check the location corresponding to a gimple statement? My
instrumented statements include some memory accesses, and I want to get the
right source line. Inferring it from context may give the wrong line.




Re: Is there a way to look for a tree by its UID?

2020-09-04 Thread Richard Biener via Gcc
On Fri, Sep 4, 2020 at 10:13 AM Erick Ochoa wrote:
>
>
>
> On 03/09/2020 12:19, Richard Biener wrote:
> > On Thu, Sep 3, 2020 at 10:58 AM Jakub Jelinek via Gcc  
> > wrote:
> >>
> >> On Thu, Sep 03, 2020 at 10:22:52AM +0200, Erick Ochoa wrote:
> >>> So, I am just wondering is there an interface where I could do something
> >>> like:
> >>>
> >>> ```
> >>>  // vars is the field in pt_solution of type bitmap
> >>>  EXECUTE_IF_SET_IN_BITMAP (vars, 0, uid, bi)
> >>>  {
> >>> // uid is set
> >>> tree pointed_to = get_tree_with_uid(uid);
> >>>  }
> >>> ```
> >>
> >> There is not.
> >
> > And there cannot be since the solution includes UIDs of
> > decls that are "fake" and thus never really existed.
>
> Hi Richard and Jakub,
>
> thanks, I was looking for why get_tree_with_uid might be a somewhat bad
> idea.
>
> I am thinking about representing an alias set similarly to the
> pt_solution. Instead of having bits set in position of points-to
> variables UIDs, I was thinking about having bits set in position of
> may-alias variables' UIDs. I think an interface similar to the one I
> described can provide a good mechanism of jumping to different aliases,
> but I do agree that HEAP variables and shadow_variables (and perhaps
> other fake variables) might not be a good idea to keep in the interface
> to avoid jumping to trees which do not represent something in gimple.
>
> Richard, you mentioned in another e-mail that I might want to provide
> the alias-sets from IPA-PTA to another pass in a similar way to
> ipa_escape_pt. I think using a structure similar to pt_solution for
> may-alias solution is a good idea. Again, the bitmap to aliasing
> variables in UIDs. However, I think for this solution to be general I
> need several of these structs not just one. Ideally one per candidate
> alias-set, at most one per variable.

Sure, you need one per alias-set.  Indeed you might want to work
with bitmaps of varinfo IDs first when computing alias-sets
since ...
> >
> > I think you need to first get a set of candidates you want to
> > transform (to limit the work done below), then use the
> > internal points-to solutions and compute alias sets for this
> > set plus the points-to solution this alias-set aliases.
> >
> > You can then keep the candidate -> alias-set ID -> points-to
> > relation (thus candidates should not be "all variables" for
> > efficiency reasons).
>
> I think I can use a relatively simple heuristic: start by looking at
> malloc statements and obtain the alias-sets for variables which hold
> malloc's return values. This should address most efficiency concerns.
>
> So, I'm thinking about the following:
>
> * obtain variables which hold the result of malloc. These are the
> initial candidates.

... those would be the is_heapvar ones.  Since you can probably
only handle the case where all pointers either only point to a
single allocation sites result and nothing else or not to it that case
looks special and thus easy anyway.

> * for initial candidates compute alias-sets as bitmaps where only "real"
> decl UIDs are set. Compute this just before the end of IPA-PTA.
> * for each alias_set:
>  for each alias:
>map[alias->decl] = alias_set
> * Use this map and the alias-sets bitmaps in pass just after IPA-PTA.
> * Potentially use something similar to get_tree_with_uid but that is
> only valid for trees which are keys in the map.

Hmm, isn't this more than you need?  Given a set of candidates C
you try to form alias-sets so that all members of an alias set A
are members of C, they are layout-compatible and not member
of another alias-set.  Plus no member escapes and all pointers
you can track may only point to a subset of a single alias-set.

From the above, C is what constrains the size of your sets and
the mapping.
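
Two of the conditions above (every alias-set member is in C, and no member belongs to a second alias set) are easy to check mechanically. A minimal sketch under the assumption that decls are identified by UID; `Uid` and `alias_sets_well_formed` are hypothetical names, and layout compatibility and escape analysis are deliberately left out.

```cpp
#include <cassert>
#include <set>
#include <vector>

using Uid = unsigned;  // hypothetical: decls are identified by their UID

// True if every alias set is a subset of `candidates` and no UID belongs to
// more than one alias set. Layout compatibility and escaping are not
// modelled here.
bool alias_sets_well_formed(const std::set<Uid>& candidates,
                            const std::vector<std::set<Uid>>& alias_sets) {
  std::set<Uid> seen;
  for (const std::set<Uid>& a : alias_sets) {
    for (Uid uid : a) {
      if (candidates.count(uid) == 0) return false;  // member outside C
      if (!seen.insert(uid).second) return false;    // member of two sets
    }
  }
  return true;
}
```

Since every accepted member must come from C, the total size of all alias sets is bounded by the size of C, which is the point made about C constraining the sets and the mapping.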

> Does this sound reasonable?
>
> >
> > Richard.
> >
> >>  Jakub
> >>


Re: Is there a way to look for a tree by its UID?

2020-09-04 Thread Erick Ochoa




On 04/09/2020 15:19, Richard Biener wrote:
> On Fri, Sep 4, 2020 at 10:13 AM Erick Ochoa wrote:
>>
>> On 03/09/2020 12:19, Richard Biener wrote:
>>> On Thu, Sep 3, 2020 at 10:58 AM Jakub Jelinek via Gcc wrote:
>>>>
>>>> On Thu, Sep 03, 2020 at 10:22:52AM +0200, Erick Ochoa wrote:
>>>>> So, I am just wondering is there an interface where I could do something
>>>>> like:
>>>>>
>>>>> ```
>>>>>   // vars is the field in pt_solution of type bitmap
>>>>>   EXECUTE_IF_SET_IN_BITMAP (vars, 0, uid, bi)
>>>>>   {
>>>>>     // uid is set
>>>>>     tree pointed_to = get_tree_with_uid(uid);
>>>>>   }
>>>>> ```
>>>>
>>>> There is not.
>>>
>>> And there cannot be since the solution includes UIDs of
>>> decls that are "fake" and thus never really existed.


>> Hi Richard and Jakub,
>>
>> thanks, I was looking for why get_tree_with_uid might be a somewhat bad
>> idea.
>>
>> I am thinking about representing an alias set similarly to the
>> pt_solution. Instead of having bits set in position of points-to
>> variables UIDs, I was thinking about having bits set in position of
>> may-alias variables' UIDs. I think an interface similar to the one I
>> described can provide a good mechanism of jumping to different aliases,
>> but I do agree that HEAP variables and shadow_variables (and perhaps
>> other fake variables) might not be a good idea to keep in the interface
>> to avoid jumping to trees which do not represent something in gimple.
>>
>> Richard, you mentioned in another e-mail that I might want to provide
>> the alias-sets from IPA-PTA to another pass in a similar way to
>> ipa_escape_pt. I think using a structure similar to pt_solution for
>> may-alias solution is a good idea. Again, the bitmap to aliasing
>> variables in UIDs. However, I think for this solution to be general I
>> need several of these structs not just one. Ideally one per candidate
>> alias-set, at most one per variable.
>
> Sure, you need one per alias-set.  Indeed you might want to work
> with bitmaps of varinfo IDs first when computing alias-sets
> since ...


Yes, that's what I've been doing :)



>>> I think you need to first get a set of candidates you want to
>>> transform (to limit the work done below), then use the
>>> internal points-to solutions and compute alias sets for this
>>> set plus the points-to solution this alias-set aliases.
>>>
>>> You can then keep the candidate -> alias-set ID -> points-to
>>> relation (thus candidates should not be "all variables" for
>>> efficiency reasons).
>>
>> I think I can use a relatively simple heuristic: start by looking at
>> malloc statements and obtain the alias-sets for variables which hold
>> malloc's return values. This should address most efficiency concerns.
>>
>> So, I'm thinking about the following:
>>
>> * obtain variables which hold the result of malloc. These are the
>> initial candidates.
>
> ... those would be the is_heapvar ones.  Since you can probably
> only handle the case where all pointers either only point to a
> single allocation sites result and nothing else or not to it that case
> looks special and thus easy anyway.


I did a git grep and is_heapvar is gone. But, I believe that I still 
collect these variables as quickly as possible. I iterate over the call 
graph and if I find malloc, then I just look at the callers and collect 
the lhs. This lhs corresponds to the "decl" in the varinfo_t struct.


I then just iterate over the variables in varmap to find matching lhs 
with the decl and computing alias sets by looking at the intersection of 
pt_solution. This seems to work well. I still need to find out whether 
they escape, but it should be simple to do so from here.





>> * for initial candidates compute alias-sets as bitmaps where only "real"
>> decl UIDs are set. Compute this just before the end of IPA-PTA.
>> * for each alias_set:
>>     for each alias:
>>       map[alias->decl] = alias_set
>> * Use this map and the alias-sets bitmaps in pass just after IPA-PTA.
>> * Potentially use something similar to get_tree_with_uid but that is
>> only valid for trees which are keys in the map.
>
> Hmm, isn't this more than you need?  Given a set of candidates C
> you try to form alias-sets so that all members of an alias set A
> are members of C, they are layout-compatible and not member
> of another alias-set.  Plus no member escapes and all pointers
> you can track may only point to a subset of a single alias-set.


Yes? I'm not really sure what you are trying to say here. Can you please 
elaborate more on "this" in the sentence "isn't this more than you need?".


I think I need a way that given some tree (which I know is in 
candidates) I can refer to its alias set. Since the given set of 
candidates might have more than 1 opportunity for transformation (i.e. 
computing the alias-sets for all c in C yields two distinct alias sets 
A_0 = {c_0, c_1, ... c_x}, A_1 = {c_x+1, ... c_y} ) I need a way of 
referring to these distinct alias sets. Here, I was hoping to refer to 
alias sets by the members of the sets themselves.


c_0 -> A_0
c_1 -> A_0
...
c_x+1 -> A_1
c_x+2 -> A_1

The get_tree_with_uid interface is more of a quick way to find out if 
they are layout compatible. Because if we have a bitm

gcc-9-20200904 is now available

2020-09-04 Thread GCC Administrator via Gcc
Snapshot gcc-9-20200904 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/9-20200904/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 9 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-9 
revision 5371ab207594ae2ef4c5223c2adae88b7a27b76b

You'll find:

 gcc-9-20200904.tar.xz                Complete GCC

  SHA256=27e7479857a3ed45f8db6e470f1a6398db3a7a0bb058d2b7f28797e276126d08
  SHA1=dac2912c3130cb98b81d06a88f55e9b3bb745c41

Diffs from 9-20200828 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.