Re: replacing the backwards threader and more

2021-06-25 Thread Richard Biener via Gcc
On Thu, Jun 24, 2021 at 6:14 PM Jeff Law  wrote:
>
>
>
> On 6/21/2021 8:40 AM, Aldy Hernandez wrote:
> >
> >
> > On 6/9/21 2:09 PM, Richard Biener wrote:
> >> On Wed, Jun 9, 2021 at 1:50 PM Aldy Hernandez via Gcc
> >>  wrote:
> >>>
> >>> Hi Jeff.  Hi folks.
> >>>
> >>> What started as a foray into severing the old (forward) threader's
> >>> dependency on evrp, turned into a rewrite of the backwards threader
> >>> code.  I'd like to discuss the possibility of replacing the current
> >>> backwards threader with a new one that gets far more threads and can
> >>> potentially subsume all threaders in the future.
> >>>
> >>> I won't include code here, as it will just detract from the high level
> >>> discussion.  But if it helps, I could post what I have, which just
> >>> needs
> >>> some cleanups and porting to the latest trunk changes Andrew has made.
> >>>
> >>> Currently the backwards threader works by traversing DEF chains through
> >>> PHIs leading to possible paths that start in a constant.  When such a
> >>> path is found, it is checked to see if it is profitable, and if so, the
> >>> constant path is threaded.  The current implementation is rather
> >>> limited
> >>> since backwards paths must end in a constant.  For example, the
> >>> backwards threader can't get any of the tests in
> >>> gcc.dg/tree-ssa/ssa-thread-14.c:
> >>>
> >>> if (a && b)
> >>>   foo ();
> >>> if (!b && c)
> >>>   bar ();
> >>>
> >>> etc.
> >>>
> >>> After my refactoring patches to the threading code, it is now possible
> >>> to drop in an alternate implementation that shares the profitability
> >>> code (is this path profitable?), the jump registry, and the actual jump
> >>> threading code.  I have leveraged this to write a ranger-based threader
> >>> that gets every single thread the current code gets, plus 90-130% more.
> >>>
> >>> Here are the details from the branch, which should be very similar to
> >>> trunk.  I'm presenting the branch numbers because they contain Andrew's
> >>> upcoming relational query which significantly juices up the results.
> >>>
> >>> New threader:
> >>>ethread:65043(+3.06%)
> >>>dom:32450  (-13.3%)
> >>>backwards threader:72482   (+89.6%)
> >>>vrp:40532  (-30.7%)
> >>> Total threaded:  210507 (+6.70%)
> >>>
> >>> This means that the new code gets 89.6% more jump threading
> >>> opportunities than the code I want to replace.  In doing so, it reduces
> >>> DOM threading opportunities by 13.3% and VRP jump threading
> >>> opportunities by 30.7%.  The total improvement across the jump
> >>> threading opportunities in the compiler is 6.70%.
> >>>
> >>> However, these are pessimistic numbers...
> >>>
> >>> I have noticed that some of the threading opportunities that DOM and
> >>> VRP
> >>> now get are not because they're smarter, but because they're picking up
> >>> opportunities that the new code exposes.  I experimented with
> >>> running an
> >>> iterative threader, and then seeing what VRP and DOM could actually
> >>> get.
> >>>This is too expensive to do in real life, but it at least shows what
> >>> the effect of the new code is on DOM/VRP's abilities:
> >>>
> >>> Iterative threader:
> >>>   ethread:65043(+3.06%)
> >>>   dom:31170(-16.7%)
> >>>   thread:86717(+127%)
> >>>   vrp:33851(-42.2%)
> >>> Total threaded:  216781 (+9.90%)
> >>>
> >>> This means that the new code not only gets 127% more cases, but it
> >>> reduces the DOM and VRP opportunities considerably (16.7% and 42.2%
> >>> respectively).   The end result is that we have the possibility of
> >>> getting almost 10% more jump threading opportunities in the entire
> >>> compilation run.
> >>
> >> Yeah, DOM once was iterating ...
> >>
> >> You probably have noticed that we have very many (way too many)
> >> 'thread' passes, often in close succession with each other or
> >> DOM or VRP.  So in the above numbers I wonder if you can break
> >> down the numbers individually for the actual passes (in their order)?
> >
> > As promised.
> >
> > *** LEGACY:
> > ethread42:61152 30.1369% (61152 threads for 30.1% of total)
> > thread117:29646 14.6101%
> > vrp118:62088 30.5982%
> > thread132:2232 1.09997%
> > dom133:31116 15.3346%
> > thread197:1950 0.960998%
> > dom198:10661 5.25395%
> > thread200:587 0.289285%
> > vrp201:3482 1.716%
> > Total:  202914
>
> > The above is from current trunk with my patches applied, defaulting to
> > legacy mode.  It follows the pass number nomenclature in the
> > *.statistics files.
> >
> > New threader code (This is what I envision current trunk to look like with
> > my patchset):
> >
> > *** RANGER:
> > ethread42:64389 30.2242%
> > thread117:49449 23.2114%
> > vrp118:46118 21.6478%
> > thread132:8153 3.82702%
> > dom133:27168 12.7527%
> > thread197:5542 2.60141%
> > dom198:8191 3.84485%
> > thread200:1038 0.487237%
> > vrp201:2990 1.40351%
> > Total:  213038
> So this makes me th
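As a concrete illustration of the gcc.dg/tree-ssa/ssa-thread-14.c pattern quoted above, here is a sketch (my own, plain C++, not GCC code) of what the missed thread amounts to: on the path where "a && b" held, b is known true, so "!b && c" is statically false and the second test can be bypassed.

```cpp
#include <cassert>

// Counters stand in for the side effects of foo()/bar().
static int foo_calls, bar_calls;
static void foo () { ++foo_calls; }
static void bar () { ++bar_calls; }

// Original control flow: two seemingly independent tests.
static void original (bool a, bool b, bool c)
{
  if (a && b)
    foo ();
  if (!b && c)
    bar ();
}

// What a threaded version amounts to: on the path where "a && b" was
// true, b is known true, so "!b && c" must be false and the second
// test is dead.
static void threaded (bool a, bool b, bool c)
{
  if (a && b)
    {
      foo ();   // second condition is dead on this path
      return;
    }
  if (!b && c)
    bar ();
}
```

The two functions are behaviorally identical; the threaded form just duplicates enough of the path that the second conditional never has to be evaluated when b is already known.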

Re: GCC documentation: porting to Sphinx

2021-06-25 Thread Martin Liška

Hello.

I've got something that is very close to be a patch candidate that can be
eventually merged. Right now, the patches are available here:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=log;h=refs/users/marxin/heads/sphinx-v3

Changes since last version:

- gdc manual was ported
- 'make doc' works fine both with and w/o installed sphinx-build
- 'make pdf' and 'make html' work fine
- libgccjit was ported to the shared Makefile and configuration
- likewise for 3 existing Ada manuals
- .texi files are removed

List of known issues (planned to be fixed after merging):
- cross manual references are not working
- update_web_docs_git needs to be changed - will simplify rapidly
- Sphinx warnings should be addressed
- remove texinfo references in Manuals
- list package requirements for Sphinx manual generation

I'm looking forward to your feedback.
Thanks,
Martin


Re: GCC documentation: porting to Sphinx

2021-06-25 Thread Martin Liška

On 6/25/21 3:11 PM, Martin Liška wrote:

List of known issues (planned to be fixed after merging):


I forgot about:

- diagnostics URL (for e.g. warnings) needs to be adjusted

Martin


Does GCC have __gnuc_literal_encoding__ macro defined?

2021-06-25 Thread sotrdg sotrdg via Gcc
I just realized clang has this but clang does not support -fexec-charset.

Clang Language Extensions — Clang 13 documentation (llvm.org)


__clang_literal_encoding__
Defined to a narrow string literal that represents the current encoding of 
narrow string literals, e.g., "hello". This macro typically expands to “UTF-8” 
(but may change in the future if the -fexec-charset="Encoding-Name" option is 
implemented.)
__clang_wide_literal_encoding__
Defined to a narrow string literal that represents the current encoding of wide 
string literals, e.g., L"hello". This macro typically expands to “UTF-16” or 
“UTF-32” (but may change in the future if the 
-fwide-exec-charset="Encoding-Name" option is implemented.)


I think we probably need macros like
__gnuc_source_encoding__
__gnuc_literal_encoding__
__gnuc_wide_literal_encoding__
__gnuc_literal_encoding_is_ascii_based__
__gnuc_wide_literal_encoding_is_ascii_based__
in GCC.

Sent from Mail for Windows 10



Re: daily report on extending static analyzer project [GSoC]

2021-06-25 Thread Ankur Saini via Gcc
AIM for today : 

- try to create an intra-procedural link between the calling and
returning snodes of such calls
- figure out the program point where exploded graph would know about the 
function calls
- figure out how the exploded node will know which function to call
- create enodes and eedges for the calls

—

PROGRESS :

- I created an intraprocedural link where the splitting is happening, to
connect the calling and returning snodes, like this:

(in supergraph.cc at "supergraph::supergraph (logger *logger)" )
```
185 if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
186 {
187   m_cgraph_edge_to_caller_prev_node.put (edge, node_for_stmts);
188   node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
189   m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
190 }
191 else
192 {
193   gcall *call = dyn_cast <gcall *> (stmt);
194   if (call)
195   {
196 supernode *old_node_for_stmts = node_for_stmts;
197 node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
198
199 superedge *sedge = new callgraph_superedge (old_node_for_stmts,
200 node_for_stmts,
201 SUPEREDGE_INTRAPROCEDURAL_CALL,
202 NULL);
203 add_edge (sedge);
204   }
205 }
```

- now that we have an intraprocedural link between such calls, the analyzer 
will consider them an "impossible edge" ( whenever a "node->on_edge()" returns 
false ) while processing the worklist, and I think this should be the correct 
place to speculate about the function call by creating exploded nodes and edges 
representing calls ( maybe by adding a custom edge info ).

- after several failed attempts to do as mentioned above, it looks like I was 
looking the wrong way all along. I think I just found out what my mentor meant 
when telling me to look into "calls node->on_edge". During the edge inspection 
( in program_point::on_edge() ), if it's an intraprocedural sedge, maybe I 
can add an extra intraprocedural sedge to the correct edge right here with the 
info/state of that program point.

Q. But even if we find out which function to call, how will the analyzer know 
which snode that function belongs to ?

Q. on line 461 of program-point.cc 

```
457 else
458   {
459 /* Otherwise, we ignore these edges  */
460 if (logger)
461   logger->log ("rejecting interprocedural edge");
462 return false;
463   }
```
why are we rejecting an "interprocedural" edge when we are examining an 
"intraprocedural" edge ? or is it for the "cg_sedge->m_cedge" edge, which is an 
interprocedural edge ?

STATUS AT THE END OF THE DAY :- 

- try to create an intra-procedural link between the calling and
returning snodes of such calls ( Done )
- figure out the program point where exploded graph would know about the 
function calls ( Done )
- figure out how the exploded node will know which function to call ( Pending )
- create enodes and eedges for the calls ( Pending )


Thank you
- Ankur

> On 25-Jun-2021, at 2:23 AM, David Malcolm  wrote:
> 
> On Thu, 2021-06-24 at 19:59 +0530, Ankur Saini wrote:
>> CURRENT STATUS :
>> 
>> analyzer is now splitting nodes even at call sites which doesn’t have
>> a cgraph_edge. But as now the call and return nodes are not
>> connected, the part of the function after such calls becomes
>> unreachable making them impossible to properly analyse.
>> 
>> AIM for today : 
>> 
>> - try to create an intra-procedural link between the calling and
>> returning snodes of such calls
>> - find the place where the exploded nodes and edges are being formed 
>> - figure out the program point where exploded graph would know about
>> the function calls
>> 
>> —
>> 
>> PROGRESS :
>> 
>> - I initially tried to connect the calling and returning snodes with
>> an intraprocedural sedge but looks like for that only nodes which
>> have a cgraph_edge or a CFG edge are connected in the supergraph. I
>> tried a few ways to connect them but at the end thought I would be
>> better off leaving them like this and connecting them during the
>> creation of exploded graph itself.
>> 
>> - As the exploded graph is created during building and processing of
>> the worklist, "build_initial_worklist ()” and “process_worklist()”
>> should be the interesting areas to analyse, especially the processing
>> part.
>> 
>> - “build_initial_worklist()” is just creating enodes for functions
>> that can be called explicitly ( possible entry points ) so I guess
>> the better place to investigate is “process_worklist ()” function.
> 
> Yes.
> 
> Have a look at exploded_graph::process_node (which is called by
> process_worklist).
> The eedges for calls with supergraph edges happens there in
> the "case PK_AFT

Re: daily report on extending static analyzer project [GSoC]

2021-06-25 Thread David Malcolm via Gcc
On Fri, 2021-06-25 at 20:33 +0530, Ankur Saini wrote:
> AIM for today : 
> 
> - try to create an intra-procedural link between the calling and
> returning snodes of such calls
> - figure out the program point where exploded graph would know about
> the function calls
> - figure out how the exploded node will know which function to call
> - create enodes and eedges for the calls
> 
> —
> 
> PROGRESS :
> 
> - I created an intraprocedural link between where the the splitting is 
> happening to connect the call and returning snodes. like this :-
> 
> (in supergraph.cc at "supergraph::supergraph (logger *logger)" )
> ```
> 185 if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
> 186 {
> 187    m_cgraph_edge_to_caller_prev_node.put(edge, 
> node_for_stmts);
> 188    node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), 
> NULL);
> 189    m_cgraph_edge_to_caller_next_node.put (edge, 
> node_for_stmts);
> 190 }
> 191 else
> 192 {
> 193   gcall *call = dyn_cast <gcall *> (stmt);
> 194   if (call)
> 195   {
> 196 supernode *old_node_for_stmts = node_for_stmts;
> 197 node_for_stmts = add_node (fun, bb, as_a <gcall *> 
> (stmt), NULL);
  ^
Given the dyn_cast of stmt to gcall * at line 193 you can use "call"
here, without the as_a cast, as you've already got "stmt" as a gcall *
at line 193.

You might need to add a hash_map recording the mapping from such stmts
to the edges, like line 189 does.  I'm not sure, but you may need it
later.


> 198
> 199 superedge *sedge = new callgraph_superedge 
> (old_node_for_stmts,
> 200 node_for_stmts,
> 201 SUPEREDGE_INTRAPROCEDURAL_CALL,
> 202 NULL);
> 203 add_edge (sedge);
> 204   }    
> 205 }
> ```
> 
> - now that we have a intraprocedural link between such calls, and the
> analyzer will consider them as “impossible edge” ( whenever a "node-
> >on_edge()” returns false ) while processing worklist, and I think this
> should be the correct place to speculate about the function call by
> creating exploded nodes and edges representing calls ( maybe by adding
> a custom edge info ).
> 
> - after several of failed attempts to do as mentioned above, looks like
> I was looking the wrong way all along. I think I just found out what my
> mentor meant when telling me to look into "calls node->on_edge”. During
> the edge inspection ( in program_point::on_edge() ) , if it’s an
> Intraprocedural s sedge, maybe I can add an extra intraprocedural sedge
> to the correct edge right here with the info state of that program
> point. 

I don't think we need a superedge for such a call, just an
exploded_edge.  (Though perhaps adding a superedge might make things
easier?  I'm not sure, but I'd first try not bothering to add one)

> 
> Q. But even if we find out which function to call, how will the
> analyzer know which snode that function belongs to ?

Use this method of supergraph:
  supernode *get_node_for_function_entry (function *fun) const;
to get the supernode for the entrypoint of a given function.

You can get the function * from a fndecl via DECL_STRUCT_FUNCTION.

> Q. on line 461 of program-point.cc 
> 
> ```
> 457 else
> 458   {
> 459 /* Otherwise, we ignore these edges  */
> 460 if (logger)
> 461   logger->log ("rejecting interprocedural edge");
> 462 return false;
> 463   }
> ```
> why are we rejecting “interprocedural" edge when we are examining an
> “intraprocedural” edge ? or is it for the "cg_sedge->m_cedge” edge,
> which is an interprocedural edge ?

Currently, those interprocedural edges don't do much.  Above the "else"
clause of the lines above the ones you quote is some support for call
summaries.

The idea is that we ought to be able to compute summaries of what a
function call does, and avoid exponential explosions during the
analysis by reusing summaries at a callsite.  But that code doesn't
work well at the moment; see:
  https://gcc.gnu.org/bugzilla/showdependencytree.cgi?id=99390

If you ignore call summaries for now, I think you need to change this
logic so it detects if we have a function pointer that we "know" the
value of from the region_model, and have it generate an exploded_node
and exploded_edge for the call.  Have a look at how SUPEREDGE_CALL is
handled by program_state and program_point; you should implement
something similar, I think.  Given that you need both the super_edge,
point *and* state all together to detect this case, I think the logic
you need to add probably needs to be in exploded_node::on_edge as a
special case before the call there to next_point->on_edge.

Hope this is helpful
Dave


> 
> STATUS AT THE END OF T

Re: replacing the backwards threader and more

2021-06-25 Thread Aldy Hernandez via Gcc

Hi folks.

I'm done with benchmarking, testing and cleanups, so I'd like to post my 
patchset for review.  However, before doing so, I'd like to address a 
handful of meta-issues that may affect how I post these patches.


Trapping on differences
===

Originally I wanted to contribute verification code that would trap if 
the legacy code threaded any edges the new code couldn't (to be removed 
after a week).  However, after having tested on various architectures 
and only running once into a missing thread, I'm leaning towards 
omitting the verification code, since it's fragile, time consuming, and 
quite hacky.


For the record, I have tested on x86-64, aarch64, ppc64 and ppc64le. 
There is only one case, across bootstrap and regression tests where the 
verification code is ever tripped (discussed below).


Performance
===

I re-ran benchmarks as per our callgrind suite, and the penalty with the 
current pipeline is 1.55% of overall compilation time.  As is being 
discussed, we should be able to mitigate this significantly by removing 
other threading passes.


Failing testcases
=

I have yet to run into incorrect code being generated, but I have had to 
tweak a considerable number of tests.  I have verified every single 
discrepancy and documented my changes in the testsuite when it merited 
doing so.  However, there are a couple tests that trigger regressions 
and I'd like to ask for guidance on how to address them.


1. gcc.c-torture/compile/pr83510.c

I would like to XFAIL this.

What happens here is that thread1 threads a switch statement such that 
the various cases have been split into different independent blocks. 
One of these blocks exposes an arr[i_27] access which is later 
propagated by VRP to be arr[10].  This is an invalid access, but the 
array bounds code doesn't know it is an unreachable path.


However, it is not until dom2 that we "know" that the value of the 
switch index is such that the path to arr[10] is unreachable.  For that 
matter, it is not until dom3 that we remove the unreachable path.
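For reference, a minimal sketch of the shape being described (my own reduction, not the actual gcc.c-torture/compile/pr83510.c):

```cpp
#include <cassert>

static int arr[10];

// The in-bounds guard makes an arr[10] read impossible at runtime, but
// once a pass duplicates blocks and propagates i == 10 into a copy of
// the access, the array-bounds checker sees a literal arr[10] on a path
// it cannot (yet) prove dead -- hence the false positive described above.
int sum_first (int n)
{
  int s = 0;
  for (int i = 0; i < n && i < 10; ++i)
    s += arr[i];
  return s;
}
```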


2. -Wfree-nonheap-object

This warning is triggered while cleaning up an auto_vec.  I see that the 
va_heap::release() inline is wrapped with a pragma ignore 
"-Wfree-nonheap-object", but this is not sufficient because jump 
threading may alter uses in such a way that maybe_emit_free_warning() will 
warn on the *inlined* location, thus bypassing the pragma.


I worked around this with a mere:

> @@ -13839,6 +13839,7 @@ maybe_emit_free_warning (tree exp)

   location_t loc = tree_inlined_location (exp);
+  loc = EXPR_LOCATION (exp);


but this causes a ton of Wfree-nonheap* tests to fail.  I think someone 
more knowledgeable should address this (msebor??).


3. uninit-pred-9_b.c

The uninit code is getting confused with the threading and the bogus 
warning in line 24 is back.  I looked at the thread, and it is correct.


I'm afraid all these warnings are quite fragile in the presence of more 
aggressive optimizations, and I suspect it will only get worse.


4. libphobos/src/std/net/isemail.d

This is a D test where we don't actually fail, but we trigger the 
verification code.  It is the only jump threading edge that the new code 
fails to get over the old code, and it only happens on ppc64.


It triggers because a BB4 -> BB5 thread is too expensive, but a BBn 
-> BB3 -> BB4 -> BB5 path is considered safe to thread because BB3 is a latch 
and it alters the profitability equation.  The reason we don't get it 
is that we assume that if an X->Y path is unprofitable, it is not worth 
looking at W->X->Y and so forth.


Jeff had some fancy ideas on how to attack this.  One such idea was to 
stop looking back, but only for things we were absolutely sure would 
never yield a profitable path.  I tried a subset of this, by allowing 
further looks on this latch test, but my 1.55% overall performance 
penalty turned into an 8.33% penalty.  Personally it looks way too 
expensive for this one isolated case.  Besides, the test where this 
clamping code originally came from still succeeds (commit 
eab2541b860c48203115ac6dca3284e982015d2c).
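The pruning assumption can be illustrated with a toy cost model (entirely my own invention, far simpler than the threader's real profitability code): a longer path can be profitable even when its suffix alone is not, because the latch changes the limits.

```cpp
#include <vector>
#include <cassert>

// Toy model: each block in a candidate path has a copy cost, and a
// block may be a loop latch.
struct block { int size; bool latch; };

// A path is "profitable" if its total copied size fits the budget;
// paths that start at a latch get a relaxed budget, mimicking how the
// latch alters the profitability equation in the case above.
static bool profitable (const std::vector<block> &path)
{
  int total = 0;
  for (const block &b : path)
    total += b.size;
  int budget = path.front ().latch ? 12 : 8;
  return total <= budget;
}
```

A search that gives up on W->X->Y whenever X->Y alone is unprofitable will reject the suffix and never even evaluate the longer latch-led path, which this model accepts.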


CONCLUSION
==

That's basically it.

If we agree the above things are not big issues, or can be addressed as 
follow-ups, I'd like to start the ball rolling on the new threader. 
This would allow more extensive testing of the code, and separate it a 
bit from the other big changes coming up :).


Aldy



Re: replacing the backwards threader and more

2021-06-25 Thread Martin Sebor via Gcc

On 6/25/21 10:20 AM, Aldy Hernandez via Gcc wrote:

Hi folks.

I'm done with benchmarking, testing and cleanups, so I'd like to post my 
patchset for review.  However, before doing so, I'd like to address a 
handful of meta-issues that may affect how I post these patches.


Trapping on differences
===

Originally I wanted to contribute verification code that would trap if 
the legacy code threaded any edges the new code couldn't (to be removed 
after a week).  However, after having tested on various architectures 
and only running once into a missing thread, I'm leaning towards 
omitting the verification code, since it's fragile, time consuming, and 
quite hacky.


For the record, I have tested on x86-64, aarch64, ppc64 and ppc64le. 
There is only one case, across bootstrap and regression tests where the 
verification code is ever tripped (discussed below).


Performance
===

I re-ran benchmarks as per our callgrind suite, and the penalty with the 
current pipeline is 1.55% of overall compilation time.  As is being 
discussed, we should be able to mitigate this significantly by removing 
other threading passes.


Failing testcases
=

I have yet to run into incorrect code being generated, but I have had to 
tweak a considerable number of tests.  I have verified every single 
discrepancy and documented my changes in the testsuite when it merited 
doing so.  However, there are a couple tests that trigger regressions 
and I'd like to ask for guidance on how to address them.


1. gcc.c-torture/compile/pr83510.c

I would like to XFAIL this.

What happens here is that thread1 threads a switch statement such that 
the various cases have been split into different independent blocks. One 
of these blocks exposes an arr[i_27] access which is later propagated by 
VRP to be arr[10].  This is an invalid access, but the array bounds code 
doesn't know it is an unreachable path.


The test has a bunch of loops that iterate over the 10 array elements.
There have been bug reports about loop unrolling causing false positives
-Warray-bounds (e.g., PR 92539, 92110, or 86341) so this could be
the same issue.



However, it is not until dom2 that we "know" that the value of the 
switch index is such that the path to arr[10] is unreachable.  For that 
matter, it is not until dom3 that we remove the unreachable path.


If you do XFAIL it can you please isolate a small test case and open
a bug and make it a -Warray-bounds blocker?



2. -Wfree-nonheap-object

This warning is triggered while cleaning up an auto_vec.  I see that the 
va_heap::release() inline is wrapped with a pragma ignore 
"-Wfree-nonheap-object", but this is not sufficient because jump 
threading may alter uses in such a way that maybe_emit_free_warning() will 
warn on the *inlined* location, thus bypassing the pragma.


I worked around this with a mere:

 > @@ -13839,6 +13839,7 @@ maybe_emit_free_warning (tree exp)

   location_t loc = tree_inlined_location (exp);
+  loc = EXPR_LOCATION (exp);


but this causes a ton of Wfree-nonheap* tests to fail.  I think someone 
more knowledgeable should address this (msebor??).


This sounds like the same problem as PR 98871.  Does the patch below
fix it?
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572515.html
If so, I suggest getting that patch in first to avoid testsuite
failures.  If it doesn't fix it I'll look into it before you commit
your changes.



3. uninit-pred-9_b.c

The uninit code is getting confused with the threading and the bogus 
warning in line 24 is back.  I looked at the thread, and it is correct.


I'm afraid all these warnings are quite fragile in the presence of more 
aggressive optimizations, and I suspect it will only get worse.


From my recent review of open -Wmaybe-uninitialized bugs (and
the code) it does seem to be both fragile and getting worse.  I've
only found a few simple problems so far in the code but nothing that
would make a dramatic difference so I can't say if it's possible to
do much better, but I'm not done or ready to give up.  If you XFAIL
this too please open a bug for it and make it a blocker for
-Wuninitialized?

Martin



4. libphobos/src/std/net/isemail.d

This is a D test where we don't actually fail, but we trigger the 
verification code.  It is the only jump threading edge that the new code 
fails to get over the old code, and it only happens on ppc64.


It triggers because a BB4 -> BB5 is too expensive to thread, but a BBn 
-> BB3 -> BB4 -> BB5 is considered safe to thread because BB3 is a latch 
and it alters the profitability equation.  The reason we don't get it, 
is that we assume that if a X->Y is unprofitable, it is not worth 
looking at W->X->Y and so forth.


Jeff had some fancy ideas on how to attack this.  One such idea was to 
stop looking back, but only for things we were absolutely sure would 
never yield a profitable path.  I tried a subset of this, by allowing 
further looks on this latch test, but my 1.55

Re: __fp16 is ambiguous error in C++

2021-06-25 Thread Jim Wilson
On Thu, Jun 24, 2021 at 7:26 PM ALO via Gcc  wrote:

> foo.c: In function '__fp16 foo(__fp16, __fp16)':
> foo.c:6:23: error: call of overloaded 'exp(__fp16&)' is ambiguous
> 6 | return a + std::exp(b);
> | ^
>

No, there isn't a solution for this.  You might want to try an ARM port
clang/gcc to see what they do, but it probably isn't much better than the
RISC-V port.  Looks like the same gcc result to me with a quick check.  And
note that only the non-upstream V extension branch for RISC-V has the
__fp16 support because the vector extension depends on it.  It is hard to
argue for changes when the official RISC-V GCC port has no __fp16 support.

Kito started a related thread in March, and there was tentative agreement
to add _Float16 support to the GCC C++ front end.
https://gcc.gnu.org/pipermail/gcc/2021-March/234971.html
That may or may not help you.

I think it will be difficult to do anything useful here until the C and C++
standards figure out how they want half-float support to work.  If we do
something before then, it will probably end up incompatible with the
official solution and we will end up stuck with a mess.

Jim


Re: Does GCC have __gnuc_literal_encoding__ macro defined?

2021-06-25 Thread Jonathan Wakely via Gcc
GCC defines __GNUC_EXECUTION_CHARSET_NAME and
__GNUC_WIDE_EXECUTION_CHARSET_NAME instead.

https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html#Common-Predefined-Macros
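A hedged sketch of consuming these from a program (the function names are mine; the macros are the documented GCC ones, and non-GCC or older compilers simply take the fallback branch):

```cpp
#include <cstring>
#include <cassert>

// Narrow string-literal encoding as reported by the compiler,
// or a placeholder when the macro is not provided.
static const char *narrow_literal_encoding ()
{
#ifdef __GNUC_EXECUTION_CHARSET_NAME
  return __GNUC_EXECUTION_CHARSET_NAME;   // e.g. "UTF-8" by default
#else
  return "(unknown)";
#endif
}

static const char *wide_literal_encoding ()
{
#ifdef __GNUC_WIDE_EXECUTION_CHARSET_NAME
  return __GNUC_WIDE_EXECUTION_CHARSET_NAME;
#else
  return "(unknown)";
#endif
}
```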



Re: __fp16 is ambiguous error in C++

2021-06-25 Thread Jonathan Wakely via Gcc
> foo.c:6:23: error: call of overloaded 'exp(__fp16&)' is ambiguous

__fp16 isn't ambiguous, calling std::exp with an argument of that type
is ambiguous, because the standard library doesn't provide an overload
for that type.

It could be added (probably defined to cast to float and use the
overload for float), but we would need to do it for every function in
<cmath>. It's a simple matter of programming, but somebody needs to
do the work.
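In the meantime the cast-to-float workaround can be done on the user side; a hypothetical wrapper (the name is mine, not part of <cmath>):

```cpp
#include <cmath>
#include <cassert>

// Route a non-standard floating type (e.g. __fp16 where available)
// through the float overload explicitly; a library-provided overload
// would do essentially this internally.
template <typename T>
float exp_via_float (T x)
{
  return std::exp (static_cast<float> (x));
}
```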



gcc-10-20210625 is now available

2021-06-25 Thread GCC Administrator via Gcc
Snapshot gcc-10-20210625 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/10-20210625/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 10 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-10 revision f5a09fe4d8acfff287333c043d13f554b58fe96c

You'll find:

 gcc-10-20210625.tar.xz   Complete GCC

  SHA256=79763ed3c689f1d60b4958ab3dc8dafb0b3fb2be47f9bf74800c730d455d17b6
  SHA1=4597771c30f8e2dc63af22f6ef2c2ff9c5845e19

Diffs from 10-20210618 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-10
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.