On 12/11/2015 03:05 AM, Richard Biener wrote:
On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law <l...@redhat.com> wrote:
On 12/03/2015 07:38 AM, Richard Biener wrote:
This pass is now enabled by default with -Os but has no limits on the
amount of stmts it copies.
The more statements it copies, the more likely it is that the path splitting
will turn out to be useful! It's counter-intuitive.
Well, it's still not appropriate for -Os (nor -O2 I think). -ftracer is enabled
with -fprofile-use (but it is also properly driven to only trace hot paths)
and otherwise not by default at any optimization level.
Definitely not appropriate for -Os. But as I mentioned, I really want
to look at the tracer code as it may totally subsume path splitting.
Don't see how this would work for the CFG pattern it operates on unless you
duplicate the exit condition into that new block creating an even more
obfuscated CFG.
Agreed, I don't see any way to fix the multiple exit problem. Then
again, this all runs after the tree loop optimizer, so I'm not sure how
big of an issue it is in practice.
It was only after I approved this code after twiddling it for Ajit that I
came across Honza's tracer implementation, which may in fact be
retargetable to these loops and do a better job. I haven't experimented
with that.
Well, I originally suggested to merge this with the tracer pass...
I missed that, or it didn't sink into my brain.
Again, the more statements it copies the more likely it is to be profitable.
Think superblocks to expose CSE, DCE and the like.
Ok, so similar to tracer (where I think the main benefit is actually increasing
scheduling opportunities for architectures where it matters).
Right. They're both building superblocks, which has the effect of
larger windows for scheduling, DCE, CSE, etc.
Note that both passes are placed quite late and thus won't see much
of the GIMPLE optimizations (DOM mainly). I wonder why they were
not placed adjacent to each other.
Ajit had it fairly early, but that didn't play well with if-conversion.
I just pushed it past if-conversion and vectorization, but before the
last DOM pass. That turns out to be where tracer lives too as you noted.
I wouldn't lose any sleep if we disabled it by default or removed it, particularly
if we can repurpose Honza's code. In fact, I might strongly support the
former until we hear back from Ajit on performance data.
See above for what we do with -ftracer. path-splitting should at _least_
restrict itself to operate on optimize_loop_for_speed_p () loops.
I think we need to decide if we want the code at all, particularly given
the multiple-exit problem.
The difficulty is I think Ajit posted some recent data that shows it's
helping. So maybe the thing to do is ask Ajit to try the tracer
independent of path splitting and take the obvious actions based on
Ajit's data.
It should also (even if counter-intuitive) limit the amount of stmt copying
it does - after all there is something like an instruction cache size, and
exceeding it for loops will never be a good idea (and there are even smaller
special loop caches on some archs).
Yup.
Note that a better heuristic than "at least more than one stmt" would be
to have at least one PHI in the merger block. Otherwise I don't see how
CSE opportunities that we don't already see without the duplication could exist.
And yes, more PHIs -> more possible CSE. I wouldn't say so for
the number of stmts. So please limit the number of stmt copies!
(after all we do limit the number of stmts we copy during jump threading!)
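A minimal sketch of the heuristic described above, written as a toy model
and not as GCC code. The struct, its fields, the function name and the cap
are all hypothetical; loop_is_hot only stands in for what
optimize_loop_for_speed_p () would report for the enclosing loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy summary of the merger (join) block that path splitting would
   duplicate.  Hypothetical, not a GCC data structure.  */
struct merge_block_summary
{
  int num_phis;      /* PHI nodes at the head of the merger block.  */
  int num_stmts;     /* Non-PHI statements that would be copied.  */
  bool loop_is_hot;  /* Stand-in for optimize_loop_for_speed_p ().  */
};

/* Return true if duplicating the merger block looks worthwhile under the
   three conditions from the discussion: the loop is optimized for speed,
   the block has at least one PHI, and the number of copied statements
   does not exceed MAX_STMTS (an assumed, caller-chosen cap).  */
bool
worth_splitting_p (const struct merge_block_summary *bb, int max_stmts)
{
  assert (bb != NULL && max_stmts >= 0);

  /* Never grow code in loops that are not optimized for speed.  */
  if (!bb->loop_is_hot)
    return false;

  /* Without a PHI, duplication exposes no new CSE opportunities.  */
  if (bb->num_phis < 1)
    return false;

  /* Bound the amount of statement copying.  */
  if (bb->num_stmts > max_stmts)
    return false;

  return true;
}
```

The PHI test comes first among the profitability checks because it is the
condition that says whether duplication can help at all; the statement cap
only bounds the cost once that is established.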
Let's get some more data before we try to tune path splitting. In an
ideal world, the tracer can handle this for us and we just remove path
splitting completely.
Jeff