On 04/28/2016 06:08 PM, Patrick Palka wrote:


The glitch in that plan is that there is no easy linkage between the use of
b_5 in bb4 and the ASSERT_EXPR in bb3.  That's something Aldy, Andrew, and I
are looking at independently for some of Aldy's work.

I see.  One other deficiency I noticed in the existing threading code
is that there may have been multiple ASSERT_EXPRs registered for b_5,
so bb3 could look like

    <bb 3>:
    b_15 = ASSERT_EXPR <b_5(D), b_5(D) != 0>;
    b_16 = ASSERT_EXPR <b_15, b_15 != 1>;
    foo ();

but currently we don't consider the 2nd ASSERT_EXPR because we only
look at the immediate uses of b_5.  This oversight makes us fail to
thread

    void bar (void);
    void baz (void);

    void
    foo (int a)
    {
      if (a != 5 && a != 10)
        bar ();
      if (a == 10)
        baz ();
    }
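
For illustration, following the chain would look roughly like this (a rough
sketch only, not the actual patch; follow_assert_exprs is a made-up helper
name, and the code assumes GCC's internal headers such as tree.h, gimple.h
and ssa-iterators.h):

    /* Sketch: chase every ASSERT_EXPR that copies NAME so that a second
       assertion like b_16 above gets considered too.  */

    static void
    follow_assert_exprs (tree name)
    {
      imm_use_iterator iter;
      use_operand_p use_p;

      FOR_EACH_IMM_USE_FAST (use_p, iter, name)
        {
          gimple *use_stmt = USE_STMT (use_p);

          /* Only chase statements of the form  lhs = ASSERT_EXPR <name, cond>.  */
          if (!is_gimple_assign (use_stmt)
              || gimple_assign_rhs_code (use_stmt) != ASSERT_EXPR)
            continue;

          tree lhs = gimple_assign_lhs (use_stmt);

          /* ... record whatever the assertion's condition implies here ...  */

          /* The new name may itself feed further ASSERT_EXPRs
             (b_15 -> b_16 in the dump above), so keep walking.  */
          follow_assert_exprs (lhs);
        }
    }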
Can you file this as a BZ please, and add me to the cc list?  I'll probably
add Andrew as well, since this is a great example of something we'd like to
catch with his work.  Thanks.




In this specific instance, there's a good chance your analysis is catching
something earlier and allowing it to be better simplified.  But let's do the
analysis to make sure.

From what I can tell, the patch does cause fewer conditionals to get
executed in general.  I spot two extra jumps that are threaded in the
final code compared to without the patch.  I wouldn't trust my
analysis though!
I'll walk through it.


By the way, the test case ssa-thread-11.c is somewhat buggy since its
two functions lack return statements.  Also, I would expect abort() to
have the noreturn attribute.
Those testcases are heavily reduced and are ultimately useful only to show
cases where jump threading ought to happen -- there are all kinds of
undefined behaviour in those tests.  Their only purpose is to set up a CFG
and conditionals in which jump threading should be happening but wasn't at
some point or another.


That makes sense!  I will play around with this technique.
Aside from the time, the biggest problems are ASLR and the ld.so hashing
bits, which cause slight variations from one run to the next.  Otherwise it
is highly stable: the results are independent of the runtime load on the
machine and measure the primary effect I'm usually searching for.  All
excellent properties :-)

For your patch, the reduction in runtime branches is tiny, on the order of
0.01%, but it is clearly visible and well outside the typical noise.

What is far more interesting is its overall effect on total instructions
executed.  Typically, for each runtime branch eliminated, I see 2-4 total
instruction fetches eliminated, which makes sense if you think about it --
the branch usually has a comparison and perhaps some setup code that becomes
dead as a result of threading the jump.
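
To make that concrete with the earlier example (just my illustration, not
actual before/after dumps):

    /* Once the jump is threaded, the path where "a != 5 && a != 10" held
       bypasses the second test entirely, so the compare of a against 10
       (and any setup feeding it) is dead on that path.  */
    if (a != 5 && a != 10)
      bar ();
    if (a == 10)   /* compare + conditional branch gone on the threaded path */
      baz ();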

Your patch eliminates 8.5 instruction fetches per branch eliminated, which
is very good; in fact, it's by far the highest ratio I can recall ever
seeing.  So, essentially, while it doesn't fire often, when it does fire
it's a much bigger win than most jump threading cases.

Jeff
