On Thu, Nov 6, 2014 at 7:08 PM, Pat Haugen <pthau...@linux.vnet.ibm.com> wrote:
> It appears to me that both loop invariant motion passes (tree/rtl) don't
> look at basic block frequencies and will gladly hoist invariant code from a
> cold block within a loop. This can impact performance by executing (possibly
> costly) code that would otherwise not be executed, adds another register
> that is live through the loop, and possibly increases number of callee-saved
> registers used in the procedure (which require save/restore) if there is a
> call in the loop. Following simplified test was patterned after code
> observed in Boost library, which apparently has a fair number of asserts
> sprinkled throughout.
>
> volatile int x;
> void assert_fail(int, char *, char *);
> void foo(int *a, int n, int k) {
>   int i;
>
>   for(i=0; i<n; i++) {
>     if (__builtin_expect(x,0))
>       assert_fail(k / 5, "one", "two");
>     a[i] = k;
>   }
> }
>
> Compiling the above with -O2, tree-LIM (tree-ssa-loop-im.c) will yank the
> 'k/5' expression out of the loop and rtl-LIM (loop-invariant.c) will yank
> the address computations for the 2 strings out of the loop. The default
> probability for __builtin_expect is 90%, but even adding --param
> builtin-expect-probability=100 to the compile such that the block containing
> the call looks to never be executed still results in invariant hoisting.
>
> A simple change like the following prevents the hoisting in the rtl pass, I
> assume something similar could also be added to the tree pass. But I'd like
> to hear others thoughts on whether this is an acceptable approach/issue that
> should be fixed. The code below prevents motion if the block containing it
> is executed <= 10% of the time the loop header is. Is there a better
> approach there instead of another magic number? Should it be based on
> PARAM_VALUE (BUILTIN_EXPECT_PROBABILITY)? Basing on
> BUILTIN_EXPECT_PROBABILITY seems to make sense to me for the above case, but
> what about the more general case where __builtin_expect() isn't used and
> branch probabilities are guessed based on other heurstics?

Shouldn't we never hoist anything from a bb with lower execution frequency
to a bb with higher one?  It seems LIM simply assumes that inside a loop
is always higher frequency than outside of it.

So - why artificially have that factor of 0.1 instead of simply checking for
less than?

Btw, we also hoist through multiple loop nests so even an unlikely
branch in the innermost loop may execute more times than the outermost
loop entry.   So I believe the change has to be made somewhere else,
like in invariantness_dom_walker::before_dom_children.

Btw, for your example with calling a noreturn function in the loop exit
destination the block containing the k/5 shouldn't be in the loop at all.

I wonder why they are not using __attribute__((noreturn)) in which case
they'd even get better profiles can can get rid of the __builtin_expect?

Richard.

> Index: loop-invariant.c
> ===================================================================
> --- loop-invariant.c    (revision 216843)
> +++ loop-invariant.c    (working copy)
> @@ -1000,6 +1000,11 @@ find_invariants_bb (basic_block bb, bool
>  {
>    rtx_insn *insn;
>
> +
> +  /* Don't move invariants from cold block inside the loop. */
> +  if (bb->frequency <= (bb->loop_father->header->frequency / 10) )
> +    return;
> +
>    FOR_BB_INSNS (bb, insn)
>      {
>        if (!NONDEBUG_INSN_P (insn))
>
>
> Thanks for any feedback,
> Pat
>
>
>

Reply via email to