https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org
           Keywords|EH                          |

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that we fail to sink

 d_29 = {t_28, t_28, t_28, t_28};

we compute a good place in select_best_block, but since that place is at
the same loop depth as the original one we then apply

  /* If BEST_BB is at the same nesting level, then require it to have
     significantly lower execution frequency to avoid gratuitous movement.  */
  if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
  /* If result of comparison is unknown, prefer EARLY_BB.
         Thus use !(...>=..) rather than (...<...)  */
      && !(best_bb->count * 100 >= early_bb->count * threshold))
    return best_bb;

and fail to sink.  I'm not exactly sure why we do the above - we probably
should still sink when best_bb post-dominates early_bb.  We should also
consider whether the sunk stmt will possibly (or provably) enlarge the
lifetime of its uses (though that's hard to guess, since we process
sinking of the defs of the uses only afterwards).  In this case we have a
single use and a single def, so sinking shouldn't make things worse.  We
could also weigh in the spilling class of the register here.
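
An untested sketch of what such a relaxation could look like, reusing the
existing dominated_by_p predicate (this assumes post-dominator info has
been computed for the function; it is not committed code):

  /* Sketch only: BEST_BB is dominated by EARLY_BB, so if it also
     post-dominates EARLY_BB the stmt executes equally often in either
     block (given a consistent profile) and the frequency test below
     adds nothing.  */
  if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
      && dominated_by_p (CDI_POST_DOMINATORS, early_bb, best_bb))
    return best_bb;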

In our case we have the dominated block with a higher(!) count than
the dominating block, which means the profile is corrupt.
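
For blocks at the same loop depth this is easy to check: since early_bb
dominates best_bb, every path to best_bb passes through early_bb, so a
consistent profile must have count (best_bb) <= count (early_bb).
Something along these lines (a sketch, not existing code) would flag it:

  /* Sketch only: at equal loop depth, a dominated block claiming more
     executions than its dominator indicates an inconsistent profile.  */
  if (dominated_by_p (CDI_DOMINATORS, best_bb, early_bb)
      && best_bb->count > early_bb->count)
    /* The profile is corrupt here.  */;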

With --param sink-frequency-threshold we sink the ctor and the feeding
division, but we still get

.L5:
        movq    (%rbx), %rax
        pxor    %xmm1, %xmm1
        leaq    0(%rbp,%rax), %rdx
        .p2align 4,,10
        .p2align 3
.L4:
        movaps  (%rsp), %xmm0
        addps   (%rax), %xmm0
        addq    $16, %rax
        movaps  %xmm0, -16(%rax)
        addps   %xmm0, %xmm1
        cmpq    %rax, %rdx
        jne     .L4
        movaps  %xmm1, %xmm0
        movhlps %xmm1, %xmm0
        addps   %xmm0, %xmm1
        movaps  %xmm1, %xmm0
        shufps  $85, %xmm1, %xmm0
        addps   %xmm1, %xmm0
.LEHB1:
        call    _Z1gf
        addq    $8, %rbx
        cmpq    %rbx, %r12
        jne     .L5

because we (rightfully so) refuse to sink into the outer loop.  What we
fail to do is hoist the reload out of the inner loop (I suppose
clang does exactly that).
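
For illustration, a hand-edited variant of the above with the reload
hoisted into the outer loop body (the choice of %xmm2 as scratch is my
assumption; the value still has to be reloaded once per outer iteration
because the call to _Z1gf clobbers all xmm registers):

.L5:
        movq    (%rbx), %rax
        pxor    %xmm1, %xmm1
        movaps  (%rsp), %xmm2   # reload hoisted out of the inner loop
        leaq    0(%rbp,%rax), %rdx
        .p2align 4,,10
        .p2align 3
.L4:
        movaps  (%rax), %xmm0
        addps   %xmm2, %xmm0    # use the register copy instead of (%rsp)
        addq    $16, %rax
        movaps  %xmm0, -16(%rax)
        addps   %xmm0, %xmm1
        cmpq    %rax, %rdx
        jne     .L4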

We don't have any pass after reload that would perform loop-invariant
motion, and I'm not sure how this situation is handled in RA in general -
is a post-RA pass that optimizes spill/reload placement "globally"
usually done?
