https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88271

--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Daniel Fruzynski from comment #3)
> What about adding new pass at the end? It would look for various possible
> optimizations, which were missed earlier because they are cross-basic block.

We do have a post-reload compare elimination pass that works cross-BB. However,
it runs before the BB-reordering pass (which duplicates the compare), so this is
what the compare elimination pass sees:

    1: NOTE_INSN_DELETED
    6: NOTE_INSN_BASIC_BLOCK 2
    3: NOTE_INSN_FUNCTION_BEG
    4: dx:SI=0x1
      REG_EQUAL 0x1
    5: ax:SI=0
      REG_EQUAL 0
   11: L11:
   12: NOTE_INSN_BASIC_BLOCK 3
   13: flags:CCZ=cmp(dx:SI,0)         <--- here is the compare
   14: pc={(flags:CCZ==0)?L24:pc}
      REG_BR_PROB 536870916
   15: NOTE_INSN_BASIC_BLOCK 4
   17: dx:DI=sign_extend(ax:SI)
   18: dx:SI=[dx:DI*0x4+`data']
   19: {dx:SI=dx:SI<<0x1;clobber flags:CC;}  <--- here is the "add"
   20: {ax:SI=ax:SI+0x1;clobber flags:CC;}
   35: pc=L11
   36: barrier
   24: L24:
   25: NOTE_INSN_BASIC_BLOCK 5
   26: {ax:SI=ax:SI-0x1;clobber flags:CC;}
   28: dx:DI=sign_extend(ax:SI)
   29: dx:SI=[dx:DI*0x4+`data']
   37: pc=L11
   38: barrier
   39: NOTE_INSN_DELETED

The pass can't do anything in this case, because several edges go into BB3.

FYI, the BB-reorder pass creates:

    6: NOTE_INSN_BASIC_BLOCK 2
   41: NOTE_INSN_PROLOGUE_END
    3: NOTE_INSN_FUNCTION_BEG
    4: dx:SI=0x1
      REG_EQUAL 0x1
   46: {ax:DI=0;clobber flags:CC;}

   11: L11:
   12: NOTE_INSN_BASIC_BLOCK 3
   13: flags:CCZ=cmp(dx:SI,0)
      REG_DEAD dx:SI
   14: pc={(flags:CCZ==0)?L24:pc}
      REG_DEAD flags:CCZ
      REG_BR_PROB 536870916

   15: NOTE_INSN_BASIC_BLOCK 4
   17: dx:DI=sign_extend(ax:SI)
   18: dx:SI=[dx:DI*0x4+`data']
   19: {dx:SI=dx:SI<<0x1;clobber flags:CC;}
      REG_UNUSED flags:CC
   20: {ax:SI=ax:SI+0x1;clobber flags:CC;}
      REG_UNUSED flags:CC

   50: NOTE_INSN_BASIC_BLOCK 5
   48: flags:CCZ=cmp(dx:SI,0)
      REG_DEAD dx:SI
   49: pc={(flags:CCZ==0)?L24:pc}
      REG_DEAD flags:CCZ
      REG_BR_PROB 536870916

and then the scheduler reorders the insn stream to:

   ...
   15: NOTE_INSN_BASIC_BLOCK 4
   17: dx:DI=sign_extend(ax:SI)
   20: {ax:SI=ax:SI+0x1;clobber flags:CC;}
      REG_UNUSED flags:CC
   18: dx:SI=[dx:DI*0x4+`data']
   19: {dx:SI=dx:SI<<0x1;clobber flags:CC;}
      REG_UNUSED flags:CC
   48: flags:CCZ=cmp(dx:SI,0)
      REG_DEAD dx:SI
   49: pc={(flags:CCZ!=0)?L51:pc}
      REG_DEAD flags:CCZ
      REG_BR_PROB 536870916
   ...

So, this is indeed an exceptional situation; it is pure coincidence that the
optimization possibility is created. We *can* rerun the compare elimination pass
after the scheduler, but I suspect the above case will be the one and only one
it will catch.
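
To make the "coincidence" concrete, here is a toy model of the check, not the
actual GCC pass. Everything in it (the Insn class, compare_is_redundant, the
three insn lists) is hypothetical and written only for illustration. It assumes
that an arithmetic insn which clobbers the flags leaves them describing its own
result, and it never looks across block edges, which the real pass can do in
some cases.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insn:
    uid: int               # insn number as printed in the RTL dumps
    dest: Optional[str]    # register written by the insn, or None
    writes_flags: bool     # True for arithmetic that clobbers the flags


def compare_is_redundant(insns: List[Insn], reg: str) -> bool:
    """Toy check: can a cmp(reg, 0) placed right after `insns` be dropped?"""
    for insn in reversed(insns):
        if insn.writes_flags:
            # The flags are usable only if the nearest flag writer
            # is also the insn that computed reg.
            return insn.dest == reg
        if insn.dest == reg:
            # reg was overwritten by an insn that leaves the flags alone.
            return False
    # Reached the top of the block: the flags depend on whichever
    # predecessor ran, and this toy model does not follow edges.
    return False


# What compare elimination sees: the compare is alone at the top of BB3.
seen_by_cmpelim: List[Insn] = []

# Insn order after BB-reorder: shift (19), then increment (20), then compare.
after_bb_reorder = [
    Insn(17, "dx", False),   # sign_extend
    Insn(18, "dx", False),   # load from `data'
    Insn(19, "dx", True),    # dx <<= 1
    Insn(20, "ax", True),    # ax += 1
]

# Insn order after scheduling: the increment (20) moved above the shift (19).
after_sched = [
    Insn(17, "dx", False),
    Insn(20, "ax", True),
    Insn(18, "dx", False),
    Insn(19, "dx", True),
]

for name, seq in [("cmpelim", seen_by_cmpelim),
                  ("bb-reorder", after_bb_reorder),
                  ("sched", after_sched)]:
    print(name, compare_is_redundant(seq, "dx"))
```

In this model only the scheduled order qualifies, because the scheduler happens
to leave the shift of dx as the last flag-writing insn before the compare.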

I'm curious whether removing this extra instruction really improves the runtime
above the noise level. Can you profile the loop with and without the compare
insn using perf?
