https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88271
--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Daniel Fruzynski from comment #3)
> What about adding new pass at the end? It would look for various possible
> optimizations, which were missed earlier because they are cross-basic block.

We do have post-reload compare elimination that works cross-BB. However, it runs before BB-reordering pass (which duplicates the compare), so this is what compare elimination pass sees:

    1: NOTE_INSN_DELETED
    6: NOTE_INSN_BASIC_BLOCK 2
    3: NOTE_INSN_FUNCTION_BEG
    4: dx:SI=0x1
      REG_EQUAL 0x1
    5: ax:SI=0
      REG_EQUAL 0
   11: L11:
   12: NOTE_INSN_BASIC_BLOCK 3
   13: flags:CCZ=cmp(dx:SI,0)                    <--- here is the compare
   14: pc={(flags:CCZ==0)?L24:pc}
      REG_BR_PROB 536870916
   15: NOTE_INSN_BASIC_BLOCK 4
   17: dx:DI=sign_extend(ax:SI)
   18: dx:SI=[dx:DI*0x4+`data']
   19: {dx:SI=dx:SI<<0x1;clobber flags:CC;}      <--- here is the "add"
   20: {ax:SI=ax:SI+0x1;clobber flags:CC;}
   35: pc=L11
   36: barrier
   24: L24:
   25: NOTE_INSN_BASIC_BLOCK 5
   26: {ax:SI=ax:SI-0x1;clobber flags:CC;}
   28: dx:DI=sign_extend(ax:SI)
   29: dx:SI=[dx:DI*0x4+`data']
   37: pc=L11
   38: barrier
   39: NOTE_INSN_DELETED

The pass can't do anything in this case, several edges are going into BB3.

FYI, BB-reorder pass creates:

    6: NOTE_INSN_BASIC_BLOCK 2
   41: NOTE_INSN_PROLOGUE_END
    3: NOTE_INSN_FUNCTION_BEG
    4: dx:SI=0x1
      REG_EQUAL 0x1
   46: {ax:DI=0;clobber flags:CC;}
   11: L11:
   12: NOTE_INSN_BASIC_BLOCK 3
   13: flags:CCZ=cmp(dx:SI,0)
      REG_DEAD dx:SI
   14: pc={(flags:CCZ==0)?L24:pc}
      REG_DEAD flags:CCZ
      REG_BR_PROB 536870916
   15: NOTE_INSN_BASIC_BLOCK 4
   17: dx:DI=sign_extend(ax:SI)
   18: dx:SI=[dx:DI*0x4+`data']
   19: {dx:SI=dx:SI<<0x1;clobber flags:CC;}
      REG_UNUSED flags:CC
   20: {ax:SI=ax:SI+0x1;clobber flags:CC;}
      REG_UNUSED flags:CC
   50: NOTE_INSN_BASIC_BLOCK 5
   48: flags:CCZ=cmp(dx:SI,0)
      REG_DEAD dx:SI
   49: pc={(flags:CCZ==0)?L24:pc}
      REG_DEAD flags:CCZ
      REG_BR_PROB 536870916

and then scheduler reorders insn stream to:

...
   15: NOTE_INSN_BASIC_BLOCK 4
   17: dx:DI=sign_extend(ax:SI)
   20: {ax:SI=ax:SI+0x1;clobber flags:CC;}
      REG_UNUSED flags:CC
   18: dx:SI=[dx:DI*0x4+`data']
   19: {dx:SI=dx:SI<<0x1;clobber flags:CC;}
      REG_UNUSED flags:CC
   48: flags:CCZ=cmp(dx:SI,0)
      REG_DEAD dx:SI
   49: pc={(flags:CCZ!=0)?L51:pc}
      REG_DEAD flags:CCZ
      REG_BR_PROB 536870916
...

So, this is indeed an exceptional situation, it is pure coincidence that the optimization possibility is created. We *can* rerun compare elimination pass after the scheduler, but I suspect the above case will be the one and only it will catch.

I'm curious if removing this extra instruction really improves the runtime above the noise level. Can you profile the loop w/ and w/o compare insn using perf?
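
For readers who find the RTL hard to follow, the first dump above can be read roughly as the C loop below. This is only a sketch reconstructed from the dump, not the reporter's test case: the contents of `data`, the step bound, the bounds assert and the function name `run` are all assumptions, added so that the example terminates and stays inside the array (the dumped loop has no exit). Here dx maps to `x` and ax maps to `i`.

```c
#include <assert.h>

/* Hypothetical contents for the `data' symbol seen in the dump. */
#define N 8
static int data[N] = {3, 0, 5, 0, 0, 7, 1, 2};

/* Bounded reading of the dumped loop.  Returns the final x (dx) and
   stores the final i (ax) through final_i.  The step limit is an
   assumption; the dumped code loops forever. */
static int run(int steps, int *final_i)
{
    int x = 1;                      /* insn 4: dx = 1 */
    int i = 0;                      /* insn 5: ax = 0 */

    while (steps-- > 0) {
        if (x != 0) {               /* insns 13/14: cmp dx,0 and branch */
            assert(i >= 0 && i < N);
            x = data[i];            /* insns 17/18: load data[ax] */
            x <<= 1;                /* insn 19: shift, also sets flags */
            i++;                    /* insn 20 */
        } else {
            i--;                    /* insn 26 */
            assert(i >= 0 && i < N);
            x = data[i];            /* insns 28/29: load data[ax] */
        }
    }
    *final_i = i;
    return x;
}
```

In this reading, the compare at the top of the loop tests a value that, on the path through the shift, was produced by an instruction that already sets the flags on x86. That is why the compare duplicated by BB-reorder ends up directly after the shift once the scheduler moves the increment of `i` away.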