https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94427
--- Comment #1 from Martin Jambor <jamborm at gcc dot gnu.org> ---
OK, so it turns out the identified commit only allows us to shoot ourselves in the foot - and there is one branch too few, not too many.

The hottest loop, which consumes most of the time, is:

 Percent        Instructions
 ------------------------------------------------
   0.03 │ fb0:┌─+add    -0x8(%r9,%rcx,4),%eax
   5.03 │     │  mov    %eax,-0x4(%r13,%rcx,4)
   2.48 │     │  mov    -0x8(%r8,%rcx,4),%esi
   0.02 │     │  add    -0x8(%rdx,%rcx,4),%esi
   0.06 │     │  cmp    %eax,%esi
   4.49 │     │  cmovge %esi,%eax
  17.17 │     │  mov    %ecx,%esi
   0.03 │     │  cmp    $0xc521974f,%eax
   3.50 │     │  cmovl  %ebx,%eax     <----------- this used to be a branch
  21.84 │     │  mov    %eax,-0x4(%r13,%rcx,4)
   3.88 │     │  add    $0x1,%rcx
   0.00 │     │  cmp    %rdi,%rcx
   0.04 │     └──jne    fb0

where the marked conditional move was still a branch one revision earlier because, after fwprop3, the IL looked like:

  <bb 16> [local count: 955630217]:
  # cstore_281 = PHI <[fast_algorithms.c:142:53] sc_223(14), [fast_algorithms.c:142:53] cstore_249(15)>
  [fast_algorithms.c:142:49] MEM <int> [(void *)_72] = cstore_281;
  [fast_algorithms.c:143:13] _78 = [fast_algorithms.c:143:13] *_72;
  [fast_algorithms.c:143:10] if (_78 < -987654321)
    goto <bb 18>; [50.00%]
  else
    goto <bb 17>; [50.00%]

  <bb 17> [local count: 477815109]:

  <bb 18> [local count: 955630217]:
  # cstore_250 = PHI <[fast_algorithms.c:143:33] -987654321(16), [fast_algorithms.c:143:33] cstore_281(17)>
  [fast_algorithms.c:143:29] MEM <int> [(void *)_72] = cstore_250;

The aforementioned revision turned this into more optimized code:

  <bb 16> [local count: 955630217]:
  # cstore_281 = PHI <[fast_algorithms.c:142:53] sc_223(14), [fast_algorithms.c:142:53] _73(15)>
  [fast_algorithms.c:143:10] if (cstore_281 < -987654321)
    goto <bb 18>; [50.00%]
  else
    goto <bb 17>; [50.00%]

  <bb 17> [local count: 477815109]:

  <bb 18> [local count: 955630217]:
  # cstore_250 = PHI <[fast_algorithms.c:143:33] -987654321(16), [fast_algorithms.c:143:33] cstore_281(17)>
  [fast_algorithms.c:143:29] MEM <int> [(void *)_72] = cstore_250;

which phiopt3 then changed to:

  cstore_248 = MAX_EXPR <cstore_249, -987654321>;
  [fast_algorithms.c:143:29] MEM <int> [(void *)_72] = cstore_248;

and the expander apparently always expands a MAX_EXPR into a conditional move if it can(?).

When I hacked phiopt not to do the transformation for - ehm - any GIMPLE_COND statement originating from source line 143, I recovered the original run time of the benchmark, on both AMD and Intel.
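To make the phiopt3 rewrite concrete, below is a hypothetical C reconstruction of the pattern the dumps and the annotated loop suggest: a row update in which each element depends on the previous one (the value kept in %eax across iterations), followed by a clamp against -987654321 (0xc521974f). The array names (dc, mc, tpdd, tpmd), NEG_INF and both function names are assumptions for illustration only, not taken from fast_algorithms.c. The two functions compute identical values; the sketch only shows that turning "if (x < C) x = C;" into x = MAX_EXPR <x, C> preserves semantics, so the slowdown is purely a matter of how that MAX_EXPR is lowered. Neither source form by itself forces GCC to emit a branch or a conditional move.

```c
#include <stddef.h>

/* Hypothetical stand-in for the -987654321 constant in the dumps
   (0xc521974f in the assembly). */
#define NEG_INF (-987654321)

/* Shape of the pre-regression IL: store, reload, then a conditional
   overwrite.  dc[k] depends on dc[k - 1], so all iterations sit on
   one loop-carried dependency chain.  All names are illustrative. */
void dp_row_store_reload(int *dc, const int *mc, const int *tpdd,
                         const int *tpmd, size_t n)
{
    for (size_t k = 1; k < n; k++) {
        dc[k] = dc[k - 1] + tpdd[k - 1];
        int sc = mc[k - 1] + tpmd[k - 1];
        if (sc > dc[k])          /* corresponds to the cmovge */
            dc[k] = sc;
        if (dc[k] < NEG_INF)     /* assumed counterpart of line 143 */
            dc[k] = NEG_INF;
    }
}

static int max_int(int a, int b) { return a > b ? a : b; }

/* Roughly the shape phiopt3 produces: the clamp becomes
   MAX_EXPR <v, NEG_INF> with no control flow left. */
void dp_row_max(int *dc, const int *mc, const int *tpdd,
                const int *tpmd, size_t n)
{
    for (size_t k = 1; k < n; k++) {
        int v = max_int(dc[k - 1] + tpdd[k - 1], mc[k - 1] + tpmd[k - 1]);
        dc[k] = max_int(v, NEG_INF);
    }
}
```

Since the values are the same either way, the difference can only be in cost. One plausible reading of the profile, not verified here: the cmovl sits on the dependency chain through %eax, so every iteration must wait for it, whereas a well-predicted branch (if the clamp rarely fires) lets the CPU speculate past the comparison.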