https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155
--- Comment #33 from prathamesh3492 at gcc dot gnu.org ---
Created attachment 42341
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42341&action=edit
Test-case to reproduce regression with cortex-m7

I have attached an artificial test-case that is fairly representative of
the regression we are seeing in a benchmark. The test-case mimics a
deterministic finite automaton. With code-hoisting there is an additional
spill of r5 near the beginning of the function.

Looking at the loop from the attached test-case:

  for (; *a && b != 'z'; a++)
    {
      next = *a;
      if (next == ',')
        {
          a++;
          break;
        }
      switch (b)
        {
          ...
        }
    }

The for loop has the same computation, a++, in two sibling basic blocks,
and it gets hoisted. From the PRE dump with code-hoisting:

  <bb 26> [23.80%] [count: INV]:
  # _25 = PHI <_151(25), _23(2)>
  # b_50 = PHI <b_152(25), 97(2)>
  # a_55 = PHI <a_153(25), a_28(2)>
  next_29 = (int) _25;
  _44 = a_55 + 1;
  if (next_29 == 44)
    goto <bb 27>; [5.00%] [count: INV]
  else
    goto <bb 12>; [95.00%] [count: INV]

(a + 1) seems to get hoisted into bb 26:

  _44 = a_55 + 1

just before if (next_29 == 44), which corresponds to the
if (next == ',') condition.

The issue, I think, is that there is a use of 'a' near the end of the
function:

  *s = a;

which possibly increases register pressure and forces the compiler to
spill r5. Commenting out the assignment removes the spill.

Looking at register allocation with code-hoisting, it seems r2 is used to
hold the hoisted value (a + 1):

  r0 = s
  r1 = tab
  r3 = a
  r4 = b
  r5 = *a
  r2 = r3 + 1 (holding the hoisted value)

And without code-hoisting, it seems only r3 is assigned to 'a':

  r0 = s
  r1 = tab
  r2 = b
  r3 = a
  r4 = *a

This is evident from the asm differences for the early-exit code-path:

  if (next == ',')
    {
      a++;
      break;
    }
  <breaks to>:
  *s = a;
  return b;

Without code-hoisting:

  .L2:
          cmp     r4, #44
          beq     .L4
  .L4:
          adds    r3, r3, #1
          ldr     r4, [sp], #4
          str     r3, [r0]
          mov     r0, r2
          bx      lr

With code-hoisting:

  .L2:
          cmp     r5, #44
          add     r2, r3, #1
          beq     .L3
  .L3:
          str     r2, [r0]
          mov     r0, r4
          pop     {r4, r5}
          bx      lr

Without code-hoisting the compiler reuses r3 to store a + 1, while with
code-hoisting it uses the extra register r2 to hold the value of the
hoisted expression a + 1.

Would it be a good idea to somehow "limit" the distance (in terms of
number of basic blocks, maybe?) between the definition of a hoisted
variable and its furthest use during PRE? If that distance exceeds a
certain threshold, PRE should choose not to hoist the expression. The
threshold could be a param that can be set by backends.

Does this analysis look reasonable?

Thanks,
Prathamesh
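
For readers without the attachment, the effect of hoisting a + 1 can be mimicked by hand at source level. The sketch below is not the attached test-case: the function names, the step() helper standing in for the elided switch body, and the initial state 'a' (taken from the constant 97 in the PHI above) are assumptions made only for illustration. It shows that the hoisted form keeps both a and a + 1 live across the comparison, which is the extra live range discussed above.

```c
#include <assert.h>   /* only needed for the usage checks shown after the block */

/* Hypothetical stand-in for the elided switch body. */
static char step(char b, int next)
{
    switch (b) {
    case 'a': return next == 'x' ? 'b' : 'a';
    case 'b': return next == 'y' ? 'z' : 'a';
    default:  return b;
    }
}

/* Shape before hoisting: a + 1 is computed separately on the
   early-exit path and on the loop back edge. */
static char scan_plain(const char **s, const char *a)
{
    char b = 'a';
    for (; *a && b != 'z'; a++) {
        int next = *a;
        if (next == ',') {
            a++;
            break;
        }
        b = step(b, next);
    }
    *s = a;
    return b;
}

/* Shape after hoisting, written by hand: a + 1 is computed once
   before the branch, so a and a_next are both live at the compare. */
static char scan_hoisted(const char **s, const char *a)
{
    char b = 'a';
    while (*a && b != 'z') {
        int next = *a;
        const char *a_next = a + 1;   /* the hoisted expression */
        if (next == ',') {
            a = a_next;
            break;
        }
        b = step(b, next);
        a = a_next;
    }
    *s = a;
    return b;
}

/* Returns 1 when both shapes agree on the final state and end pointer. */
static int same_result(const char *input)
{
    const char *end_plain, *end_hoisted;
    char b_plain = scan_plain(&end_plain, input);
    char b_hoisted = scan_hoisted(&end_hoisted, input);
    return b_plain == b_hoisted && end_plain == end_hoisted;
}
```

Calling same_result("x,y") or same_result("xy,rest") should return 1, since the two functions differ only in where a + 1 is computed. Whether a given compiler spills for the second form depends on the target and the allocator, so this only illustrates the live-range shape and does not reproduce the regression.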