Hi. I would like to get started with improving code generation for a GCC backend (m68k in my case). Any pointers, especially to good documentation, are welcome.
For this example, consider this C function for a reference-counted type:

    void TCRelease(TCTypeRef tc) {
        if (--tc->retainCount == 0) {
            if (tc->destroy) {
                tc->destroy(tc);
            }
            free((void *)tc);
        }
    }

The generated m68k asm is this:

    _TCRelease:
        move.l %a2,-(%sp)
        move.l 8(%sp),%a2
        move.w (%a2),%d0      ; Question 1:
        subq.w #1,%d0
        move.w %d0,(%a2)
        jne .L7
        move.l 4(%a2),%a0     ; Question 2:
        cmp.w #0,%a0
        jeq .L9
        move.l %a2,-(%sp)     ; Question 3:
        jsr (%a0)
        addq.l #4,%sp
    .L9:
        move.l %a2,8(%sp)
        move.l (%sp)+,%a2
        jra _free
    .L7:
        move.l (%sp)+,%a2
        rts

Question 1:
This could be done as a single instruction, "subq.w #1,(%a2)": the result in d0 is never used again, and subtracting directly in memory still updates the status flags. That would save 4 bytes and 8 cycles on a 68000. How would I attack this problem? Is it a peephole optimisation, or is GCC perhaps not aware that the instruction updates the flags?

Question 2:
Loading into a temporary data register with "move.l 4(%a2),%d0" would update the status flags (a move to an address register does not), allowing the branch without the compare-with-immediate instruction. This obviously requires an extra "move.l %d0,%a0" when the branch is not taken, to be able to make the jump. But it still saves 2 bytes, and 8 cycles in the worst case (12 cycles in the best case). Is this a peephole optimisation? Or is it about providing accurate instruction costs?

Question 3:
Saving a2 on the stack is only needed if this code path is taken. Is this even worth bothering with? And is this something that moving the m68k target from reload to LRA would solve?

// Fredrik Olsson
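
P.S. A self-contained version of the example, in case anyone wants to reproduce the listing. This is only a sketch: the struct layout is an assumption reconstructed from the offsets in the asm above (a 16-bit retainCount at offset 0, the destroy pointer at offset 4 with m68k's 2-byte alignment). The "reserved" field, count_destroy() and tc_demo() are hypothetical names added for illustration and are not part of the real code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct TCType *TCTypeRef;

/* Assumed layout, inferred from the asm: move.w (%a2) and move.l 4(%a2). */
struct TCType {
    int16_t retainCount;            /* 16-bit counter at offset 0 */
    int16_t reserved;               /* hypothetical filler so destroy sits at offset 4 on m68k */
    void (*destroy)(TCTypeRef tc);  /* optional destructor callback */
};

/* The function from the example, unchanged. */
void TCRelease(TCTypeRef tc) {
    if (--tc->retainCount == 0) {
        if (tc->destroy) {
            tc->destroy(tc);
        }
        free((void *)tc);
    }
}

/* Hypothetical helpers that exercise both paths through TCRelease. */
static int destroy_calls;

static void count_destroy(TCTypeRef tc) {
    (void)tc;
    destroy_calls++;
}

int tc_demo(void) {
    TCTypeRef tc = malloc(sizeof *tc);
    if (tc == NULL) {
        return -1;
    }
    tc->retainCount = 2;
    tc->reserved = 0;
    tc->destroy = count_destroy;
    destroy_calls = 0;

    TCRelease(tc);               /* 2 -> 1: takes the .L7 path, nothing is destroyed */
    assert(destroy_calls == 0);

    TCRelease(tc);               /* 1 -> 0: calls destroy, then frees the object */
    return destroy_calls;
}
```

On a host compiler this only checks the behaviour. To look at the code generation questions, the same file would need to be compiled to assembly with an m68k cross GCC and optimisation enabled. Which options reproduce the listing exactly is left open here.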