https://sourceware.org/bugzilla/show_bug.cgi?id=30333
            Bug ID: 30333
           Summary: [avr-ld] NOPs not removed after rcall for devices with >8k of flash even with -mrelax
           Product: binutils
           Version: 2.40
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: ld
          Assignee: unassigned at sourceware dot org
          Reporter: sourceware-bugzilla at mhxnet dot de
  Target Milestone: ---

Created attachment 14811
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14811&action=edit
Reproduction code & build script

I've been working on a bootloader for xmega3 cores recently. I noticed that as soon as I compile the code for devices with more than 8k of flash, the binary grows by more than 20 bytes, which is almost 5% of the bootloader's size. The issue isn't limited to xmega3, though, and I've used an older core in the examples further down.

My understanding from various pieces of documentation is that `-mrelax` is supposed to replace long calls with short calls and shrink the resulting holes in the binary accordingly. If it doesn't shrink the binary, there's no obvious benefit apart from each short call executing one cycle faster. From glancing at the code, some sort of shrinking seems to be implemented, but it's not clear to me whether it does what it's supposed to.
Here's an example to reproduce the behaviour:

```c
static void __attribute__((__noinline__)) f(void)
{
    *((volatile char *) 0x0140) = 42;
}

__attribute__((naked, section(".vectors"), noreturn))
void start(void)
{
    f();
    for(;;){}
    __builtin_unreachable();
}
```

Compiling this with

```
avr-gcc -mmcu=atmega88 -Os -mrelax -nostartfiles -nostdlib -o x.elf x.c
```

yields:

```
00000000 <start>:
   0:   01 d0           rcall   .+2             ; 0x4 <f>

00000002 <.L3>:
   2:   ff cf           rjmp    .-2             ; 0x2 <.L3>

00000004 <f>:
   4:   8a e2           ldi     r24, 0x2A       ; 42
   6:   80 93 40 01     sts     0x0140, r24     ; 0x800140 <_end+0x40>
   a:   08 95           ret
```

Compiling it instead for `atmega168`:

```
00000000 <start>:
   0:   02 d0           rcall   .+4             ; 0x6 <f>
   2:   00 00           nop

00000004 <.L3>:
   4:   ff cf           rjmp    .-2             ; 0x4 <.L3>

00000006 <f>:
   6:   8a e2           ldi     r24, 0x2A       ; 42
   8:   80 93 40 01     sts     0x0140, r24     ; 0x800140 <_end+0x40>
   c:   08 95           ret
```

Dropping `-mrelax` generates a `call` instead of an `rcall` + `nop`. My expectation would be that, at least with `-mrelax`, I get an `rcall` without a `nop` regardless of the flash size of the MCU. If this isn't a bug, I'd like to understand why, as I haven't found any documentation that explains this behaviour.

I'm using `crossdev`-based builds of gcc/binutils on Gentoo Linux:

```
avr-gcc (Gentoo 13.0.1_pre20230305 p8) 13.0.1 20230305 (experimental)
GNU ld (Gentoo 2.40 p4) 2.40.0
```

The behaviour doesn't change if I use an older version of gcc, either.

I'm attaching a tarball with the reproduction code and a script that builds ELF and disassembly files for two MCUs. I'm more than happy to provide more information if needed.