[Bug c/27305] New: Compiler generates incorrect code when calling functions
Consider the following code:

typedef unsigned int UINT32;
typedef unsigned char BOOL;

#define __SWI_BIOS_ContainerUsage 1234

#define __swicall1(type,name,type1,arg1) \
static inline type name(type1 arg1) { \
  register long __r0 __asm__ ("r0") = (long)arg1; \
  register long __res __asm__ ("r0"); \
  __asm__ __volatile__ ("swi\t%2\n\t" \
      : "=r" (__res) \
      : "0" (__r0), "i" (__SWI_##name) \
      : "r1", "r2", "r3", "ip", "lr", "cc", \
        "memory"); \
  return((type)__res); \
}

__swicall1(UINT32,BIOS_ContainerUsage,BOOL,verbose);

int sprintf(char *p, const char *frmt, ...);

void testme(char *tmp) {
  sprintf(tmp, " %d%% Containers\n", BIOS_ContainerUsage(1));
  sprintf(tmp, " %d%% Containers\n", 2 * BIOS_ContainerUsage(1));
}

--
Summary: Compiler generates incorrect code when calling functions
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: blocker
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Eric dot Doenges at betty-tv dot com
GCC host triplet: powerpc-apple-darwin8.5.0
GCC target triplet: arm-elf-unknown

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27305
[Bug c/27308] New: Compiler generates incorrect code when calling a function with the result of an inline function as parameter
Consider the following code:

typedef unsigned int UINT32;
typedef unsigned char BOOL;

#define __SWI_BIOS_ContainerUsage 1234

#define __swicall1(type,name,type1,arg1) \
static inline type name(type1 arg1) { \
  register long __r0 __asm__ ("r0") = (long)arg1; \
  register long __res __asm__ ("r0"); \
  __asm__ __volatile__ ("swi\t%2\n\t" \
      : "=r" (__res) \
      : "0" (__r0), "i" (__SWI_##name) \
      : "r1", "r2", "r3", "ip", "lr", "cc", \
        "memory"); \
  return((type)__res); \
}

__swicall1(UINT32,BIOS_ContainerUsage,BOOL,verbose);

int sprintf(char *p, const char *frmt, ...);

void testme(char *tmp) {
  sprintf(tmp, " %d%% Containers\n", BIOS_ContainerUsage(1));
  sprintf(tmp, " %d%% Containers\n", 2 * BIOS_ContainerUsage(1));
}

For the first call to sprintf, gcc generates the following assembler code:

        mov     r0, #1
        swi     #1234
        ldr     r5, .L3
        mov     r0, r4
        mov     r1, r5
        mov     r2, r4
        bl      sprintf

This is clearly wrong, since r2 should hold the result of the swi (which is
returned in r0). Instead, r2 is loaded from r4, the same register that
supplies the first argument (tmp), and the swi result in r0 is overwritten
before it is ever copied.

For the second call to sprintf, gcc generates correct code:

        mov     r0, #1
        swi     #1234
        mov     r2, r0, asl #1
        mov     r1, r5
        mov     r0, r4
        ldmfd   sp!, {r4, r5, lr}
        b       sprintf

--
Summary: Compiler generates incorrect code when calling a function with the result of an inline function as parameter
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: blocker
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Eric dot Doenges at betty-tv dot com
GCC host triplet: powerpc-apple-darwin8.5.0
GCC target triplet: arm-elf-unknown

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27308
[Bug c/27308] Compiler generates incorrect code when calling a function with the result of an inline function as parameter
--- Comment #3 from Eric dot Doenges at betty-tv dot com 2006-04-25 14:37 ---
Storing the result to memory generates correct code

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27308
[Bug c/27308] Compiler generates incorrect code when calling a function with the result of an inline function as parameter
--- Comment #4 from Eric dot Doenges at betty-tv dot com 2006-04-25 14:43 ---
Removing the __asm__ ("r0") from __res works around the bug, but then I cannot
depend on gcc always allocating r0 for __res, can I? I found no other way to
tell gcc which registers it must use. I'm assuming this is a bug in gcc, not
in the asm constraint, because the same code works flawlessly with gcc-3.4.3.

As to simplifying the testcase: storing the result of BIOS_ContainerUsage to
memory generates correct code regardless of whether __res is forced to r0 or
not, making it worthless as a test case.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27308
[Bug target/27308] Compiler generates incorrect code when calling a function with the result of an inline function as parameter
--- Comment #6 from Eric dot Doenges at betty-tv dot com 2006-04-26 06:26 ---
Unfortunately, removing the __asm__ ("r0") from __r0 does not circumvent the
problem.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27308
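Comments #4 and #6 both experiment with the two register variables that the
macro binds to r0. Another spelling worth trying, shown here only as a sketch,
is to use a single register variable as a read-write ("+r") operand, so that
only one variable is ever tied to r0. This is not from the bug report, and
whether it changes the code gcc 4.1.0 generates has not been tested. The
non-ARM branch is a hypothetical stub with a made-up return value, present
only so the file builds and runs on a development host.

```c
typedef unsigned int UINT32;
typedef unsigned char BOOL;

#define __SWI_BIOS_ContainerUsage 1234

/* Hand-expanded variant of __swicall1(UINT32,BIOS_ContainerUsage,BOOL,verbose)
 * that uses one register variable for both the argument and the result. */
static inline UINT32 BIOS_ContainerUsage(BOOL verbose)
{
#if defined(__arm__) && !defined(__thumb__)
    /* r0 carries the argument into the swi and the result back out. */
    register long r0 __asm__("r0") = (long)verbose;

    __asm__ __volatile__("swi\t%1"
                         : "+r"(r0)
                         : "i"(__SWI_BIOS_ContainerUsage)
                         : "r1", "r2", "r3", "ip", "lr", "cc", "memory");
    return (UINT32)r0;
#else
    /* Assumption: placeholder for non-ARM hosts, not real BIOS behaviour. */
    return verbose ? 50u : 0u;
#endif
}
```

The clobber list is copied unchanged from the original macro; only the operand
list differs.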
[Bug c/27016] New: ARM optimizer produces severely suboptimal code
The compiler creates extremely bad code for the ARM target. Consider the
following source file:

--- SNIP ---
unsigned int code_in_ram[100];

void testme(void)
{
  unsigned int *p_rom, *p_ram, *p_end, len;
  extern unsigned int _ram_erase_sector_start;
  extern unsigned int _ram_erase_sector_end;

  p_ram = code_in_ram;
  p_rom = &_ram_erase_sector_start;
  len = ((unsigned int)&_ram_erase_sector_end
         - (unsigned int)&_ram_erase_sector_start) / sizeof(unsigned int);
  for (p_rom = &_ram_erase_sector_start, p_end = &_ram_erase_sector_end;
       p_rom < p_end;) {
    *p_ram++ = *p_rom++;
  }
}
--- SNIP ---

Compiled with arm-elf-gcc -mcpu=arm7tdmi -S -Os testme.c, we get the following
code:

--- SNIP ---
        .file   "testme.c"
        .text
        .align  2
        .global testme
        .type   testme, %function
testme:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr     r1, .L6
        ldr     r2, .L6+4
        @ lr needed for prologue
        b       .L2
.L3:
        ldr     r3, [r1], #4
        str     r3, [r2, #-4]
.L2:
        ldr     r3, .L6+8
        cmp     r1, r3
        add     r2, r2, #4
        bcc     .L3
        bx      lr
.L7:
        .align  2
.L6:
        .word   _ram_erase_sector_start
        .word   code_in_ram
        .word   _ram_erase_sector_end
        .size   testme, .-testme
        .comm   code_in_ram,400,4
        .ident  "GCC: (GNU) 4.1.0"
--- SNIP ---

Even a cursory examination reveals that it would be a lot better to write:

        ldr     r1, .L6
        ldr     r2, .L6+4
        ldr     r0, .L6+8
        b       .L2
.L3:
        ldr     r3, [r1], #4
        str     r3, [r2], #4
.L2:
        cmp     r1, r0
        bcc     .L3
        bx      lr

This code would be one instruction shorter overall, and two instructions
shorter in the loop (four instead of six). The way gcc-4.1.0 refuses to use
post-indexed addressing for the store is especially bizarre, since it does use
post-indexed addressing for the preceding load. Gcc 3.4.3 does not exhibit
this behaviour; it compiles the above code to:

        ldr     r2, .L6
        ldr     r0, .L6+4
        cmp     r2,r0
        ldr     r1, .L6
        movcs   pc,lr
.L4:
        ldr     r2,[r2],#4
        cmp     r2, r0
        str     r3,[r1],#4
        bcc     .L4
        mov     pc,lr

While not perfect either, this also only has 4 instructions in the loop.

--
Summary: ARM optimizer produces severely suboptimal code
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Eric dot Doenges at betty-tv dot com
GCC host triplet: powerpc-apple-darwin8.5.0
GCC target triplet: arm-elf-unknown

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016
[Bug middle-end/27016] ARM optimizer produces severely suboptimal code
--- Comment #2 from Eric dot Doenges at betty-tv dot com 2006-04-04 09:13 ---
(In reply to comment #1)
> This code is undefined:
> len = ((unsigned int)&_ram_erase_sector_end
>        - (unsigned int)&_ram_erase_sector_start) / sizeof(unsigned int);
>
> That is obviously undefined as taking the difference between two pointers
> which are not in the same array is undefined code.
>
> Even the comparison:
> p_rom < p_end;
> is undefined.
>

In the code I took this snippet from, _ram_erase_sector_start and
_ram_erase_sector_end are symbols generated by the linker at the start and the
end of a special segment which I need to copy to ram, so I would argue that
these pointers do in fact refer to the same "array" (in this case, the "array"
is the entire flash memory).

However, none of this should affect the decision to use (or not to use) the
post-indexed addressing mode. If I replace the for loop with a
for (len = 100; len > 0; --len), the quality of the generated code actually
degrades even further:

        ldr     r2, .L7
        ldr     r1, .L7+4
        @ lr needed for prologue
.L2:
        ldr     r3, [r1, #-4]
        str     r3, [r2, #-4]
        ldr     r3, .L7+8
        add     r2, r2, #4
        cmp     r2, r3
        add     r1, r1, #4
        bne     .L2
        bx      lr

While I think it's nifty that gcc recognizes that it doesn't need to keep the
len variable, but instead uses p_ram to determine when the loop is finished, I
also think it's pretty brain-dead that it won't use post-indexed addressing
for either the ldr or str in the loop. And why it thinks it needs to load the
constant end address to compare against every time inside the loop, instead of
once into a scratch register outside the loop, is anyone's guess.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016
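The undefinedness objection from comment #1 can be sidestepped at the source
level by doing the size arithmetic on integers rather than on pointers, and by
driving the loop with an index. The following is a minimal sketch, not code
from the project the snippet was taken from: copy_words is a hypothetical
helper, and it assumes the region between the two addresses is one contiguous,
word-aligned block, as the linker-generated symbols are described to be.
Converting a pointer to uintptr_t is implementation-defined rather than
undefined. Whether gcc 4.1.0 picks post-indexed addressing for this form has
not been checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy the words in [begin, end) to dst and return
 * how many words were copied. */
static size_t copy_words(unsigned int *dst,
                         const unsigned int *begin,
                         const unsigned int *end)
{
    /* Compare and subtract addresses as integers, not as pointers. */
    uintptr_t first = (uintptr_t)begin;
    uintptr_t last = (uintptr_t)end;
    size_t count, i;

    assert(last >= first);
    assert((last - first) % sizeof(unsigned int) == 0);

    count = (size_t)(last - first) / sizeof(unsigned int);
    for (i = 0; i < count; ++i)
        dst[i] = begin[i];
    return count;
}
```

With the symbols from the report, a call might look like
copy_words(code_in_ram, _ram_erase_sector_start, _ram_erase_sector_end),
assuming the two symbols are declared as arrays of unknown size
(extern unsigned int _ram_erase_sector_start[];) so that indexing from the
start symbol is not indexing past a single scalar object.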