http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46779
Summary: wrong code generation for array access
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: critical
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: mschu...@ivs.cs.ovgu.de
CC: mschu...@ivs.cs.ovgu.de
Target: avr-*-*

Created attachment 22611
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22611
example program for reproducing the wrong code generation

GCC versions 4.4.0 to 4.4.5 generate wrong code for an array access when several things come together, and it was very difficult to produce a nearly minimal test case. The code generation seems to go wrong when size optimization, inline assembler and nested loops are combined. Maybe the optimizer runs out of usable registers, because some registers are clobbered by the inline assembler. The inline assembler is not my own: I used a macro from avr-libc (version 1.6.8) that fills a boot page for later writing to flash. The relevant code looks as follows (the macro is expanded directly in the code):

    uint8_t array[256]={'A','B'};

    int main(void)
    {
        uint8_t *buf=array;
        uint32_t page=0;
        uint16_t w;
        uint8_t y;
        uint16_t i;

        for (y=0;y<100;++y)
        {
            page=((uint16_t)y)<<8;
            for (i=0; i<10; i+=2)
            {
                w = (buf[i+1]);
                w<<=8;
                w|= buf[i];
                __asm__ __volatile__
                (
                    "movw r0, %4\n\t"
                    "movw r30, %A3\n\t"
                    "sts %1, %C3\n\t"
                    "sts %0, %2\n\t"
                    "spm\n\t"
                    "clr r1\n\t"
                    :
                    : "i" (_SFR_MEM_ADDR(__SPM_REG)),
                      "i" (_SFR_MEM_ADDR(RAMPZ)),
                      "r" ((uint8_t)__BOOT_PAGE_FILL),
                      "r" ((uint32_t)(page+i)),
                      "r" ((uint16_t)w)
                    : "r0", "r30", "r31"
                );
            }
        }
        return 0;
    }

To reproduce the bug, compile the provided attachment with:

    avr-gcc -Os main.cc -mmcu=atmega128

This generates the following (only the inner loop is shown):

     ea: 60 e0        ldi r22, 0x00 ; 0
     ec: eb 01        movw r28, r22
     ee: 6c 91        ld r22, X
     f0: 70 e0        ldi r23, 0x00 ; 0
     f2: 6c 2b        or r22, r28
     f4: 7d 2b        or r23, r29
     f6: 0b 01        movw r0, r22
     f8: f9 01        movw r30, r18
     fa: 40 93 5b 00  sts 0x005B, r20
     fe: 10 93 68 00  sts 0x0068, r17
    102: e8 95        spm
    104: 11 24        eor r1, r1
    106: 12 96        adiw r26, 0x02 ; 2
    108: 2e 5f        subi r18, 0xFE ; 254
    10a: 3f 4f        sbci r19, 0xFF ; 255
    10c: 4f 4f        sbci r20, 0xFF ; 255
    10e: 5f 4f        sbci r21, 0xFF ; 255
    110: 71 e0        ldi r23, 0x01 ; 1
    112: aa 30        cpi r26, 0x0A ; 10
    114: b7 07        cpc r27, r23
    116: 49 f7        brne .-46 ; 0xea <main+0x1c>

As you can see, RAM is read at 0xee, and only there, whereas the C source contains two reads (buf[i+1] and buf[i]).

Compiled with gcc 4.4.x, this example generates wrong code; with gcc 4.5.x it works as it should. However, I am not sure whether the bug has been fixed there or is still latently present. It may be a bug in the optimizer that only needs a different example to show up in 4.5.x as well.

Some information about the compiler used:

    avr-gcc -v
    Using built-in specs.
    Target: avr
    Configured with: /tmp/cross-build/gcc-4.4.0/configure --target=avr --prefix=/localapp/cross-gcc/builds/2.20.1-4.4.0-7.1/avr --program-prefix=avr- --with-gnu-ld --with-gnu-as --enable-languages=c,c++
    Thread model: single
    gcc version 4.4.0 (GCC)

The other compiler versions were compiled with the same configure flags.
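
For comparison, the two reads that the C source performs in the inner loop can be restated as a small host-side sketch. This is not part of the attachment: the helper names expected_word and check_expected_words are hypothetical, and the sketch only assumes the same array initializer as the test case. It shows which 16-bit value operand %4 should carry for each i if both buf[i+1] and buf[i] are actually loaded.

```c
#include <assert.h>
#include <stdint.h>

/* Same initializer as in the test case; the remaining elements are zero. */
static const uint8_t array[256] = {'A', 'B'};

/* Hypothetical helper (not in the attachment): combines two bytes
 * exactly as the inner loop of the test case does. */
static uint16_t expected_word(const uint8_t *buf, uint16_t i)
{
    uint16_t w = buf[i + 1];   /* first read: high byte */
    w <<= 8;
    w |= buf[i];               /* second read: low byte */
    return w;
}

/* Hypothetical self-check: values a correct inner loop should produce. */
static void check_expected_words(void)
{
    uint16_t i;

    /* 'B' (0x42) is the high byte, 'A' (0x41) the low byte. */
    assert(expected_word(array, 0) == 0x4241);

    /* All other pairs in the range i < 10 are zero-initialized. */
    for (i = 2; i < 10; i += 2)
        assert(expected_word(array, i) == 0x0000);
}
```

Calling check_expected_words() from a test program built with a host compiler gives a reference to compare against the word written by the spm sequence on the target.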