[Bug c/54888] New: GCC with -Os is faster than -O3 on some AVR code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54888

Bug #: 54888
Summary: GCC with -Os is faster than -O3 on some AVR code
Classification: Unclassified
Product: gcc
Version: 4.3.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: m...@world3.net

Created attachment 28411
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28411
Compiler output

I am using AVR-GCC to write some very low power RTC related code. The interrupt "ISR(RTC_OVF_vect)" executes faster with -Os optimization than it does with -O1, -O2 or -O3. I have measured execution time on an oscilloscope to confirm.

V4.3.3 is the one that comes with Atmel Studio / WinAVR.

Command line:

    avr-gcc -funsigned-char -funsigned-bitfields -DF_CPU=800UL -O3 -fpack-struct -fshort-enums -g2 -Wall -c -std=gnu99 -MD -MP -MF "rtc.d" -MT"rtc.d" -MT"rtc.o" -mmcu=atxmega128d3 -o"rtc.o" ".././rtc.c"

I don't get any warnings etc. when compiling.

Build machine is Windows 7 x64. Target is an XMEGA128D3, same issue confirmed with the 128A3U (unsurprisingly).

The problem appears to be with GCC, rather than avr-libc, but please correct me if I am wrong.
[Bug c/54888] GCC with -Os is faster than -O3 on some AVR code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54888 --- Comment #1 from mojo at world3 dot net 2012-10-10 14:51:26 UTC --- Created attachment 28412 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28412 Compiler output with -O3
[Bug target/54888] GCC with -Os is faster than -O3 on some AVR code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54888

--- Comment #3 from mojo at world3 dot net 2012-10-22 12:40:57 UTC ---
(In reply to comment #2)
> And I actually don't understand the issue: Optimizing for size does not
> require to produce slow code. The code may run fast.

-O3 is supposed to produce the fastest possible code, but it doesn't. -Os is faster. At the very least the two should be equal. In other words -O3 is broken.
[Bug c/79269] New: Calculate size of struct with flexible array at compile time
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79269

Bug ID: 79269
Summary: Calculate size of struct with flexible array at compile time
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: mojo at world3 dot net
Target Milestone: ---

As per the C99 standard, GCC treats structs with flexible arrays as if the array had a size of 0. GCC offers an extension to allow such structs to be initialized. For example:

    struct {
        char a;
        char b[];
    } test = { 10, { 0, 1, 2, 3 } };

In this case sizeof(test) will return 1, because b[] is considered to have a size of zero even though GCC will correctly allocate 4 bytes and initialize them.

To complement this extension, a new builtin similar to sizeof() could be created that returns the actual size of the struct (5 in this example). This would be extremely useful for embedded systems in particular. For example, one could:

    struct {
        char length;
        char array[];
    } test = { __new_sizeof(test) - 1, { 0, 1, 2, 3 } };
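For illustration, here is a minimal sketch of the gap described above, together with one way to get the element count at compile time without a new builtin. It assumes the payload is written once in a macro. All names here (struct message, PAYLOAD, PAYLOAD_COUNT, message_sizeof, message_total_size) are made up for this example, and static initialization of a flexible array member is a GNU extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; only the struct layout follows the report. */
struct message {
    char length;
    char array[];   /* flexible array member: adds nothing to sizeof */
};

/* The payload is written once so its element count is a compile-time constant. */
#define PAYLOAD       { 0, 1, 2, 3 }
#define PAYLOAD_COUNT (sizeof((const char[])PAYLOAD))

/* GNU extension: static initialization of a flexible array member. */
static struct message test = { (char)PAYLOAD_COUNT, PAYLOAD };

/* sizeof ignores the flexible array, so only the length byte is counted. */
static_assert(sizeof(struct message) == 1, "flexible array is not counted");

/* What sizeof reports for the initialized object. */
size_t message_sizeof(void)
{
    return sizeof(test);
}

/* Bytes actually occupied: fixed header plus the initialized payload. */
size_t message_total_size(void)
{
    return offsetof(struct message, array) + PAYLOAD_COUNT;
}
```

The obvious drawback is that the payload has to live in a macro, which is exactly the kind of boilerplate the proposed builtin would avoid.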
[Bug c/79269] Calculate size of struct with flexible array at compile time
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79269 --- Comment #2 from mojo at world3 dot net --- Thanks. __builtin_object_size() works well at runtime and solves my immediate need, which spurred me to suggest this enhancement. After giving it some thought I agree with you, I can't see any easy way to handle incomplete objects.
[Bug c/63760] New: Support __func__ in PROGMEM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63760

Bug ID: 63760
Summary: Support __func__ in PROGMEM
Product: gcc
Version: 4.8.1
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: mojo at world3 dot net

The C99 __func__ magic variable is defined as:

    static const char __func__[] = "function-name";

On architectures such as AVR where there is a distinction between program memory and RAM this definition is not ideal. It will place the string in RAM, even though it is constant. Devices using these Harvard style architectures typically have very little RAM, a few kilobytes at most.

I suggest adding a __funcP__ magic variable that is defined as:

    static const char __funcP__[] PROGMEM = "function-name";

That way the string will end up in program memory, which is almost always much larger than available RAM and is commonly used for the storage of constant strings.
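As a rough sketch of what has to be done by hand today: __func__ is a variable rather than a string literal, so it cannot be used to initialize a PROGMEM array, and the name must be typed out again. FUNC_NAME_P, copy_name_P and rtc_init are hypothetical names for this example. PROGMEM and strncpy_P come from avr-libc's <avr/pgmspace.h>, and the #else branch is only a host fallback so the sketch compiles off-target.

```c
#include <string.h>

#ifdef __AVR__
#include <avr/pgmspace.h>          /* PROGMEM, strncpy_P */
#else
/* Host fallback (assumption for this sketch): everything lives in RAM. */
#define PROGMEM
#define strncpy_P strncpy
#endif

/* Hypothetical helper: the function name is spelled out manually. */
#define FUNC_NAME_P(name) static const char func_name_P[] PROGMEM = name

/* Copy a program-memory string into a RAM buffer, always NUL-terminating. */
static void copy_name_P(char *dst, size_t dst_size, const char *src_P)
{
    if (dst_size == 0)
        return;
    strncpy_P(dst, src_P, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* Example function that reports its own name from program memory. */
void rtc_init(char *name_out, size_t name_size)
{
    FUNC_NAME_P("rtc_init");       /* must be kept in sync by hand */
    copy_name_P(name_out, name_size, func_name_P);
}
```

The manual duplication of the name is the weak point. A compiler-provided __funcP__ would remove it.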
[Bug c/63760] Support __func__ in PROGMEM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63760 --- Comment #2 from mojo at world3 dot net --- On platforms with this kind of architecture the default is to place everything in RAM, unless you specifically state otherwise. With Harvard style architectures different instructions are used to access RAM and program memory. GCC doesn't handle that natively, so you need to add things like PROGMEM or __flash to tell it where to store and how to access the data.
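To make the "tell it where to store and how to access the data" point concrete, here is a small sketch using the __flash named address space. greeting, flash_strlen and greeting_length are made-up names. The #ifndef branch is an assumption for building on a non-AVR host, where the qualifier is simply defined away. avr-gcc defines __FLASH when the address space is available, and __flash needs a GNU C mode such as -std=gnu99.

```c
#include <stddef.h>

#ifndef __FLASH
/* Host fallback (assumption for this sketch): no separate address space. */
#define __flash
#endif

/* Stored in program memory on AVR, in ordinary read-only data elsewhere. */
static const __flash char greeting[] = "hello";

/* The qualifier on the pointer tells the compiler which load instruction
   to use: LPM on AVR, an ordinary load on the host. */
size_t flash_strlen(const __flash char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Example caller. */
size_t greeting_length(void)
{
    return flash_strlen(greeting);
}
```

With PROGMEM the storage is the same but the access has to go through helpers such as pgm_read_byte(), which is why __flash tends to read more naturally.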
[Bug c/63760] Support __func__ in PROGMEM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63760 --- Comment #4 from mojo at world3 dot net --- I agree, a separate __funcP__ is the best option.
[Bug c/100962] New: Poor optimization of AVR code when using structs in __flash
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100962

Bug ID: 100962
Summary: Poor optimization of AVR code when using structs in __flash
Product: gcc
Version: 5.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: mojo at world3 dot net
Target Milestone: ---

Example code here: https://godbolt.org/z/1hnPoGdTd

In this code a const __flash struct holds some data used to initialize peripherals. Line 59 is the definition of the struct.

With the __flash attribute the generated AVR assembly uses the X register as a pointer to the peripheral. The X pointer does not support displacement addressing (LDD/STD), so rather inefficient code is generated, e.g.

    141 channels[ch].dma.ch->TRFCNT = BUFFER_SIZE;
    142 channels[ch].dma.ch->REPCNT = 0;
        ldi r18,lo8(26)
        ldi r19,0
        adiw r26,4
        st X+,r18
        st X,r19
        sbiw r26,4+1
        adiw r26,6
        st X,__zero_reg__
        sbiw r26,6

Removing the __flash attribute produces much better code, with the Z register used with displacement.

The issue appears to be because the other pointer register that supports displacement, Y, is used for the stack so unavailable. Introducing the need to use LPM instructions to read data from flash seems to cause Z not to be used for the peripheral, with X used instead. Z is used only for LPM.

The best possible optimisation here seems to be to read all values needed from flash first, and then switch to using Z as a pointer to the peripheral.
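The "read from flash first, then write the peripheral" ordering can also be written out explicitly at source level. The sketch below is only an illustration and does not guarantee that the register allocator will pick Z. struct dma_ch_regs is a hypothetical stand-in for the device header's DMA channel type, with TRFCNT and REPCNT placed at offsets 4 and 6 to match the listing. fake_regs is a RAM-backed placeholder so the code runs anywhere, and BUFFER_SIZE is taken from the lo8(26) in the listing.

```c
#include <stdint.h>

#ifndef __FLASH
/* Host fallback (assumption for this sketch): no separate address space. */
#define __flash
#endif

#define BUFFER_SIZE 26             /* from "ldi r18,lo8(26)" in the listing */

/* Hypothetical stand-in for the DMA channel register block. */
struct dma_ch_regs {
    volatile uint8_t  reserved[4];
    volatile uint16_t TRFCNT;      /* offset 4 */
    volatile uint8_t  REPCNT;      /* offset 6 */
};

struct channel {
    struct { struct dma_ch_regs *ch; } dma;
};

/* Placeholder register blocks; real code would point at the hardware. */
static struct dma_ch_regs fake_regs[2];

static const __flash struct channel channels[] = {
    { { &fake_regs[0] } },
    { { &fake_regs[1] } },
};

void channel_init(uint8_t index)
{
    /* Step 1: one flash read (LPM on AVR) to fetch the pointer. */
    struct dma_ch_regs *regs = channels[index].dma.ch;

    /* Step 2: peripheral writes only, with no further flash access. */
    regs->TRFCNT = BUFFER_SIZE;
    regs->REPCNT = 0;
}
```

Whether this changes the generated code depends on the compiler version and optimization level, so it is worth checking the listing rather than assuming.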
[Bug target/100962] Poor optimization of AVR code when using structs in __flash
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100962

--- Comment #2 from mojo at world3 dot net ---

    avr-gcc-11.1.0-x64-windows>bin\avr-gcc -Og -xc -Wall -mmcu=atxmega64a1u test.c
    avr-gcc-11.1.0-x64-windows>bin\avr-objdump -h -S a.out > list.s

Still producing code like this

    2de: 18 97    sbiw r26, 0x08 ; 8
    2e0: 19 96    adiw r26, 0x09 ; 9

Thanks.
[Bug target/100962] Poor optimization of AVR code when using structs in __flash
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100962 --- Comment #3 from mojo at world3 dot net --- Apologies, I noticed I had -Og on. Tried with -O3 and it optimised the struct away. With -O2 it uses the Z register with displacement, reading data from flash. So it seems that only -Og produces poor code with V11. The older version 5.4.0 has issues either way. Not sure if that is a bug or just necessary for debugging.