[Bug c/54888] New: GCC with -Os is faster than -O3 on some AVR code

2012-10-10 Thread mojo at world3 dot net


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54888



             Bug #: 54888
           Summary: GCC with -Os is faster than -O3 on some AVR code
    Classification: Unclassified
           Product: gcc
           Version: 4.3.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: m...@world3.net





Created attachment 28411
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28411
Compiler output



I am using AVR-GCC to write some very low power RTC related code. The interrupt
"ISR(RTC_OVF_vect)" executes faster with -Os optimization than it does with
-O1, -O2 or -O3. I have measured execution time on an oscilloscope to confirm.
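
For reference, a minimal sketch of how a scope measurement like this is commonly set up: a spare pin is raised at the start of the handler body and lowered at the end. The probe pin (PORTA bit 0), the `seconds` counter and the host fallback macros are assumptions for illustration and are not taken from the attached code.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>
/* Assumed probe pin: PORTA bit 0 on an XMEGA part. */
#define PROBE_HIGH() (PORTA.OUTSET = PIN0_bm)
#define PROBE_LOW()  (PORTA.OUTCLR = PIN0_bm)
#else
/* Host fallback so the sketch also builds off-target (assumption). */
static volatile uint8_t probe_pin;
#define PROBE_HIGH() (probe_pin = 1)
#define PROBE_LOW()  (probe_pin = 0)
#define ISR(vector)  void vector(void)
#endif

/* Hypothetical RTC state updated by the interrupt. */
static volatile uint32_t seconds;

ISR(RTC_OVF_vect)
{
    PROBE_HIGH();   /* scope trigger: body starts */
    seconds++;
    PROBE_LOW();    /* body ends */
}
```

Note that a pulse generated this way covers only the handler body. The compiler-generated prologue and epilogue (register saves and restores) run outside the pulse, so they have to be measured separately, for example from the interrupt source edge.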



V4.3.3 is the one that comes with Atmel Studio / WinAVR. Command line:

avr-gcc -funsigned-char -funsigned-bitfields -DF_CPU=800UL  -O3
-fpack-struct -fshort-enums -g2 -Wall -c -std=gnu99 -MD -MP -MF "rtc.d"
-MT"rtc.d" -MT"rtc.o"  -mmcu=atxmega128d3   -o"rtc.o" ".././rtc.c"



I don't get any warnings etc. when compiling. Build machine is Windows 7 x64.
Target is an XMEGA128D3, same issue confirmed with the 128A3U (unsurprisingly).

The problem appears to be with GCC, rather than avr-libc, but please correct me
if I am wrong.


[Bug c/54888] GCC with -Os is faster than -O3 on some AVR code

2012-10-10 Thread mojo at world3 dot net


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54888



--- Comment #1 from mojo at world3 dot net 2012-10-10 14:51:26 UTC ---

Created attachment 28412
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28412
Compiler output with -O3


[Bug target/54888] GCC with -Os is faster than -O3 on some AVR code

2012-10-22 Thread mojo at world3 dot net


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54888



--- Comment #3 from mojo at world3 dot net 2012-10-22 12:40:57 UTC ---

(In reply to comment #2)

> And I actually don't understand the issue: Optimizing for size does not require
> to produce slow code.  The code may run fast.

-O3 is supposed to produce the fastest possible code, but it doesn't. -Os is
faster. At the very least the two should be equal.

In other words -O3 is broken.


[Bug c/79269] New: Calculate size of struct with flexible array at compile time

2017-01-29 Thread mojo at world3 dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79269

Bug ID: 79269
   Summary: Calculate size of struct with flexible array at compile time
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mojo at world3 dot net
  Target Milestone: ---

As per the C99 standard, GCC treats structs with flexible arrays as if the
array had a size of 0. GCC offers an extension to allow such structs to be
initialized. For example:

struct {
char a;
char b[];
} test = { 10, { 0, 1, 2, 3 } };

In this case sizeof(test) will return 1, because b[] is considered to have a
size of zero even though GCC will correctly allocate 4 bytes and initialize
them.
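
A minimal sketch of the behaviour described above, assuming GCC (or a compiler that accepts the same extension). The type name `packet`, the object name `sample` and the helper functions are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

struct packet {
    char a;
    char b[];               /* flexible array member (C99) */
};

/* Static initialization of a flexible array member is a GNU C extension. */
static struct packet sample = { 10, { 0, 1, 2, 3 } };

/* sizeof counts only 'a'; the flexible array contributes nothing. */
static_assert(sizeof(struct packet) == 1, "flexible array has no size");

/* Illustrative helpers: what sizeof and offsetof report for this type. */
size_t packet_type_size(void)  { return sizeof sample; }
size_t packet_array_offset(void) { return offsetof(struct packet, b); }
```

Both helpers return 1 here, even though storage for the four initialized bytes of `b` has been allocated after `a`.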

To complement this extension, a new builtin similar to sizeof() could be
created that returns the actual size of the struct (5 in this example).

This would be extremely useful for embedded systems in particular. For example,
one could:

struct {
char length;
char array[];
} test = { __new_sizeof(test) - 1, { 0, 1, 2, 3 } };
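
Until something like the proposed builtin exists, one possible workaround (not the feature requested here) is to keep the payload in a macro and take sizeof of a compound literal built from it. The names `PAYLOAD`, `message` and `sample` are hypothetical, and the sketch still relies on the GNU extension for initializing the flexible array.

```c
#include <stddef.h>

/* Hypothetical payload, listed once so length and data cannot drift apart. */
#define PAYLOAD 0, 1, 2, 3

struct message {
    char length;
    char array[];           /* flexible array member */
};

static struct message sample = {
    /* sizeof does not evaluate its operand, so this is a compile-time
       constant: the number of bytes in the payload. */
    sizeof((char[]){ PAYLOAD }),
    { PAYLOAD }
};

/* Illustrative accessor for the stored length prefix. */
int message_length(void) { return sample.length; }
```

The obvious cost is that the payload has to be routed through a macro, which is exactly the kind of boilerplate a builtin would avoid.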

[Bug c/79269] Calculate size of struct with flexible array at compile time

2017-01-29 Thread mojo at world3 dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79269

--- Comment #2 from mojo at world3 dot net ---
Thanks. __builtin_object_size() works well at runtime and solves my immediate
need, which spurred me to suggest this enhancement.

After giving it some thought I agree with you, I can't see any easy way to
handle incomplete objects.
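
A minimal sketch of the __builtin_object_size() approach mentioned above. The type and object names are illustrative, and the value the builtin reports for an object with an initialized flexible array depends on the compiler and its version, so it should be checked on the toolchain in use rather than assumed.

```c
#include <stddef.h>

struct record {
    char a;
    char b[];               /* flexible array member */
};

/* GNU extension: static initialization of the flexible array. */
static struct record sample = { 10, { 0, 1, 2, 3 } };

/* Size of the type only; the flexible array is not counted. */
size_t record_type_size(void)
{
    return sizeof sample;
}

/* Size of the object as the compiler knows it. Type 0 asks for the
   maximum size of the whole object that the pointer refers to. */
size_t record_object_size(void)
{
    return __builtin_object_size(&sample, 0);
}
```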

[Bug c/63760] New: Support __func__ in PROGMEM

2014-11-06 Thread mojo at world3 dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63760

Bug ID: 63760
   Summary: Support __func__ in PROGMEM
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mojo at world3 dot net

The C99 __func__ magic variable is defined as:

static const char __func__[] = "function-name";

On architectures such as AVR where there is a distinction between program
memory and RAM this definition is not ideal. It will place the string in RAM,
even though it is constant. Devices using these Harvard style architectures
typically have very little RAM, a few kilobytes at most.

I suggest adding a __funcP__ magic variable that is defined as:

static const char __funcP__[] PROGMEM = "function-name";

That way the string will end up in program memory, which is almost always much
larger than available RAM and is commonly used for the storage of constant
strings.
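
For comparison, a sketch of what has to be written by hand today to get the same effect. The function name, the `func_p` array and the `copy_from_flash` helper are hypothetical; the non-AVR branch is only a fallback so the sketch builds on a host.

```c
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host fallback (assumption): no separate program memory. */
#define PROGMEM
#define pgm_read_byte(p) (*(const unsigned char *)(p))
#endif

/* Hypothetical helper: copy a NUL-terminated string from program
   memory into a RAM buffer of n bytes, always terminating it. */
static void copy_from_flash(char *dst, const char *src, size_t n)
{
    size_t i = 0;
    if (n == 0)
        return;
    for (; i + 1 < n; i++) {
        char c = (char)pgm_read_byte(src + i);
        if (c == '\0')
            break;
        dst[i] = c;
    }
    dst[i] = '\0';
}

void rtc_init_name(char *out, size_t n)
{
    /* Hand-written stand-in for the proposed __funcP__: the name has
       to be repeated as a literal and kept in sync manually. */
    static const char func_p[] PROGMEM = "rtc_init_name";
    copy_from_flash(out, func_p, n);
}
```

The duplication of the function name as a string literal is the part a compiler-provided __funcP__ would remove.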


[Bug c/63760] Support __func__ in PROGMEM

2014-11-06 Thread mojo at world3 dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63760

--- Comment #2 from mojo at world3 dot net ---
On platforms with this kind of architecture the default is to place everything
in RAM, unless you specifically state otherwise.

With Harvard style architectures different instructions are used to access RAM
and program memory. GCC doesn't handle that natively, so you need to add things
like PROGMEM or __flash to tell it where to store and how to access the data.
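
A short sketch of the __flash variant, following the `#ifdef __FLASH` availability check described in the GCC AVR named address space documentation. The table contents and the names `FLASH_CONST`, `init_table` and `init_table_sum` are illustrative; the fallback branch exists only so the sketch builds on a host.

```c
#include <stddef.h>

#ifdef __FLASH
/* avr-gcc with named address spaces: data lives in program memory and
   the compiler selects the flash read instructions itself. */
#define FLASH_CONST const __flash
#else
/* Host fallback (assumption): ordinary constant data. */
#define FLASH_CONST const
#endif

/* Hypothetical initialization table. */
static FLASH_CONST unsigned char init_table[] = { 0x01, 0x20, 0x04 };

/* Plain array indexing; no pgm_read_byte() wrapper is needed. */
unsigned int init_table_sum(void)
{
    unsigned int sum = 0;
    for (size_t i = 0; i < sizeof init_table; i++)
        sum += init_table[i];
    return sum;
}
```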


[Bug c/63760] Support __func__ in PROGMEM

2014-11-14 Thread mojo at world3 dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63760

--- Comment #4 from mojo at world3 dot net ---
I agree, a separate __funcP__ is the best option.


[Bug c/100962] New: Poor optimization of AVR code when using structs in __flash

2021-06-08 Thread mojo at world3 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100962

Bug ID: 100962
   Summary: Poor optimization of AVR code when using structs in __flash
   Product: gcc
   Version: 5.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mojo at world3 dot net
  Target Milestone: ---

Example code here: https://godbolt.org/z/1hnPoGdTd

In this code a const __flash struct holds some data used to initialize
peripherals. Line 59 is the definition of the struct.

With the __flash attribute the generated AVR assembly uses the X register as a
pointer to the peripheral. The X pointer does not support displacement
addressing (LDD/STD), so rather inefficient code is generated, e.g.

141 channels[ch].dma.ch->TRFCNT = BUFFER_SIZE;
142 channels[ch].dma.ch->REPCNT = 0;

ldi r18,lo8(26)
ldi r19,0
adiw r26,4
st X+,r18
st X,r19
sbiw r26,4+1
adiw r26,6
st X,__zero_reg__
sbiw r26,6

Removing the __flash attribute produces much better code, with the Z register
used with displacement.

The issue appears to be because the other pointer register that supports
displacement, Y, is used for the stack so unavailable. Introducing the need to
use LPM instructions to read data from flash seems to cause Z not to be used
for the peripheral, with X used instead. Z is used only for LPM.

The best possible optimisation here seems to be to read all values needed from
flash first, and then switch to using Z as a pointer to the peripheral.
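
At source level the same ordering can be written out by hand, as in the sketch below. Whether it actually changes register allocation is not verified here. The register layout is a simplified stand-in for an XMEGA DMA channel, `fake_dma` replaces the real peripheral so the sketch builds on a host, and all type and function names are hypothetical. BUFFER_SIZE is set to 26 to match the listing above.

```c
#include <stdint.h>

#ifdef __FLASH
#define FLASH __flash
#else
#define FLASH               /* host fallback (assumption) */
#endif

#define BUFFER_SIZE 26

/* Simplified stand-in for a DMA channel register block. */
typedef struct {
    volatile uint8_t  CTRLA;
    volatile uint8_t  CTRLB;
    volatile uint8_t  ADDRCTRL;
    volatile uint8_t  TRIGSRC;
    volatile uint16_t TRFCNT;
    volatile uint8_t  REPCNT;
} dma_ch_t;

/* Hypothetical per-channel configuration stored in flash. */
typedef struct {
    dma_ch_t *ch;
    uint16_t  count;
    uint8_t   repeat;
} channel_cfg_t;

static dma_ch_t fake_dma;   /* stands in for the real peripheral */

static const FLASH channel_cfg_t channels[] = {
    { &fake_dma, BUFFER_SIZE, 0 },
};

void channel_setup(uint8_t ch)
{
    /* Step 1: read everything needed out of flash into a local copy. */
    channel_cfg_t cfg = channels[ch];

    /* Step 2: all peripheral writes go through one RAM-space pointer. */
    dma_ch_t *dma = cfg.ch;
    dma->TRFCNT = cfg.count;
    dma->REPCNT = cfg.repeat;
}
```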

[Bug target/100962] Poor optimization of AVR code when using structs in __flash

2021-06-08 Thread mojo at world3 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100962

--- Comment #2 from mojo at world3 dot net ---
avr-gcc-11.1.0-x64-windows>bin\avr-gcc -Og  -xc -Wall -mmcu=atxmega64a1u test.c
avr-gcc-11.1.0-x64-windows>bin\avr-objdump -h -S a.out > list.s

Still producing code like this

 2de:   18 97   sbiw r26, 0x08   ; 8
 2e0:   19 96   adiw r26, 0x09   ; 9

Thanks.

[Bug target/100962] Poor optimization of AVR code when using structs in __flash

2021-06-08 Thread mojo at world3 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100962

--- Comment #3 from mojo at world3 dot net ---
Apologies, I noticed I had -Og on. Tried with -O3 and it optimised the struct
away. With -O2 it uses the Z register with displacement, reading data from
flash.

So it seems that only -Og produces poor code with V11. The older version 5.4.0
has issues either way. Not sure if that is a bug or just necessary for
debugging.